Example: Accelerating pipelines on GPU
This example shows how to accelerate a pipeline on GPU using cuML.
Import the breast cancer dataset from sklearn.datasets. This is a small, easy-to-train dataset whose goal is to predict whether a patient has breast cancer.
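Running this example requires a CUDA-capable GPU and a working cuml installation. As a quick pre-flight check (a minimal sketch; only the import and version attribute are assumed here, your install path may differ):

# Hypothetical sanity check: confirm cuml imports and report its version.
try:
    import cuml
    print(f"cuML {cuml.__version__} available")
except ImportError:
    print("cuML is not installed; install it through the RAPIDS channels first")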
In [1]:
from atom import ATOMClassifier
from sklearn.datasets import load_breast_cancer
In [2]:
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
In [3]:
atom = ATOMClassifier(X, y, device="gpu", engine="cuml", verbose=2)
<< ================== ATOM ================== >>
Algorithm task: binary classification.
GPU training enabled.
Backend engine: cuml.

Dataset stats ==================== >>
Shape: (569, 31)
Memory: 141.24 kB
Scaled: False
Outlier values: 171 (1.2%)
-------------------------------------
Train set size: 456
Test set size: 113
-------------------------------------
|   |     dataset |       train |        test |
| - | ----------- | ----------- | ----------- |
| 0 |   212 (1.0) |   170 (1.0) |    42 (1.0) |
| 1 |   357 (1.7) |   286 (1.7) |    71 (1.7) |
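The device and engine parameters tell atom to place the data and estimators on GPU through cuML. On a machine without a GPU, the same example runs on CPU by leaving those parameters at their defaults (a sketch; device="cpu" and engine="sklearn" are ATOM's documented defaults, but check your version):

# Hedged sketch: the same pipeline on CPU with scikit-learn estimators.
atom_cpu = ATOMClassifier(X, y, verbose=2)  # device="cpu", engine="sklearn" by default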
In [4]:
atom.clean()
Fitting Cleaner...
Cleaning the data...
 --> Label-encoding the target column.
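The Cleaner label-encodes the target column in place. You can verify the result by inspecting the data held by atom (a sketch; atom.dataset and atom.y are ATOM's merged data view and target column):

# Inspect the cleaned data; the target column is now integer-encoded.
print(atom.dataset.head())
print(atom.y.unique())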
In [5]:
atom.run(["lr", "rf"], n_trials=10)
Training ========================= >>
Models: LR, RF
Metric: f1

Running hyperparameter tuning for LogisticRegression...
| trial |       C | max_iter | l1_ratio |     f1 | best_f1 | time_trial | time_ht |    state |
| ----- | ------- | -------- | -------- | ------ | ------- | ---------- | ------- | -------- |
| 0     | 76.5012 |      970 |      --- | 0.9821 |  0.9821 |     0.932s |  0.932s | COMPLETE |
| 1     | 42.1177 |      740 |      --- | 0.9739 |  0.9821 |     0.093s |  1.025s | COMPLETE |
| 2     |  0.0059 |      690 |      --- | 0.9661 |  0.9821 |     0.054s |  1.080s | COMPLETE |
| 3     |  3.3535 |      680 |      --- | 0.9913 |  0.9913 |     0.069s |  1.149s | COMPLETE |
| 4     |  5.1898 |      770 |      --- | 0.9739 |  0.9913 |     0.078s |  1.227s | COMPLETE |
| 5     |  0.0902 |      840 |      --- | 0.9739 |  0.9913 |     0.055s |  1.282s | COMPLETE |
| 6     |  0.0225 |      810 |      --- | 0.958  |  0.9913 |     0.054s |  1.336s | COMPLETE |
| 7     |  0.0557 |      150 |      --- | 0.9661 |  0.9913 |     0.058s |  1.395s | COMPLETE |
| 8     | 10.9765 |      920 |      --- | 0.9735 |  0.9913 |     0.072s |  1.467s | COMPLETE |
| 9     |  1.8506 |      170 |      --- | 0.9825 |  0.9913 |     0.065s |  1.532s | COMPLETE |
Hyperparameter tuning ---------------------------
Best trial --> 3
Best parameters:
 --> C: 3.3535
 --> max_iter: 680
Best evaluation --> f1: 0.9913
Time elapsed: 1.532s
Fit ---------------------------------------------
Train evaluation --> f1: 0.9913
Test evaluation --> f1: 0.9793
Time elapsed: 0.060s
-------------------------------------------------
Total time: 1.592s

Running hyperparameter tuning for RandomForest...
| trial | n_estimators | split_criterion | max_depth | min_samples_split | min_samples_leaf | max_features | bootstrap | max_samples |     f1 | best_f1 | time_trial | time_ht |    state |
| ----- | ------------ | --------------- | --------- | ----------------- | ---------------- | ------------ | --------- | ----------- | ------ | ------- | ---------- | ------- | -------- |
| 0     |          380 |            gini |         3 |                14 |                5 |          0.8 |     False |         --- | 0.9455 |  0.9455 |     0.845s |  0.845s | COMPLETE |
| 1     |          170 |            gini |         9 |                 6 |               13 |          0.9 |     False |         --- | 0.9558 |  0.9558 |     0.424s |  1.269s | COMPLETE |
| 2     |          270 |         entropy |         4 |                10 |               12 |          0.8 |     False |         --- | 0.9739 |  0.9739 |     0.607s |  1.876s | COMPLETE |
| 3     |           20 |         entropy |         7 |                 9 |               17 |          0.9 |     False |         --- | 0.9636 |  0.9739 |     0.147s |  2.023s | COMPLETE |
| 4     |          350 |         entropy |        13 |                 8 |               16 |          0.9 |     False |         --- | 0.9009 |  0.9739 |     0.801s |  2.824s | COMPLETE |
| 5     |          430 |            gini |        13 |                 4 |               10 |          0.8 |     False |         --- | 0.9402 |  0.9739 |     0.837s |  3.662s | COMPLETE |
| 6     |          250 |            gini |         9 |                 9 |               14 |         log2 |     False |         --- | 0.958  |  0.9739 |     0.562s |  4.224s | COMPLETE |
| 7     |          410 |         entropy |         3 |                 8 |               11 |          0.7 |      True |         0.6 | 0.9333 |  0.9739 |     0.831s |  5.055s | COMPLETE |
| 8     |          100 |            gini |         3 |                10 |               18 |          0.8 |     False |         --- | 0.9381 |  0.9739 |     0.279s |  5.334s | COMPLETE |
| 9     |          330 |         entropy |         7 |                 6 |               17 |          0.7 |     False |         --- | 0.9558 |  0.9739 |     0.731s |  6.065s | COMPLETE |
Hyperparameter tuning ---------------------------
Best trial --> 2
Best parameters:
 --> n_estimators: 270
 --> split_criterion: entropy
 --> max_depth: 4
 --> min_samples_split: 10
 --> min_samples_leaf: 12
 --> max_features: 0.8
 --> bootstrap: False
Best evaluation --> f1: 0.9739
Time elapsed: 6.065s
Fit ---------------------------------------------
Train evaluation --> f1: 0.9809
Test evaluation --> f1: 0.9655
Time elapsed: 0.573s
-------------------------------------------------
Total time: 6.638s

Final results ==================== >>
Total time: 8.389s
-------------------------------------
LogisticRegression --> f1: 0.9793 !
RandomForest       --> f1: 0.9655
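After training, every model is reachable through its acronym, and atom keeps a summary of the run. A short sketch of common follow-ups (atom.results and atom.winner are attributes from ATOM's API):

# Overview of the run and quick access to the best-performing model.
print(atom.results)      # one row of scores and timings per trained model
print(atom.winner.name)  # model with the best test score (LR in this run)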
In [6]:
atom.evaluate()
Out[6]:
|    | accuracy | average_precision | balanced_accuracy |     f1 | jaccard | matthews_corrcoef | precision | recall | roc_auc |
| -- | -------- | ----------------- | ----------------- | ------ | ------- | ----------------- | --------- | ------ | ------- |
| LR |   0.9735 |            0.9817 |            0.9643 | 0.9793 |  0.9595 |            0.9439 |    0.9595 | 1.0000 |  0.9762 |
| RF |   0.9558 |            0.9919 |            0.9453 | 0.9655 |  0.9333 |            0.9054 |    0.9459 | 0.9859 |  0.9866 |
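The fitted models act as estimators themselves, so scoring new data goes through the usual prediction methods (a sketch, reusing the first rows of X as stand-in "new" data):

# Predict with the tuned logistic regression; inference also runs through cuML.
print(atom.lr.predict(X.iloc[:5]))
print(atom.lr.predict_proba(X.iloc[:5]))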
In [7]:
print(atom.lr.estimator.__module__)
print(atom.rf.estimator.__module__)
cuml.linear_model.logistic_regression
cuml.ensemble.randomforestclassifier
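The module paths confirm that the fitted estimators are cuML objects rather than their scikit-learn counterparts. To reuse the whole pipeline outside atom, it can be exported (a sketch; export_pipeline is ATOM's documented export hook, though its exact signature may vary by version):

# Export the full pipeline (transformers plus estimator) for use elsewhere.
pipeline = atom.export_pipeline()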