Example: Accelerating pipelines on GPU
This example shows how to accelerate a pipeline on GPU using cuML.
Import the breast cancer dataset from sklearn.datasets. This is a small, easy-to-train dataset whose goal is to predict whether a patient has breast cancer.
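Running this example requires a CUDA-capable GPU and a working cuml installation. As a quick pre-flight check (a minimal sketch; only the import and version attribute are assumed here, your install path may differ):

# Hypothetical sanity check: confirm cuml imports and report its version.
try:
    import cuml
    print(f"cuML {cuml.__version__} available")
except ImportError:
    print("cuML is not installed; install it through the RAPIDS channels first")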
In [1]:
from atom import ATOMClassifier
from sklearn.datasets import load_breast_cancer
In [2]:
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
In [3]:
atom = ATOMClassifier(X, y, device="gpu", engine="cuml", verbose=2)
<< ================== ATOM ================== >>
Algorithm task: binary classification.
GPU training enabled.
Backend engine: cuml.

Dataset stats ==================== >>
Shape: (569, 31)
Memory: 141.24 kB
Scaled: False
Outlier values: 171 (1.2%)
-------------------------------------
Train set size: 456
Test set size: 113
-------------------------------------
|   |     dataset |       train |        test |
| - | ----------- | ----------- | ----------- |
| 0 |   212 (1.0) |   170 (1.0) |    42 (1.0) |
| 1 |   357 (1.7) |   286 (1.7) |    71 (1.7) |
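The device and engine parameters tell atom to place the data and estimators on GPU through cuML. On a machine without a GPU, the same example runs on CPU by leaving those parameters at their defaults (a sketch; device="cpu" and engine="sklearn" are ATOM's documented defaults, but check your version):

# Hedged sketch: the same pipeline on CPU with scikit-learn estimators.
atom_cpu = ATOMClassifier(X, y, verbose=2)  # device="cpu", engine="sklearn" by default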
In [4]:
atom.clean()
Fitting Cleaner...
Cleaning the data...
 --> Label-encoding the target column.
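The Cleaner label-encodes the target column in place. You can verify the result by inspecting the data held by atom (a sketch; atom.dataset and atom.y are ATOM's merged data view and target column):

# Inspect the cleaned data; the target column is now integer-encoded.
print(atom.dataset.head())
print(atom.y.unique())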
In [5]:
atom.run(["lr", "rf"], n_trials=10)
Training ========================= >>
Models: LR, RF
Metric: f1

Running hyperparameter tuning for LogisticRegression...
| trial |       C | max_iter | l1_ratio |     f1 | best_f1 | time_trial | time_ht |    state |
| ----- | ------- | -------- | -------- | ------ | ------- | ---------- | ------- | -------- |
| 0     | 76.5012 |      970 |      --- | 0.9821 |  0.9821 |     0.932s |  0.932s | COMPLETE |
| 1     | 42.1177 |      740 |      --- | 0.9739 |  0.9821 |     0.093s |  1.025s | COMPLETE |
| 2     |  0.0059 |      690 |      --- | 0.9661 |  0.9821 |     0.054s |  1.080s | COMPLETE |
| 3     |  3.3535 |      680 |      --- | 0.9913 |  0.9913 |     0.069s |  1.149s | COMPLETE |
| 4     |  5.1898 |      770 |      --- | 0.9739 |  0.9913 |     0.078s |  1.227s | COMPLETE |
| 5     |  0.0902 |      840 |      --- | 0.9739 |  0.9913 |     0.055s |  1.282s | COMPLETE |
| 6     |  0.0225 |      810 |      --- | 0.958  |  0.9913 |     0.054s |  1.336s | COMPLETE |
| 7     |  0.0557 |      150 |      --- | 0.9661 |  0.9913 |     0.058s |  1.395s | COMPLETE |
| 8     | 10.9765 |      920 |      --- | 0.9735 |  0.9913 |     0.072s |  1.467s | COMPLETE |
| 9     |  1.8506 |      170 |      --- | 0.9825 |  0.9913 |     0.065s |  1.532s | COMPLETE |
Hyperparameter tuning ---------------------------
Best trial --> 3
Best parameters:
 --> C: 3.3535
 --> max_iter: 680
Best evaluation --> f1: 0.9913
Time elapsed: 1.532s
Fit ---------------------------------------------
Train evaluation --> f1: 0.9913
Test evaluation --> f1: 0.9793
Time elapsed: 0.060s
-------------------------------------------------
Total time: 1.592s

Running hyperparameter tuning for RandomForest...
| trial | n_estimators | split_criterion | max_depth | min_samples_split | min_samples_leaf | max_features | bootstrap | max_samples |     f1 | best_f1 | time_trial | time_ht |    state |
| ----- | ------------ | --------------- | --------- | ----------------- | ---------------- | ------------ | --------- | ----------- | ------ | ------- | ---------- | ------- | -------- |
| 0     |          380 |            gini |         3 |                14 |                5 |          0.8 |     False |         --- | 0.9455 |  0.9455 |     0.845s |  0.845s | COMPLETE |
| 1     |          170 |            gini |         9 |                 6 |               13 |          0.9 |     False |         --- | 0.9558 |  0.9558 |     0.424s |  1.269s | COMPLETE |
| 2     |          270 |         entropy |         4 |                10 |               12 |          0.8 |     False |         --- | 0.9739 |  0.9739 |     0.607s |  1.876s | COMPLETE |
| 3     |           20 |         entropy |         7 |                 9 |               17 |          0.9 |     False |         --- | 0.9636 |  0.9739 |     0.147s |  2.023s | COMPLETE |
| 4     |          350 |         entropy |        13 |                 8 |               16 |          0.9 |     False |         --- | 0.9009 |  0.9739 |     0.801s |  2.824s | COMPLETE |
| 5     |          430 |            gini |        13 |                 4 |               10 |          0.8 |     False |         --- | 0.9402 |  0.9739 |     0.837s |  3.662s | COMPLETE |
| 6     |          250 |            gini |         9 |                 9 |               14 |         log2 |     False |         --- | 0.958  |  0.9739 |     0.562s |  4.224s | COMPLETE |
| 7     |          410 |         entropy |         3 |                 8 |               11 |          0.7 |      True |         0.6 | 0.9333 |  0.9739 |     0.831s |  5.055s | COMPLETE |
| 8     |          100 |            gini |         3 |                10 |               18 |          0.8 |     False |         --- | 0.9381 |  0.9739 |     0.279s |  5.334s | COMPLETE |
| 9     |          330 |         entropy |         7 |                 6 |               17 |          0.7 |     False |         --- | 0.9558 |  0.9739 |     0.731s |  6.065s | COMPLETE |
Hyperparameter tuning ---------------------------
Best trial --> 2
Best parameters:
 --> n_estimators: 270
 --> split_criterion: entropy
 --> max_depth: 4
 --> min_samples_split: 10
 --> min_samples_leaf: 12
 --> max_features: 0.8
 --> bootstrap: False
Best evaluation --> f1: 0.9739
Time elapsed: 6.065s
Fit ---------------------------------------------
Train evaluation --> f1: 0.9809
Test evaluation --> f1: 0.9655
Time elapsed: 0.573s
-------------------------------------------------
Total time: 6.638s

Final results ==================== >>
Total time: 8.389s
-------------------------------------
LogisticRegression --> f1: 0.9793 !
RandomForest       --> f1: 0.9655
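After training, every model is reachable through its acronym, and atom keeps a summary of the run. A short sketch of common follow-ups (atom.results and atom.winner are attributes from ATOM's API):

# Overview of the run and quick access to the best-performing model.
print(atom.results)      # one row of scores and timings per trained model
print(atom.winner.name)  # model with the best test score (LR in this run)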
In [6]:
atom.evaluate()
Out[6]:
|    | accuracy | average_precision | balanced_accuracy |     f1 | jaccard | matthews_corrcoef | precision | recall | roc_auc |
| -- | -------- | ----------------- | ----------------- | ------ | ------- | ----------------- | --------- | ------ | ------- |
| LR |   0.9735 |            0.9817 |            0.9643 | 0.9793 |  0.9595 |            0.9439 |    0.9595 | 1.0000 |  0.9762 |
| RF |   0.9558 |            0.9919 |            0.9453 | 0.9655 |  0.9333 |            0.9054 |    0.9459 | 0.9859 |  0.9866 |
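The fitted models act as estimators themselves, so scoring new data goes through the usual prediction methods (a sketch, reusing the first rows of X as stand-in "new" data):

# Predict with the tuned logistic regression; inference also runs through cuML.
print(atom.lr.predict(X.iloc[:5]))
print(atom.lr.predict_proba(X.iloc[:5]))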
In [7]:
print(atom.lr.estimator.__module__)
print(atom.rf.estimator.__module__)
cuml.linear_model.logistic_regression
cuml.ensemble.randomforestclassifier
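The module paths confirm that the fitted estimators are cuML objects rather than their scikit-learn counterparts. To reuse the whole pipeline outside atom, it can be exported (a sketch; export_pipeline is ATOM's documented export hook, though its exact signature may vary by version):

# Export the full pipeline (transformers plus estimator) for use elsewhere.
pipeline = atom.export_pipeline()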