AutoML¶
This example shows how to use atom's AutoML implementation to automatically search for an optimized pipeline.
Import the breast cancer dataset from sklearn.datasets. This is a small, easy-to-train dataset whose goal is to predict whether or not a patient has breast cancer.
Load the data¶
In [1]:
# Import packages
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from atom import ATOMClassifier
In [2]:
# Load the data
X, y = load_breast_cancer(return_X_y=True)
Run the pipeline¶
In [3]:
atom = ATOMClassifier(X, y, n_jobs=6, verbose=2, warnings=False, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: binary classification.
Parallel processing with 6 cores.

Dataset stats ====================== >>
Shape: (569, 31)
Scaled: False
Outlier values: 174 (1.2%)
---------------------------------------
Train set size: 456
Test set size: 113
---------------------------------------
|    | dataset   | train     | test     |
|---:|:----------|:----------|:---------|
|  0 | 212 (1.0) | 167 (1.0) | 45 (1.0) |
|  1 | 357 (1.7) | 289 (1.7) | 68 (1.5) |
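As the stats above show, the data is split into a train set (456 rows) and a test set (113 rows) on initialization. A minimal sketch, assuming ATOMClassifier's test_size parameter controls the held-out fraction (check the API reference for the exact name and default):

# Sketch: hold out 20% of the rows for testing (assumed parameter)
atom = ATOMClassifier(X, y, test_size=0.2, n_jobs=6, verbose=2, random_state=1)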
In [4]:
# It's possible to add custom estimators to the pipeline
atom.add(StandardScaler())
Fitting StandardScaler...
Applying StandardScaler to the dataset...
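Any transformer that follows sklearn's fit/transform API can be added the same way. A minimal sketch (a hypothetical extra step, not part of this run) adding a dimensionality reduction step:

# Sketch: add a PCA step to the pipeline (hypothetical; this
# example only adds StandardScaler)
from sklearn.decomposition import PCA

atom.add(PCA(n_components=10))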
In [5]:
# Check that the scaling worked
atom.scaled
Out[5]:
True
In [6]:
# Find an optimized pipeline using AutoML
atom.automl(
scoring="accuracy",
max_time_mins=10,
template="Transformer-Transformer-Classifier",
)
Fitting automl algorithm...
Generation 1 - Current best internal CV score: 0.9780936454849499
Generation 2 - Current best internal CV score: 0.9780936454849499
Generation 3 - Current best internal CV score: 0.9802675585284281
Generation 4 - Current best internal CV score: 0.9802675585284281
Generation 5 - Current best internal CV score: 0.9802914476827521
Generation 6 - Current best internal CV score: 0.9824653607262303
Generation 7 - Current best internal CV score: 0.9824653607262303
Generation 8 - Current best internal CV score: 0.9846870520783565
Generation 9 - Current best internal CV score: 0.9846870520783565
Generation 10 - Current best internal CV score: 0.9846870520783565
Generation 11 - Current best internal CV score: 0.9846870520783565
Generation 12 - Current best internal CV score: 0.9846870520783565
Generation 13 - Current best internal CV score: 0.9846870520783565
Generation 14 - Current best internal CV score: 0.9846870520783565
Generation 15 - Current best internal CV score: 0.9846870520783565
Generation 16 - Current best internal CV score: 0.9846870520783565
Generation 17 - Current best internal CV score: 0.9846870520783565
Generation 18 - Current best internal CV score: 0.9846870520783565
Generation 19 - Current best internal CV score: 0.9846870520783565
Generation 20 - Current best internal CV score: 0.9846870520783565
Generation 21 - Current best internal CV score: 0.9846870520783565
Generation 22 - Current best internal CV score: 0.9846870520783565
Generation 23 - Current best internal CV score: 0.9846870520783565
Generation 24 - Current best internal CV score: 0.9846870520783565
Generation 25 - Current best internal CV score: 0.9846870520783565
Generation 26 - Current best internal CV score: 0.9846870520783565
Generation 27 - Current best internal CV score: 0.9846870520783565

10.02 minutes have elapsed. TPOT will close down.
TPOT closed during evaluation in one generation.
WARNING: TPOT may not provide a good pipeline if TPOT is stopped/interrupted in a early generation.
TPOT closed prematurely. Will use the current best pipeline.

Best pipeline: MLPClassifier(Normalizer(MaxAbsScaler(input_matrix), norm=l2), alpha=0.001, learning_rate_init=0.01)
Merging automl results with atom...
Applying MaxAbsScaler to the dataset...
Applying Normalizer to the dataset...
Adding model Multi-layer Perceptron (MLP) to the pipeline...
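The keyword arguments of atom.automl are forwarded to TPOT, as the TPOTClassifier repr in the next section confirms. As a rough standalone sketch, the equivalent direct TPOT call would look like this (X_train and y_train stand in for atom's training set):

from tpot import TPOTClassifier

tpot = TPOTClassifier(
    scoring="accuracy",
    max_time_mins=10,
    template="Transformer-Transformer-Classifier",
    n_jobs=6,
    verbosity=2,
    random_state=1,
)
tpot.fit(X_train, y_train)  # search for the best pipeline on the train set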
Analyze the results¶
In [7]:
# The tpot estimator can be accessed for further analysis
atom.tpot
Out[7]:
TPOTClassifier(max_time_mins=10, n_jobs=6, random_state=1, scoring='accuracy', template='Transformer-Transformer-Classifier', verbosity=2)
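Since this is the regular fitted TPOT estimator, its own API remains available. A small sketch using two standard TPOT members:

# fitted_pipeline_ holds the best sklearn pipeline TPOT found,
# and export() writes it out as a standalone Python script
print(atom.tpot.fitted_pipeline_)
atom.tpot.export("tpot_pipeline.py")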
In [8]:
# Check the new transformers in the branch
atom.branch.status()
Branch: master
 --> Pipeline:
   >>> StandardScaler
     --> copy: True
     --> with_mean: True
     --> with_std: True
   >>> MaxAbsScaler
     --> copy: True
   >>> Normalizer
     --> norm: l2
     --> copy: True
 --> Models: MLP
In [9]:
# Or draw the pipeline
atom.plot_pipeline()
In [10]:
# Note that the model is also merged with atom
atom.mlp
Out[10]:
Multi-layer Perceptron
 --> Estimator: MLPClassifier
 --> Evaluation: accuracy: 0.9735
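The fitted sklearn estimator itself is available through the model's estimator attribute (the repr above shows it is an MLPClassifier). A small sketch verifying a hyperparameter TPOT selected:

# alpha should match the 0.001 reported in the best pipeline
print(atom.mlp.estimator.get_params()["alpha"])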
In [11]:
# The pipeline can be exported to a sklearn-like pipeline
pl = atom.export_pipeline(model="mlp")
print(pl)
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('maxabsscaler', MaxAbsScaler()),
                ('normalizer', Normalizer()),
                ('MLP',
                 MLPClassifier(alpha=0.001, learning_rate_init=0.01,
                               random_state=1))])
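A minimal usage sketch, assuming the exported steps are already fitted (they carry the tuned hyperparameters shown above); refit with pl.fit(...) on the training data if that assumption doesn't hold:

# Use the exported object like any sklearn pipeline
print(pl.predict(X[:5]))  # class predictions for the first five rows
print(pl.score(X, y))     # mean accuracy over the full dataset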