AutoML¶
This example shows how to use atom's AutoML implementation to automatically search for an optimized pipeline.
Import the breast cancer dataset from sklearn.datasets. This is a small, easy-to-train dataset whose goal is to predict whether or not a patient has breast cancer.
Load the data¶
In [1]:
# Import packages
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from atom import ATOMClassifier
In [2]:
# Load the data
X, y = load_breast_cancer(return_X_y=True)
Run the pipeline¶
In [3]:
atom = ATOMClassifier(X, y, n_jobs=6, verbose=2, warnings=False, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: binary classification.
Parallel processing with 6 cores.

Dataset stats ====================== >>
Shape: (569, 31)
Scaled: False
Outlier values: 174 (1.2%)
---------------------------------------
Train set size: 456
Test set size: 113
---------------------------------------
|    | dataset   | train     | test     |
|---:|:----------|:----------|:---------|
|  0 | 212 (1.0) | 167 (1.0) | 45 (1.0) |
|  1 | 357 (1.7) | 289 (1.7) | 68 (1.5) |
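As the stats above show, the data is split into a train set (456 rows) and a test set (113 rows) on initialization. A minimal sketch, assuming ATOMClassifier's test_size parameter controls the held-out fraction (check the API reference for the exact name and default):

# Sketch: hold out 20% of the rows for testing (assumed parameter)
atom = ATOMClassifier(X, y, test_size=0.2, n_jobs=6, verbose=2, random_state=1)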
In [4]:
# It's possible to add custom estimators to the pipeline
atom.add(StandardScaler())
Fitting StandardScaler...
Applying StandardScaler to the dataset...
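Any transformer that follows sklearn's fit/transform API can be added the same way. A minimal sketch (a hypothetical extra step, not part of this run) adding a dimensionality reduction step:

# Sketch: add a PCA step to the pipeline (hypothetical; this
# example only adds StandardScaler)
from sklearn.decomposition import PCA

atom.add(PCA(n_components=10))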
In [5]:
# Check that the scaling worked
atom.scaled
Out[5]:
True
In [6]:
# Find an optimized pipeline using AutoML
atom.automl(
scoring="accuracy",
max_time_mins=10,
template="Transformer-Transformer-Classifier",
)
Fitting automl algorithm...
Generation 1 - Current best internal CV score: 0.9780936454849499
Generation 2 - Current best internal CV score: 0.9780936454849499
Generation 3 - Current best internal CV score: 0.9802675585284281
Generation 4 - Current best internal CV score: 0.9802675585284281
Generation 5 - Current best internal CV score: 0.9802914476827521
Generation 6 - Current best internal CV score: 0.9824653607262303
Generation 7 - Current best internal CV score: 0.9824653607262303
Generation 8 - Current best internal CV score: 0.9846870520783565
Generation 9 - Current best internal CV score: 0.9846870520783565
Generation 10 - Current best internal CV score: 0.9846870520783565
Generation 11 - Current best internal CV score: 0.9846870520783565
Generation 12 - Current best internal CV score: 0.9846870520783565
Generation 13 - Current best internal CV score: 0.9846870520783565
Generation 14 - Current best internal CV score: 0.9846870520783565
Generation 15 - Current best internal CV score: 0.9846870520783565
Generation 16 - Current best internal CV score: 0.9846870520783565
Generation 17 - Current best internal CV score: 0.9846870520783565
Generation 18 - Current best internal CV score: 0.9846870520783565
Generation 19 - Current best internal CV score: 0.9846870520783565
Generation 20 - Current best internal CV score: 0.9846870520783565
Generation 21 - Current best internal CV score: 0.9846870520783565
Generation 22 - Current best internal CV score: 0.9846870520783565
Generation 23 - Current best internal CV score: 0.9846870520783565
Generation 24 - Current best internal CV score: 0.9846870520783565
Generation 25 - Current best internal CV score: 0.9846870520783565
Generation 26 - Current best internal CV score: 0.9846870520783565
Generation 27 - Current best internal CV score: 0.9846870520783565

10.02 minutes have elapsed. TPOT will close down.
TPOT closed during evaluation in one generation.
WARNING: TPOT may not provide a good pipeline if TPOT is stopped/interrupted in a early generation.
TPOT closed prematurely. Will use the current best pipeline.

Best pipeline: MLPClassifier(Normalizer(MaxAbsScaler(input_matrix), norm=l2), alpha=0.001, learning_rate_init=0.01)
Merging automl results with atom...
Applying MaxAbsScaler to the dataset...
Applying Normalizer to the dataset...
Adding model Multi-layer Perceptron (MLP) to the pipeline...
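The keyword arguments of atom.automl are forwarded to TPOT, as the TPOTClassifier repr in the next section confirms. As a rough standalone sketch, the equivalent direct TPOT call would look like this (X_train and y_train stand in for atom's training set):

from tpot import TPOTClassifier

tpot = TPOTClassifier(
    scoring="accuracy",
    max_time_mins=10,
    template="Transformer-Transformer-Classifier",
    n_jobs=6,
    verbosity=2,
    random_state=1,
)
tpot.fit(X_train, y_train)  # search for the best pipeline on the train set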
Analyze the results¶
In [7]:
# The tpot estimator can be accessed for further analysis
atom.tpot
Out[7]:
TPOTClassifier(max_time_mins=10, n_jobs=6, random_state=1, scoring='accuracy', template='Transformer-Transformer-Classifier', verbosity=2)
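Since this is the regular fitted TPOT estimator, its own API remains available. A small sketch using two standard TPOT members:

# fitted_pipeline_ holds the best sklearn pipeline TPOT found,
# and export() writes it out as a standalone Python script
print(atom.tpot.fitted_pipeline_)
atom.tpot.export("tpot_pipeline.py")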
In [8]:
# Check the new transformers in the branch
atom.branch.status()
Branch: master
 --> Pipeline:
   >>> StandardScaler
     --> copy: True
     --> with_mean: True
     --> with_std: True
   >>> MaxAbsScaler
     --> copy: True
   >>> Normalizer
     --> norm: l2
     --> copy: True
 --> Models: MLP
In [9]:
# Or draw the pipeline
atom.plot_pipeline()
In [10]:
# Note that the model is also merged with atom
atom.mlp
Out[10]:
Multi-layer Perceptron
 --> Estimator: MLPClassifier
 --> Evaluation: accuracy: 0.9735
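The fitted sklearn estimator itself is available through the model's estimator attribute (the repr above shows it is an MLPClassifier). A small sketch verifying a hyperparameter TPOT selected:

# alpha should match the 0.001 reported in the best pipeline
print(atom.mlp.estimator.get_params()["alpha"])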
In [11]:
# The pipeline can be exported to a sklearn-like pipeline
pl = atom.export_pipeline(model="mlp")
print(pl)
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('maxabsscaler', MaxAbsScaler()),
                ('normalizer', Normalizer()),
                ('MLP',
                 MLPClassifier(alpha=0.001, learning_rate_init=0.01,
                               random_state=1))])
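A minimal usage sketch, assuming the exported steps are already fitted (they carry the tuned hyperparameters shown above); refit with pl.fit(...) on the training data if that assumption doesn't hold:

# Use the exported object like any sklearn pipeline
print(pl.predict(X[:5]))  # class predictions for the first five rows
print(pl.score(X, y))     # mean accuracy over the full dataset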