AutoML¶
This example shows how to use atom's AutoML implementation to automatically search for an optimized pipeline.

Import the breast cancer dataset from sklearn.datasets. This is a small and easy-to-train dataset whose goal is to predict whether a patient has breast cancer or not.
Load the data¶
In [1]:
# Import packages
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from atom import ATOMClassifier
In [2]:
# Load the data
X, y = load_breast_cancer(return_X_y=True)
Run the pipeline¶
In [3]:
atom = ATOMClassifier(X, y, n_jobs=6, verbose=2, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: binary classification.
Parallel processing with 6 cores.

Dataset stats ==================== >>
Shape: (569, 31)
Memory: 138.96 kB
Scaled: False
Outlier values: 169 (1.2%)
-------------------------------------
Train set size: 456
Test set size: 113
-------------------------------------
|   |     dataset |       train |        test |
| - | ----------- | ----------- | ----------- |
| 0 |   212 (1.0) |   170 (1.0) |    42 (1.0) |
| 1 |   357 (1.7) |   286 (1.7) |    71 (1.7) |
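ATOM reports that 456 of the 569 rows went to the training set, i.e. the default 80/20 holdout. As a quick sanity check, a minimal sketch (assuming the train and test data attributes atom exposes return DataFrames):

# Sketch: confirm the 80/20 split reported above
# (assumes atom.train / atom.test return the data as DataFrames).
len(atom.train), len(atom.test)  # expected: (456, 113)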
In [4]:
# It's possible to add custom estimators to the pipeline
atom.add(StandardScaler())
Fitting StandardScaler...
Applying StandardScaler to the dataset...
In [5]:
# Check that the scaling worked
atom.scaled
Out[5]:
True
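Besides the scaled flag, the transformation can be verified by hand. A minimal sketch, assuming atom.X_train returns the transformed training features as a pandas DataFrame:

# Sketch: check that the training features are standardized
# (mean ~0 and standard deviation ~1 per column).
import numpy as np

assert np.allclose(atom.X_train.mean(axis=0), 0, atol=1e-9)
assert np.allclose(atom.X_train.std(axis=0), 1, atol=0.01)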
In [6]:
# Find an optimized pipeline using AutoML
atom.automl(
    scoring="accuracy",
    max_time_mins=5,
    template="Transformer-Transformer-Classifier",
)
Fitting automl algorithm...
Generation 1 - Current best internal CV score: 0.9736502627806976
Generation 2 - Current best internal CV score: 0.9736741519350215
Generation 3 - Current best internal CV score: 0.9758241758241759
Generation 4 - Current best internal CV score: 0.9758241758241759

5.07 minutes have elapsed. TPOT will close down.
TPOT closed during evaluation in one generation.
WARNING: TPOT may not provide a good pipeline if TPOT is stopped/interrupted in a early generation.

TPOT closed prematurely. Will use the current best pipeline.
Best pipeline: XGBClassifier(PolynomialFeatures(MinMaxScaler(input_matrix), degree=2, include_bias=False, interaction_only=False), learning_rate=0.1, max_depth=1, min_child_weight=3, n_estimators=100, n_jobs=1, subsample=0.9500000000000001, verbosity=0)
Merging automl results with atom...
Applying MinMaxScaler to the dataset...
Applying PolynomialFeatures to the dataset...
Adding model XGBoost (XGB) to the pipeline...
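The run above was capped at five minutes, so TPOT stopped after four generations. Keyword arguments to atom.automl are passed on to the underlying TPOTClassifier (as the parameters above suggest), so the evolutionary search itself can be tuned too. A hedged sketch with illustrative values, not taken from this run:

# Sketch: control the evolutionary search (values are illustrative).
# Extra keyword arguments are forwarded to TPOTClassifier.
atom.automl(
    scoring="accuracy",
    max_time_mins=5,
    generations=10,      # iterations of the genetic algorithm
    population_size=50,  # pipelines evaluated per generation
    template="Transformer-Transformer-Classifier",
)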
Analyze the results¶
In [7]:
# The tpot estimator can be accessed for further analysis
atom.tpot
Out[7]:
TPOTClassifier(max_time_mins=5, n_jobs=6, random_state=1, scoring='accuracy',
               template='Transformer-Transformer-Classifier', verbosity=2)
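Beyond the repr, the usual TPOT attributes are available for further inspection; a minimal sketch:

# Sketch: inspect the fitted TPOT object directly.
atom.tpot.fitted_pipeline_             # the winning sklearn pipeline
len(atom.tpot.evaluated_individuals_)  # number of pipelines TPOT tried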
In [8]:
# Check the new transformers in the branch
atom.branch.status()
Branch: master
 --> Pipeline:
   >>> StandardScaler
     --> copy: True
     --> with_mean: True
     --> with_std: True
   >>> MinMaxScaler
     --> feature_range: (0, 1)
     --> copy: True
     --> clip: False
   >>> PolynomialFeatures
     --> degree: 2
     --> interaction_only: False
     --> include_bias: False
     --> order: C
 --> Models: XGB
In [9]:
# Or draw the pipeline
atom.plot_pipeline()
In [10]:
# Note that the model is also merged with atom
atom.xgb
Out[10]:
XGBoost
 --> Estimator: XGBClassifier
 --> Evaluation: accuracy: 0.9646
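The merged model behaves like any other model trained through atom. A minimal sketch, assuming the standard atom model API with predict and evaluate:

# Sketch: use the merged model (assumes the standard atom model API).
atom.xgb.predict(atom.X_test)  # class predictions for the test set
atom.xgb.evaluate()            # score the model on common metrics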
In [11]:
# The pipeline can be exported to a sklearn-like pipeline
atom.export_pipeline(model="xgb")
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('minmaxscaler', MinMaxScaler()),
                ('polynomialfeatures', PolynomialFeatures(include_bias=False)),
                ('XGB',
                 XGBClassifier(base_score=0.5, booster='gbtree', callbacks=None,
                               colsample_bylevel=1, colsample_bynode=1,
                               colsample_bytree=1, early_stopping_rounds=None,
                               enable_categorical=False, eval_metric=None,
                               gamma=0, gpu_id=-1, grow_policy='depthwise',
                               importance_type=None, interaction_constraints='',
                               learning_rate=0.1, max_bin=256,
                               max_cat_to_onehot=4, max_delta_step=0,
                               max_depth=1, max_leaves=0, min_child_weight=3,
                               missing=nan, monotone_constraints='()',
                               n_estimators=100, n_jobs=1, num_parallel_tree=1,
                               predictor='auto', random_state=1, reg_alpha=0,
                               reg_lambda=1, ...))])
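Since the export is a regular scikit-learn Pipeline whose steps are already fitted (as the repr above suggests), it can serve predictions on raw, unscaled data directly; a minimal sketch:

# Sketch: predict with the exported pipeline on raw input data.
pl = atom.export_pipeline(model="xgb")
pl.predict(X)         # class predictions
pl.predict_proba(X)   # class probabilities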