AutoML
This example shows how to use atom's AutoML implementation to automatically search for an optimized pipeline.
Import the breast cancer dataset from sklearn.datasets. This is a small, easy-to-model dataset; the goal is to predict whether or not a patient has breast cancer.
Load the data
In [22]:
# Import packages
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from atom import ATOMClassifier
In [23]:
# Load the data
X, y = load_breast_cancer(return_X_y=True)
Run the pipeline
In [24]:
atom = ATOMClassifier(X, y, n_jobs=6, verbose=2, warnings=False, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: binary classification.
Parallel processing with 6 cores.

Dataset stats ==================== >>
Shape: (569, 31)
Memory: 138.96 kB
Scaled: False
Outlier values: 169 (1.2%)
-------------------------------------
Train set size: 456
Test set size: 113
-------------------------------------
|   |     dataset |       train |        test |
| - | ----------- | ----------- | ----------- |
| 0 |   212 (1.0) |   170 (1.0) |    42 (1.0) |
| 1 |   357 (1.7) |   286 (1.7) |    71 (1.7) |
In [25]:
# It's possible to add custom estimators to the pipeline
atom.add(StandardScaler())
Fitting StandardScaler...
Applying StandardScaler to the dataset...
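Any transformer that follows the sklearn API can be added this way. As a minimal sketch (illustrative only, not executed in this example; n_components=10 is a hypothetical value), a dimensionality reduction step could be appended just as easily:

from sklearn.decomposition import PCA

# Illustrative only: append a PCA step through the same atom.add interface
atom.add(PCA(n_components=10))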
In [26]:
# Check that the scaling worked
atom.scaled
Out[26]:
True
In [27]:
# Find an optimized pipeline using AutoML
atom.automl(
scoring="accuracy",
max_time_mins=5,
template="Transformer-Transformer-Classifier",
)
Fitting automl algorithm...
Generation 1 - Current best internal CV score: 0.9736502627806976
Generation 2 - Current best internal CV score: 0.9736741519350215
Generation 3 - Current best internal CV score: 0.9758480649784997
Generation 4 - Current best internal CV score: 0.9758719541328237

5.06 minutes have elapsed. TPOT will close down.
TPOT closed during evaluation in one generation.
WARNING: TPOT may not provide a good pipeline if TPOT is stopped/interrupted in a early generation.

TPOT closed prematurely. Will use the current best pipeline.
Best pipeline: XGBClassifier(PolynomialFeatures(MinMaxScaler(input_matrix), degree=2, include_bias=False, interaction_only=False), learning_rate=0.1, max_depth=1, min_child_weight=1, n_estimators=100, n_jobs=1, subsample=0.8500000000000001, verbosity=0)
Merging automl results with atom...
Applying MinMaxScaler to the dataset...
Applying PolynomialFeatures to the dataset...
Adding model XGBoost (XGB) to the pipeline...
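The template parameter follows TPOT's convention for constraining the pipeline layout: here, two transformers followed by a classifier. As a hedged sketch (the max_time_mins value is illustrative, not from this run), dropping the template lets TPOT search over pipelines of any shape:

# Hedged sketch: an unconstrained search with a longer time budget
atom.automl(scoring="accuracy", max_time_mins=15)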
Analyze the results
In [28]:
# The tpot estimator can be accessed for further analysis
atom.tpot
Out[28]:
TPOTClassifier(max_time_mins=5, n_jobs=6, random_state=1, scoring='accuracy', template='Transformer-Transformer-Classifier', verbosity=2)
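Because this is the fitted TPOTClassifier itself, the standard TPOT attributes and methods should be available. A minimal sketch, assuming TPOT's documented API (fitted_pipeline_ holds the winning sklearn pipeline; export writes it to a standalone script):

# Inspect the winning sklearn pipeline found by TPOT
print(atom.tpot.fitted_pipeline_)

# Write the pipeline to a standalone Python script
atom.tpot.export("best_pipeline.py")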
In [29]:
# Check the new transformers in the branch
atom.branch.status()
Branch: master
 --> Pipeline:
   >>> StandardScaler
     --> copy: True
     --> with_mean: True
     --> with_std: True
   >>> MinMaxScaler
     --> feature_range: (0, 1)
     --> copy: True
     --> clip: False
   >>> PolynomialFeatures
     --> degree: 2
     --> interaction_only: False
     --> include_bias: False
     --> order: C
 --> Models: XGB
In [30]:
# Or draw the pipeline
atom.plot_pipeline()
In [31]:
# Note that the model is also merged with atom
atom.xgb
Out[31]:
XGBoost
 --> Estimator: XGBClassifier
 --> Evaluation: accuracy: 0.9646
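The merged model can be used like any other model trained with atom. A minimal sketch, assuming ATOM's prediction methods, which are expected to run new data through the branch's transformers before reaching the estimator:

# Hedged sketch: predict on the raw feature matrix;
# the pipeline's transformers are assumed to be applied first
predictions = atom.xgb.predict(X)
print(predictions[:10])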
In [32]:
# The pipeline can be exported to a sklearn-like pipeline
pl = atom.export_pipeline(model="xgb")
print(pl)
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('minmaxscaler', MinMaxScaler()),
                ('polynomialfeatures', PolynomialFeatures(include_bias=False)),
                ('XGB',
                 XGBClassifier(base_score=0.5, booster='gbtree',
                               colsample_bylevel=1, colsample_bynode=1,
                               colsample_bytree=1, enable_categorical=False,
                               gamma=0, gpu_id=-1, importance_type=None,
                               interaction_constraints='', learning_rate=0.1,
                               max_delta_step=0, max_depth=1, min_child_weight=1,
                               missing=nan, monotone_constraints='()',
                               n_estimators=100, n_jobs=1, num_parallel_tree=1,
                               predictor='auto', random_state=1, reg_alpha=0,
                               reg_lambda=1, scale_pos_weight=1,
                               subsample=0.8500000000000001, tree_method='exact',
                               validate_parameters=1, verbosity=0))])
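Since the export is a regular sklearn Pipeline, the standard fit/score API applies. A minimal sketch (the split below is illustrative and independent of atom's internal partition):

from sklearn.model_selection import train_test_split

# Refit the exported pipeline on a fresh split and score it on the holdout
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
pl.fit(X_train, y_train)
print(pl.score(X_test, y_test))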