Example: AutoML

This example shows how to use ATOM's AutoML implementation to automatically search for an optimized pipeline.

Import the breast cancer dataset from sklearn.datasets. This is a small, easy-to-train dataset whose goal is to predict whether a patient has breast cancer.

Load the data

In [1]:

# Import packages
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from atom import ATOMClassifier

In [2]:

# Load the data
X, y = load_breast_cancer(return_X_y=True)

Run the pipeline

In [3]:

atom = ATOMClassifier(X, y, n_jobs=6, verbose=2, random_state=1)

<< ================== ATOM ================== >>
Algorithm task: binary classification.
Parallel processing with 6 cores.
Parallelization backend: loky

Dataset stats ==================== >>
Shape: (569, 31)
Train set size: 456
Test set size: 113
-------------------------------------
Memory: 141.24 kB
Scaled: False
Outlier values: 167 (1.2%)
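The dataset stats above can be cross-checked with scikit-learn alone. The sketch below is not part of the original notebook; it only inspects the raw arrays returned by load_breast_cancer. The reading that the 31 in the reported shape counts the 30 feature columns plus the target column is an assumption based on the numbers, not something the output states.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Load the same arrays that are passed to ATOMClassifier
X, y = load_breast_cancer(return_X_y=True)

n_rows, n_features = X.shape

# Count how many samples belong to each class label
classes, counts = np.unique(y, return_counts=True)

print(f"Rows: {n_rows}, feature columns: {n_features}")
# Assumption: the shape reported in the stats includes the target column
print(f"Columns including the target: {n_features + 1}")
print(dict(zip(classes.tolist(), counts.tolist())))
```

The train and test set sizes in the stats (456 and 113) add up to the 569 rows counted here.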

In [4]:

# It's possible to add custom estimators to the pipeline
atom.add(StandardScaler())

Adding StandardScaler to the pipeline...
Fitting StandardScaler...

In [5]:

# Check that the scaling worked
atom.scaled

Out[5]:

True
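To illustrate what a scaled feature set looks like, here is a minimal sketch using scikit-learn and NumPy only. It assumes that "scaled" means every feature has mean 0 and standard deviation 1; how ATOM computes its scaled attribute is not shown in this example, so treat the check below as an illustration rather than ATOM's implementation. For simplicity the scaler is fit on the full dataset instead of a training split.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)

# Standardize every feature column: subtract the mean, divide by the std
X_scaled = StandardScaler().fit_transform(X)

col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)  # population std, the same convention StandardScaler uses

# Hypothetical check (an assumption): all means close to 0 and all stds close to 1
is_scaled = bool(
    np.allclose(col_means, 0.0, atol=1e-6)
    and np.allclose(col_stds, 1.0, atol=1e-6)
)
print(is_scaled)
```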

In [6]:

# Find an optimized pipeline using AutoML
atom.automl(objective="precision", max_time=2 * 60)

Searching for optimal pipeline...
AutoMLSearch will use the holdout set to score and rank pipelines.
Generating pipelines to search over...
8 pipelines ready for search.

*****************************
* Beginning pipeline search *
*****************************

Optimizing for Precision. 
Greater score is better.

Using SequentialEngine to train and score pipelines.
Will stop searching for new pipelines after 120 seconds.

Allowed model families: linear_model, linear_model, xgboost, lightgbm, catboost, random_forest, decision_tree, extra_trees

[Interactive plot: pipeline search progress. x-axis: Iteration; y-axis: Validation Score; series: Best Score, Iter score]

Evaluating Baseline Pipeline: Mode Baseline Binary Classification Pipeline
Mode Baseline Binary Classification Pipeline:
	Starting cross validation
	Finished cross validation - mean Precision: 0.000
	Starting holdout set scoring
	Finished holdout set scoring - Precision: 0.000

*****************************
* Evaluating Batch Number 1 *
*****************************

Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler:
	Starting cross validation
	Finished cross validation - mean Precision: 1.000
	Starting holdout set scoring
	Finished holdout set scoring - Precision: 1.000
Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler:
	Starting cross validation
	Finished cross validation - mean Precision: 1.000
	Starting holdout set scoring
	Finished holdout set scoring - Precision: 1.000
XGBoost Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer:
	Starting cross validation
	Finished cross validation - mean Precision: 0.992
	Starting holdout set scoring
	Finished holdout set scoring - Precision: 1.000
LightGBM Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer:
	Starting cross validation
	Finished cross validation - mean Precision: 0.975
	Starting holdout set scoring
	Finished holdout set scoring - Precision: 0.975
CatBoost Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer:
	Starting cross validation
	Finished cross validation - mean Precision: 0.994
	Starting holdout set scoring
	Finished holdout set scoring - Precision: 1.000
Random Forest Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer:
	Starting cross validation
	Finished cross validation - mean Precision: 1.000
	Starting holdout set scoring
	Finished holdout set scoring - Precision: 1.000
Decision Tree Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer:
	Starting cross validation
	Finished cross validation - mean Precision: 0.884
	Starting holdout set scoring
	Finished holdout set scoring - Precision: 0.943
Extra Trees Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer:
	Starting cross validation
	Finished cross validation - mean Precision: 1.000
	Starting holdout set scoring
	Finished holdout set scoring - Precision: 1.000

*****************************
* Evaluating Batch Number 2 *
*****************************

Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler:
	Starting cross validation
	Finished cross validation - mean Precision: 1.000
	Starting holdout set scoring
	Finished holdout set scoring - Precision: 1.000
Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler:
	Starting cross validation
	Finished cross validation - mean Precision: 1.000
	Starting holdout set scoring
	Finished holdout set scoring - Precision: 1.000
Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler:
	Starting cross validation
	Finished cross validation - mean Precision: 1.000
	Starting holdout set scoring
	Finished holdout set scoring - Precision: 1.000
Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler:
	Starting cross validation
	Finished cross validation - mean Precision: 1.000
	Starting holdout set scoring
	Finished holdout set scoring - Precision: 1.000
Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler:
	Starting cross validation
	Finished cross validation - mean Precision: 1.000
	Starting holdout set scoring
	Finished holdout set scoring - Precision: 1.000

*****************************
* Evaluating Batch Number 3 *
*****************************

Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler:
	Starting cross validation
	Finished cross validation - mean Precision: 1.000
	Starting holdout set scoring
	Finished holdout set scoring - Precision: 1.000
Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler:
	Starting cross validation
	Finished cross validation - mean Precision: 0.994
	Starting holdout set scoring
	Finished holdout set scoring - Precision: 1.000

Search finished after 02:02            
Best pipeline: Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler
Best pipeline Precision: 1.000000

Merging automl results with atom...
 --> Adding LabelEncoder to the pipeline...
 --> Adding ReplaceNullableTypes to the pipeline...
 --> Adding Imputer to the pipeline...
 --> Adding StandardScaler to the pipeline...
 --> Adding model LogisticRegression (LR) to the pipeline...
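The search above ranks pipelines by precision, the fraction of predicted positives that are truly positive. The function below is a minimal pure-Python sketch of that metric, not the implementation used by the search. Returning 0.0 when no positives are predicted is a convention assumed here, and the labels are made up for illustration.

```python
def precision(y_true, y_pred, positive=1):
    """Return TP / (TP + FP) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted positive but actually negative
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    if tp + fp == 0:
        # Assumed convention: no positive predictions gives a score of 0.0
        return 0.0
    return tp / (tp + fp)


# Illustrative labels (not taken from the breast cancer dataset)
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

score = precision(y_true, y_pred)
print(score)  # 3 true positives and 1 false positive -> 0.75
```

Because precision ignores false negatives, a score of 1.000 does not mean that every positive sample was found. It can be worth checking recall as well before settling on a pipeline.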

Analyze the results

In [7]:

# The evalml estimator can be accessed for further analysis
atom.evalml

Out[7]:

<evalml.automl.automl_search.AutoMLSearch at 0x1a1686b9510>

In [10]:

# Check the new transformers in the branch
atom.pipeline

Out[10]:

0                      StandardScaler()
1                         Label Encoder
2    Replace Nullable Types Transformer
3                               Imputer
4                       Standard Scaler
dtype: object

In [11]:

# Or draw the pipeline
atom.plot_pipeline()

In [14]:

# Note that the model is also merged with atom
atom.models

Out[14]:

'LR'

In [13]:

# The pipeline can be exported to a sklearn-like pipeline
atom.export_pipeline(model="lr")

Out[13]:

Pipeline(memory=Memory(location=None),
         steps=[('standardscaler', StandardScaler()),
                ('labelencoder', LabelEncoder(positive_label=None)),
                ('replacenullabletypes', ReplaceNullableTypes()),
                ('imputer',
                 Imputer(categorical_impute_strategy='most_frequent', numeric_impute_strategy='mean', boolean_impute_strategy='most_frequent', categorical_fill_value=None, numeric_fill_value=None, boolean_fill_value=None)),
                ('standardscaler2', StandardScaler()),
                ('LR',
                 LogisticRegression(l1_ratio=0.15, n_jobs=6,
                                    penalty='elasticnet', random_state=1,
                                    solver='saga'))])
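For readers without ATOM or evalml installed, the core of the exported pipeline can be approximated with scikit-learn alone. This is a simplified sketch and not the exported object. It keeps one StandardScaler and the elastic-net LogisticRegression with the hyperparameters shown above, and omits the evalml transformers (LabelEncoder, ReplaceNullableTypes, Imputer) and the second scaler. The train/test split and max_iter=5000 are assumptions made here so the example is self-contained, so scores will not match the search log exactly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Assumption: a plain stratified 80/20 split; ATOM's own split may differ
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

pipe = Pipeline(
    steps=[
        ("standardscaler", StandardScaler()),
        (
            "LR",
            LogisticRegression(
                penalty="elasticnet",
                l1_ratio=0.15,
                solver="saga",
                max_iter=5000,  # assumption: raised to give saga room to converge
                random_state=1,
            ),
        ),
    ]
)

# Fit on the training split and score precision on the holdout split
pipe.fit(X_train, y_train)
holdout_precision = precision_score(y_test, pipe.predict(X_test))
print(f"Holdout precision: {holdout_precision:.3f}")
```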
