Example: AutoML
This example shows how to use atom's AutoML implementation, which runs EvalML's AutoMLSearch, to automatically search for an optimized pipeline.
Import the breast cancer dataset from sklearn.datasets. It is a small, easy-to-train dataset in which the goal is to predict whether a breast tumor is malignant or benign.
Load the data
In [1]:
# Import packages
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from atom import ATOMClassifier
In [2]:
# Load the data
X, y = load_breast_cancer(return_X_y=True)
Run the pipeline
In [3]:
atom = ATOMClassifier(X, y, n_jobs=6, verbose=2, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: binary classification.
Parallel processing with 6 cores.
Parallelization backend: loky

Dataset stats ==================== >>
Shape: (569, 31)
Train set size: 456
Test set size: 113
-------------------------------------
Memory: 141.24 kB
Scaled: False
Outlier values: 167 (1.2%)
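The summary reports "Outlier values: 167 (1.2%)" without stating the rule used. A common definition flags a value as an outlier when its z-score exceeds 3 in absolute value. The sketch below counts outliers under that assumption; the |z| > 3 rule and the helper names `count_outliers` and `outlier_share` are hypothetical and not taken from ATOM's source.

```python
from statistics import mean, pstdev


def count_outliers(columns, threshold=3.0):
    """Count values whose absolute z-score exceeds `threshold`.

    `columns` is a list of numeric feature columns (lists of floats).
    Assumption: the |z| > 3 rule is illustrative, not ATOM's documented rule.
    """
    total = 0
    for col in columns:
        mu = mean(col)
        sigma = pstdev(col)  # population standard deviation
        if sigma == 0:
            continue  # a constant column has no outliers under this rule
        total += sum(1 for x in col if abs(x - mu) / sigma > threshold)
    return total


def outlier_share(columns, threshold=3.0):
    """Fraction of all values flagged as outliers."""
    n_values = sum(len(col) for col in columns)
    return count_outliers(columns, threshold) / n_values


# Toy data: one column with a single extreme value, one without outliers.
cols = [[0.0] * 20 + [10.0], [1.0, 2.0, 3.0]]
print(count_outliers(cols))  # 1
```

The percentage in the summary is a share of individual values rather than of rows, which is what `outlier_share` computes in this sketch.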
In [4]:
# It's possible to add custom estimators to the pipeline
atom.add(StandardScaler())
Adding StandardScaler to the pipeline...
Fitting StandardScaler...
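StandardScaler rescales each feature to zero mean and unit variance using z = (x - mean) / std. scikit-learn's StandardScaler uses the population standard deviation (ddof=0). The stdlib sketch below shows the same arithmetic on a single column; the helper name `standardize` is hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Return (x - mean) / std for every value (z-score scaling).

    Uses the population standard deviation (ddof=0), as scikit-learn's
    StandardScaler does.
    """
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # Constant column: only center it to avoid dividing by zero.
        return [x - mu for x in column]
    return [(x - mu) / sigma for x in column]


scaled = standardize([1.0, 2.0, 3.0])
print([round(v, 4) for v in scaled])  # [-1.2247, 0.0, 1.2247]
```

Standardizing data that is already standardized returns the same values, up to floating-point rounding.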
In [5]:
# Check that the scaling worked
atom.scaled
Out[5]:
True
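`atom.scaled` now returns True. One plausible way to implement such a check is to test whether every column has a mean close to 0 and a standard deviation close to 1. This is an assumption, not ATOM's documented rule. The sketch below follows that idea; `looks_scaled` and its tolerance are hypothetical.

```python
from statistics import mean, pstdev


def looks_scaled(columns, tol=0.05):
    """Hypothetical check: True if every column has mean ~0 and std ~1.

    `tol` is an arbitrary tolerance chosen for illustration.
    """
    return all(
        abs(mean(col)) <= tol and abs(pstdev(col) - 1.0) <= tol
        for col in columns
    )


print(looks_scaled([[-1.0, 1.0]]))      # True: mean 0, std 1
print(looks_scaled([[1.0, 2.0, 3.0]]))  # False: mean is 2
```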
In [6]:
# Find an optimized pipeline using AutoML
atom.automl(objective="precision", max_time=2 * 60)  # max_time is in seconds (2 minutes)
Searching for optimal pipeline...
AutoMLSearch will use the holdout set to score and rank pipelines.
Generating pipelines to search over...
8 pipelines ready for search.

*****************************
* Beginning pipeline search *
*****************************

Optimizing for Precision.
Greater score is better.

Using SequentialEngine to train and score pipelines.
Will stop searching for new pipelines after 120 seconds.

Allowed model families: linear_model, linear_model, xgboost, lightgbm, catboost, random_forest, decision_tree, extra_trees
[Figure: pipeline search progress; traces "Best Score" and "Iter score"; x-axis "Iteration"; y-axis "Validation Score"]
Evaluating Baseline Pipeline: Mode Baseline Binary Classification Pipeline
Mode Baseline Binary Classification Pipeline:
  Starting cross validation
  Finished cross validation - mean Precision: 0.000
  Starting holdout set scoring
  Finished holdout set scoring - Precision: 0.000

*****************************
* Evaluating Batch Number 1 *
*****************************

Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler:
  Starting cross validation
  Finished cross validation - mean Precision: 1.000
  Starting holdout set scoring
  Finished holdout set scoring - Precision: 1.000
Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler:
  Starting cross validation
  Finished cross validation - mean Precision: 1.000
  Starting holdout set scoring
  Finished holdout set scoring - Precision: 1.000
XGBoost Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer:
  Starting cross validation
  Finished cross validation - mean Precision: 0.992
  Starting holdout set scoring
  Finished holdout set scoring - Precision: 1.000
LightGBM Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer:
  Starting cross validation
  Finished cross validation - mean Precision: 0.975
  Starting holdout set scoring
  Finished holdout set scoring - Precision: 0.975
CatBoost Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer:
  Starting cross validation
  Finished cross validation - mean Precision: 0.994
  Starting holdout set scoring
  Finished holdout set scoring - Precision: 1.000
Random Forest Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer:
  Starting cross validation
  Finished cross validation - mean Precision: 1.000
  Starting holdout set scoring
  Finished holdout set scoring - Precision: 1.000
Decision Tree Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer:
  Starting cross validation
  Finished cross validation - mean Precision: 0.884
  Starting holdout set scoring
  Finished holdout set scoring - Precision: 0.943
Extra Trees Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer:
  Starting cross validation
  Finished cross validation - mean Precision: 1.000
  Starting holdout set scoring
  Finished holdout set scoring - Precision: 1.000

*****************************
* Evaluating Batch Number 2 *
*****************************

Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler:
  Starting cross validation
  Finished cross validation - mean Precision: 1.000
  Starting holdout set scoring
  Finished holdout set scoring - Precision: 1.000
Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler:
  Starting cross validation
  Finished cross validation - mean Precision: 1.000
  Starting holdout set scoring
  Finished holdout set scoring - Precision: 1.000
Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler:
  Starting cross validation
  Finished cross validation - mean Precision: 1.000
  Starting holdout set scoring
  Finished holdout set scoring - Precision: 1.000
Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler:
  Starting cross validation
  Finished cross validation - mean Precision: 1.000
  Starting holdout set scoring
  Finished holdout set scoring - Precision: 1.000
Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler:
  Starting cross validation
  Finished cross validation - mean Precision: 1.000
  Starting holdout set scoring
  Finished holdout set scoring - Precision: 1.000

*****************************
* Evaluating Batch Number 3 *
*****************************

Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler:
  Starting cross validation
  Finished cross validation - mean Precision: 1.000
  Starting holdout set scoring
  Finished holdout set scoring - Precision: 1.000
Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler:
  Starting cross validation
  Finished cross validation - mean Precision: 0.994
  Starting holdout set scoring
  Finished holdout set scoring - Precision: 1.000

Search finished after 02:02
Best pipeline: Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler
Best pipeline Precision: 1.000000

Merging automl results with atom...
 --> Adding LabelEncoder to the pipeline...
 --> Adding ReplaceNullableTypes to the pipeline...
 --> Adding Imputer to the pipeline...
 --> Adding StandardScaler to the pipeline...
 --> Adding model LogisticRegression (LR) to the pipeline...
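The search optimizes for precision, the share of predicted positives that are truly positive: TP / (TP + FP). For each candidate pipeline, the log reports a mean cross-validation score and a holdout score for this metric. The stdlib sketch below computes the metric itself. The helper name is hypothetical. Returning 0.0 when nothing is predicted positive is one common convention.

```python
def precision(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP) for the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0  # no positive predictions: report 0.0 by convention
    return tp / (tp + fp)


print(precision([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.6666666666666666
print(precision([1, 0, 1], [0, 0, 0]))        # 0.0
```

A precision of 1.000, which many candidates reach here, means that no negative sample was predicted as positive. It says nothing about how many positives were missed (recall).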
Analyze the results
In [7]:
# The evalml estimator can be accessed for further analysis
atom.evalml
Out[7]:
<evalml.automl.automl_search.AutoMLSearch at 0x1a1686b9510>
In [10]:
# Check the new transformers in the branch
atom.pipeline
Out[10]:
0    StandardScaler()
1    Label Encoder
2    Replace Nullable Types Transformer
3    Imputer
4    Standard Scaler
dtype: object
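The branch's transformers are applied in the order listed, and each step consumes the previous step's output. Conceptually this is a left fold over the steps. The stdlib sketch below ignores fitting (real transformers are first fitted on the training data) and uses hypothetical toy steps in place of the real transformers.

```python
from functools import reduce


def run_steps(steps, data):
    """Apply each (name, function) step to the output of the previous one."""
    return reduce(lambda acc, step: step[1](acc), steps, data)


# Hypothetical toy steps, standing in for real transformers.
steps = [
    ("add_one", lambda xs: [x + 1 for x in xs]),
    ("double", lambda xs: [2 * x for x in xs]),
]
print(run_steps(steps, [1, 2, 3]))  # [4, 6, 8]
```

Order matters. Running the same two steps in reverse gives [3, 5, 7].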
In [11]:
# Or draw the pipeline
atom.plot_pipeline()
In [14]:
# Note that the model is also merged with atom
atom.models
Out[14]:
'LR'
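The AutoML log names an Elastic Net Classifier as the best pipeline, and atom registers it as LR. The exported pipeline shows that this model is a LogisticRegression with penalty='elasticnet' and l1_ratio=0.15. In scikit-learn's formulation, the elastic-net penalty mixes the L1 and L2 norms of the coefficients w, with rho = l1_ratio: rho * ||w||_1 + (1 - rho) / 2 * ||w||_2^2. The stdlib sketch below computes that penalty term only. The function name is hypothetical.

```python
def elastic_net_penalty(w, l1_ratio):
    """rho * ||w||_1 + (1 - rho) / 2 * ||w||_2^2, with rho = l1_ratio."""
    l1 = sum(abs(c) for c in w)
    l2_squared = sum(c * c for c in w)
    return l1_ratio * l1 + (1 - l1_ratio) / 2 * l2_squared


# The l1_ratio value from the exported model; coefficients are made up.
print(elastic_net_penalty([1.0, -2.0], l1_ratio=0.15))
```

With l1_ratio=0.15 the penalty is mostly L2, plus a small L1 component that can drive some coefficients to exactly zero. In scikit-learn, the elasticnet penalty is supported by the saga solver, which matches solver='saga' in the export.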
In [13]:
# The pipeline can be exported to a sklearn-like pipeline
atom.export_pipeline(model="lr")
Out[13]:
Pipeline(memory=Memory(location=None),
         steps=[('standardscaler', StandardScaler()),
                ('labelencoder', LabelEncoder(positive_label=None)),
                ('replacenullabletypes', ReplaceNullableTypes()),
                ('imputer',
                 Imputer(categorical_impute_strategy='most_frequent',
                         numeric_impute_strategy='mean',
                         boolean_impute_strategy='most_frequent',
                         categorical_fill_value=None,
                         numeric_fill_value=None,
                         boolean_fill_value=None)),
                ('standardscaler2', StandardScaler()),
                ('LR',
                 LogisticRegression(l1_ratio=0.15, n_jobs=6,
                                    penalty='elasticnet', random_state=1,
                                    solver='saga'))])
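scikit-learn's Pipeline requires step names to be unique, so the second scaler in the export appears as 'standardscaler2'. The sketch below shows one suffixing scheme consistent with that output. The helper `unique_step_names` is hypothetical and not taken from ATOM's source.

```python
def unique_step_names(names):
    """Append a counter (2, 3, ...) to repeated names; the first stays as-is."""
    seen = {}
    result = []
    for name in names:
        count = seen.get(name, 0) + 1
        seen[name] = count
        result.append(name if count == 1 else f"{name}{count}")
    return result


print(unique_step_names(
    ["standardscaler", "labelencoder", "replacenullabletypes",
     "imputer", "standardscaler", "LR"]
))
# ['standardscaler', 'labelencoder', 'replacenullabletypes', 'imputer', 'standardscaler2', 'LR']
```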