Example: AutoML
This example shows how to use atom's AutoML implementation to search automatically for an optimized pipeline.
The data is the breast cancer dataset from sklearn.datasets, a small, easy-to-train dataset whose goal is to predict whether a patient has breast cancer.
Load the data
In [1]:
# Import packages
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from atom import ATOMClassifier
In [2]:
# Load the data
X, y = load_breast_cancer(return_X_y=True)
Run the pipeline
In [3]:
atom = ATOMClassifier(X, y, n_jobs=6, verbose=2, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: binary classification.
Parallel processing with 6 cores.

Dataset stats ==================== >>
Shape: (569, 31)
Memory: 138.96 kB
Scaled: False
Outlier values: 167 (1.2%)
-------------------------------------
Train set size: 456
Test set size: 113
-------------------------------------
|   |     dataset |       train |        test |
| - | ----------- | ----------- | ----------- |
| 0 |   212 (1.0) |   170 (1.0) |    42 (1.0) |
| 1 |   357 (1.7) |   286 (1.7) |    71 (1.7) |
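The class table in the output above can be checked by hand. The sketch below assumes that the number in parentheses is each class count divided by the count of the smallest class; that reading is an interpretation of the printed output, not something the example states. The helper name `class_ratios` is hypothetical and not part of atom.

```python
def class_ratios(counts):
    """Divide every class count by the smallest count, rounded to 1 decimal.

    Assumption: this mirrors the parenthesized values in the printed table.
    """
    smallest = min(counts.values())
    return {label: round(n / smallest, 1) for label, n in counts.items()}


# Class counts copied from the output above
dataset = {0: 212, 1: 357}
train = {0: 170, 1: 286}
test = {0: 42, 1: 71}

for name, counts in [("dataset", dataset), ("train", train), ("test", test)]:
    print(name, class_ratios(counts))

# Share of all rows that ended up in the test set
test_fraction = sum(test.values()) / sum(dataset.values())
print(f"test fraction: {test_fraction:.3f}")
```

The train and test sets keep roughly the same class balance as the full dataset, and the test set holds about a fifth of the rows.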
In [4]:
# It's possible to add custom estimators to the pipeline
atom.add(StandardScaler())
Adding StandardScaler to the pipeline...
Fitting StandardScaler...
In [5]:
# Check that the scaling worked
atom.scaled
Out[5]:
True
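For intuition about what a scaled feature looks like, here is a minimal standard-library sketch of standardization: subtract the column mean and divide by the population standard deviation, which is the convention sklearn's StandardScaler uses. The `standardize` helper and the sample values are illustrative assumptions; this is not how atom computes its `scaled` attribute.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return the values shifted to mean 0 and rescaled to unit variance."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation (ddof=0)
    return [(v - mu) / sigma for v in values]


# Arbitrary sample values, used only for illustration
column = [17.99, 20.57, 19.69, 11.42, 20.29]
scaled = standardize(column)

# After standardization the mean is ~0 and the standard deviation is ~1
print(round(fmean(scaled), 6), round(pstdev(scaled), 6))
```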
In [6]:
# Find an optimized pipeline using AutoML (the search stops after 2 * 60 = 120 seconds)
atom.automl(objective="precision", max_time=2 * 60)
Searching for optimal pipeline...
AutoMLSearch will use the holdout set to score and rank pipelines.
Generating pipelines to search over...
8 pipelines ready for search.

*****************************
* Beginning pipeline search *
*****************************

Optimizing for Precision.
Greater score is better.

Using SequentialEngine to train and score pipelines.
Will stop searching for new pipelines after 120 seconds.

Allowed model families: linear_model, linear_model, xgboost, lightgbm, catboost, random_forest, decision_tree, extra_trees
Evaluating Baseline Pipeline: Mode Baseline Binary Classification Pipeline Mode Baseline Binary Classification Pipeline: Starting cross validation Finished cross validation - mean Precision: 0.000 Starting holdout set scoring Finished holdout set scoring - Precision: 0.000 ***************************** * Evaluating Batch Number 1 * ***************************** Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 XGBoost Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.992 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 LightGBM Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.975 Starting holdout set scoring Finished holdout set scoring - Precision: 0.975 CatBoost Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.994 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Random Forest Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Decision Tree Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean 
Precision: 0.884 Starting holdout set scoring Finished holdout set scoring - Precision: 0.943 Extra Trees Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 ***************************** * Evaluating Batch Number 2 * ***************************** Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 ***************************** * Evaluating Batch Number 3 * ***************************** Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler: Starting cross validation Finished cross validation - mean Precision: 1.000 
Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler: Starting cross validation Finished cross validation - mean Precision: 0.994 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler: Starting cross validation Finished cross validation - mean Precision: 0.994 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler: Starting cross validation Finished cross validation - mean Precision: 0.994 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Logistic Regression Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler: Starting cross validation Finished cross validation - mean Precision: 0.994 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 ***************************** * Evaluating Batch Number 4 * ***************************** XGBoost Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.988 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 XGBoost Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.993 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 XGBoost Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.966 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 XGBoost Classifier w/ Label Encoder + Replace 
Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.992 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 XGBoost Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 ***************************** * Evaluating Batch Number 5 * ***************************** CatBoost Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 CatBoost Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.988 Starting holdout set scoring Finished holdout set scoring - Precision: 0.970 CatBoost Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 CatBoost Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.982 Starting holdout set scoring Finished holdout set scoring - Precision: 0.930 CatBoost Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.969 Starting holdout set scoring Finished holdout set scoring - Precision: 0.971 ***************************** * Evaluating Batch Number 6 * ***************************** Random Forest Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring 
Finished holdout set scoring - Precision: 1.000 Random Forest Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Random Forest Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Random Forest Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Random Forest Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 ***************************** * Evaluating Batch Number 7 * ***************************** Extra Trees Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.992 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Extra Trees Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Extra Trees Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Extra Trees Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 1.000 
Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Extra Trees Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 ***************************** * Evaluating Batch Number 8 * ***************************** LightGBM Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.962 Starting holdout set scoring Finished holdout set scoring - Precision: 0.974 LightGBM Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.975 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 LightGBM Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 LightGBM Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.922 Starting holdout set scoring Finished holdout set scoring - Precision: 0.919 LightGBM Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.987 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 ***************************** * Evaluating Batch Number 9 * ***************************** Decision Tree Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.894 Starting holdout set scoring Finished holdout set scoring - Precision: 0.943 Decision Tree Classifier w/ Label Encoder + Replace Nullable Types 
Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.880 Starting holdout set scoring Finished holdout set scoring - Precision: 0.976 Decision Tree Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.911 Starting holdout set scoring Finished holdout set scoring - Precision: 0.895 Decision Tree Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.912 Starting holdout set scoring Finished holdout set scoring - Precision: 0.925 Decision Tree Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer: Starting cross validation Finished cross validation - mean Precision: 0.884 Starting holdout set scoring Finished holdout set scoring - Precision: 0.943 ****************************** * Evaluating Batch Number 10 * ****************************** Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler: Starting cross validation Finished cross validation - mean Precision: 1.000 Starting holdout set scoring Finished holdout set scoring - Precision: 1.000 Search finished after 02:01 Best pipeline: Elastic Net Classifier w/ Label Encoder + Replace Nullable Types Transformer + Imputer + Standard Scaler Best pipeline Precision: 1.000000 Merging automl results with atom... --> Adding LabelEncoder to the pipeline... --> Adding ReplaceNullableTypes to the pipeline... --> Adding Imputer to the pipeline... --> Adding StandardScaler to the pipeline... --> Adding model LogisticRegression (LR) to the pipeline...
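The search above optimizes precision, the share of positive predictions that are actually positive: TP / (TP + FP). The following standard-library sketch shows the calculation; the `precision` helper and the toy labels are illustrative assumptions, not evalml's or atom's implementation.

```python
def precision(y_true, y_pred, positive=1):
    """Fraction of predicted positives that are true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    if tp + fp == 0:
        # Convention chosen for this sketch: no positive predictions scores 0.0
        return 0.0
    return tp / (tp + fp)


# Toy labels for illustration only: 3 true positives and 1 false positive
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
print(precision(y_true, y_pred))  # 3 / (3 + 1) = 0.75
```

Precision ignores false negatives, so a pipeline can reach 1.000 while still missing positive cases. Checking recall alongside it gives a fuller picture.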
Analyze the results
In [7]:
# The underlying evalml AutoMLSearch instance can be accessed for further analysis
atom.evalml
Out[7]:
<evalml.automl.automl_search.AutoMLSearch at 0x23c82ac8b20>
In [8]:
# Check the new transformers in the branch
atom.branch.status()
Branch: master
 --> Pipeline:
   --> StandardScaler
   --> LabelEncoder
   --> ReplaceNullableTypes
   --> Imputer
   --> StandardScaler
 --> Models: LR
In [9]:
# Or draw the pipeline
atom.plot_pipeline()
In [10]:
# Note that the model is also merged with atom
atom.lr
Out[10]:
LogisticRegression
 --> Estimator: LogisticRegression
 --> Evaluation: precision: 0.9726
In [11]:
# The pipeline can be exported to a sklearn-like pipeline
atom.export_pipeline(model="lr")
Out[11]:
Pipeline(memory=Memory(location=None),
         steps=[('standardscaler', StandardScaler()),
                ('labelencoder', LabelEncoder(positive_label=None)),
                ('replacenullabletypes', ReplaceNullableTypes()),
                ('imputer',
                 Imputer(categorical_impute_strategy='most_frequent',
                         numeric_impute_strategy='most_frequent',
                         boolean_impute_strategy='most_frequent',
                         categorical_fill_value=None,
                         numeric_fill_value=None,
                         boolean_fill_value=None)),
                ('standardscaler2', StandardScaler()),
                ('LR',
                 LogisticRegression(C=1.9835334708679646,
                                    l1_ratio=0.2896296405458125, n_jobs=6,
                                    penalty='elasticnet', random_state=1,
                                    solver='saga'))])