Example: Ensembles¶
This example shows how to use atom's ensemble techniques to combine several models and improve predictions on a dataset.
Import the breast cancer dataset from sklearn.datasets. This is a small and easy-to-train dataset whose goal is to predict whether or not a patient has breast cancer.
Load the data¶
In [1]:
# Import packages
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier
In [2]:
# Load the data
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
Run the pipeline¶
In [3]:
# Initialize atom and train several models
atom = ATOMClassifier(X, y, holdout_size=0.2, verbose=2, random_state=1)
atom.run(models=["LR", "Tree", "LGB"], metric="accuracy")
<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Binary classification.

Dataset stats ==================== >>
Shape: (456, 31)
Train set size: 343
Test set size: 113
Holdout set size: 113
-------------------------------------
Memory: 111.40 kB
Scaled: False
Outlier values: 124 (1.2%)

Training ========================= >>
Models: LR, Tree, LGB
Metric: accuracy

Results for LogisticRegression:
Fit ---------------------------------------------
Train evaluation --> accuracy: 0.9913
Test evaluation --> accuracy: 0.9823
Time elapsed: 0.422s
-------------------------------------------------
Time: 0.422s

Results for DecisionTree:
Fit ---------------------------------------------
Train evaluation --> accuracy: 1.0
Test evaluation --> accuracy: 0.9558
Time elapsed: 0.028s
-------------------------------------------------
Time: 0.028s

Results for LightGBM:
Fit ---------------------------------------------
Train evaluation --> accuracy: 1.0
Test evaluation --> accuracy: 0.9558
Time elapsed: 0.597s
-------------------------------------------------
Time: 0.597s

Final results ==================== >>
Total time: 1.533s
-------------------------------------
LogisticRegression --> accuracy: 0.9823 !
DecisionTree       --> accuracy: 0.9558
LightGBM           --> accuracy: 0.9558
Voting¶
In [4]:
# Combine the models into a Voting model
atom.voting(voting="soft")
Results for Voting:
Fit ---------------------------------------------
Train evaluation --> accuracy: 1.0
Test evaluation --> accuracy: 0.9735
Time elapsed: 0.228s
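With voting="soft", the ensemble averages the class probabilities returned by the underlying models instead of counting their hard class votes, which usually works best when the models produce well-calibrated probabilities. For reference only, the idea is comparable to scikit-learn's VotingClassifier. The snippet below is an illustrative sketch, not part of the original notebook: it refits fresh estimators with assumed hyperparameters on atom's train/test data attributes, so its score need not match the Voting model above.

# Illustrative sketch: soft voting with plain scikit-learn on atom's data split.
# Not part of the original notebook; estimator settings are assumptions.
from lightgbm import LGBMClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(random_state=1)),
        ("lgb", LGBMClassifier(random_state=1)),
    ],
    voting="soft",  # average predict_proba instead of majority voting on labels
)
voter.fit(atom.X_train, atom.y_train)
print(voter.score(atom.X_test, atom.y_test))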
In [5]:
# Note that we now have an extra model in the pipeline
atom.models
Out[5]:
['LR', 'Tree', 'LGB', 'Vote']
In [6]:
# The plot_pipeline method helps us visualize the ensemble
atom.plot_pipeline()
In [7]:
# We can use it like any other model to make predictions or plots
atom.vote.predict_proba("test")
Out[7]:
|     | 0        | 1        |
|-----|----------|----------|
| 343 | 0.000010 | 0.999990 |
| 344 | 0.999989 | 0.000011 |
| 345 | 0.999988 | 0.000012 |
| 346 | 0.000175 | 0.999825 |
| 347 | 0.001798 | 0.998202 |
| ... | ...      | ...      |
| 451 | 0.997818 | 0.002182 |
| 452 | 0.010445 | 0.989555 |
| 453 | 0.999987 | 0.000013 |
| 454 | 0.000325 | 0.999675 |
| 455 | 0.999984 | 0.000016 |

113 rows × 2 columns
In [8]:
atom.vote.plot_threshold(metric=["auc", "recall", "accuracy"])
In [9]:
atom.plot_results(legend=None)
In [10]:
atom.delete("vote")
Deleting 1 models...
--> Model Vote successfully deleted.
Stacking¶
Just like Voting, we can create a Stacking model. Using train_on_test=True
trains the final estimator on the test set to avoid overfitting on the training set.
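For reference, stacking feeds the base models' predictions as input features to a final (meta) estimator. The sketch below is illustrative only and not part of the original notebook: it uses scikit-learn's StackingClassifier with an LDA final estimator on atom's data attributes, with assumed hyperparameters, and it does not reproduce atom's train_on_test=True behaviour.

# Illustrative sketch: stacking with plain scikit-learn and an LDA meta-model.
# Not part of the original notebook; it refits fresh base estimators and does
# not reproduce atom's train_on_test=True behaviour.
from lightgbm import LGBMClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

stacker = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(random_state=1)),
        ("lgb", LGBMClassifier(random_state=1)),
    ],
    final_estimator=LinearDiscriminantAnalysis(),  # meta-model fit on base predictions
)
stacker.fit(atom.X_train, atom.y_train)
print(stacker.score(atom.X_test, atom.y_test))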
In [11]:
atom.stacking(final_estimator="LDA", train_on_test=True)
Results for Stacking:
Fit ---------------------------------------------
Train evaluation --> accuracy: 0.9942
Test evaluation --> accuracy: 0.9735
Time elapsed: 0.148s
In [12]:
# Now we use the holdout set to evaluate the performance of the Stack model
atom.stack.results
Out[12]:
accuracy_train      0.994200
accuracy_test       0.973500
accuracy_holdout    0.982300
time_fit            0.148135
time                0.148135
Name: Stack, dtype: float64
In [13]:
atom.plot_results(rows="holdout", legend=None)
In [14]:
atom.stack.plot_roc(rows="holdout")
In [15]:
# Again, the model can be used for predictions or plots
atom.stack.predict(X)
Out[15]:
0      0
1      0
2      0
3      0
4      0
      ..
564    0
565    0
566    0
567    0
568    1
Name: target, Length: 569, dtype: int64
In [16]:
atom.stack.plot_shap_beeswarm(show=10)
PermutationExplainer explainer: 114it [01:20, 1.29it/s]