Example: Ensembles

This example shows how to use atom's ensemble techniques to improve predictions on a dataset by combining several models.

Import the breast cancer dataset from sklearn.datasets. This is a small, easy-to-train dataset where the goal is to predict whether a patient has breast cancer.

Load the data

In [1]:

# Import packages
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

UserWarning: The pandas version installed (1.5.3) does not match the supported pandas version in Modin (1.5.2). This may cause undesired side effects!

In [2]:

# Load the data
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

Run the pipeline

In [3]:

# Initialize atom and train several models
atom = ATOMClassifier(X, y, verbose=2, random_state=1)
atom.run(models=["LR", "Tree", "LGB"], metric="accuracy")

<< ================== ATOM ================== >>
Algorithm task: binary classification.

Dataset stats ==================== >>
Shape: (569, 31)
Train set size: 456
Test set size: 113
-------------------------------------
Memory: 138.96 kB
Scaled: False
Outlier values: 167 (1.2%)


Training ========================= >>
Models: LR, Tree, LGB
Metric: accuracy


Results for LogisticRegression:
Fit ---------------------------------------------
Train evaluation --> accuracy: 0.989
Test evaluation --> accuracy: 0.9823
Time elapsed: 0.037s
-------------------------------------------------
Total time: 0.037s


Results for DecisionTree:
Fit ---------------------------------------------
Train evaluation --> accuracy: 1.0
Test evaluation --> accuracy: 0.9469
Time elapsed: 0.019s
-------------------------------------------------
Total time: 0.019s


Results for LightGBM:
Fit ---------------------------------------------
Train evaluation --> accuracy: 1.0
Test evaluation --> accuracy: 0.9469
Time elapsed: 0.218s
-------------------------------------------------
Total time: 0.218s


Final results ==================== >>
Total time: 0.278s
-------------------------------------
LogisticRegression --> accuracy: 0.9823 !
DecisionTree       --> accuracy: 0.9469
LightGBM           --> accuracy: 0.9469
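
The test accuracies in the log can be read as counts of correct predictions on the 113 test rows. The sketch below assumes accuracy is simply correct / total and that the "!" marks the model with the highest test score; `correct_from_accuracy` is a hypothetical helper, not part of atom.

```python
# Values copied from the training log above.
TEST_SIZE = 113
test_accuracy = {
    "LogisticRegression": 0.9823,
    "DecisionTree": 0.9469,
    "LightGBM": 0.9469,
}


def correct_from_accuracy(accuracy: float, n_rows: int) -> int:
    """Recover the number of correct predictions from a rounded accuracy."""
    return round(accuracy * n_rows)


correct = {
    name: correct_from_accuracy(acc, TEST_SIZE)
    for name, acc in test_accuracy.items()
}

for name, n_correct in correct.items():
    # Recomputing the ratio reproduces the four-decimal value in the log.
    print(f"{name}: {n_correct}/{TEST_SIZE} = {n_correct / TEST_SIZE:.4f}")

# Model with the highest test accuracy (assumed to be what "!" flags).
best = max(test_accuracy, key=test_accuracy.get)
print("best:", best)
```

Under these assumptions, the gap between LogisticRegression and the two tree-based models amounts to four test rows, which is worth keeping in mind when comparing the ensembles below on such a small test set.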

Voting

In [4]:

# Combine the models into a Voting model
atom.voting(voting="soft")

Results for Voting:
Fit ---------------------------------------------
Train evaluation --> accuracy: 1.0
Test evaluation --> accuracy: 0.9469
Time elapsed: 0.026s
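
To make the difference between soft and hard voting concrete, here is a minimal pure-Python sketch. It assumes `voting="soft"` has the same meaning as in scikit-learn's VotingClassifier (pick the class with the highest summed, or equivalently averaged, predicted probability). The functions `soft_vote` and `hard_vote` and all probabilities are made up for illustration and are not atom's implementation.

```python
def soft_vote(probabilities):
    """Average per-class probabilities over models and pick the top class.

    `probabilities` holds one entry per model; each entry is a list of
    [P(class 0), P(class 1)] rows, one row per sample.
    """
    n_models = len(probabilities)
    n_samples = len(probabilities[0])
    n_classes = len(probabilities[0][0])
    averaged = [
        [
            sum(model[i][c] for model in probabilities) / n_models
            for c in range(n_classes)
        ]
        for i in range(n_samples)
    ]
    labels = [max(range(n_classes), key=row.__getitem__) for row in averaged]
    return averaged, labels


def hard_vote(probabilities):
    """Let every model cast one vote (its own argmax) and return the majority."""
    n_samples = len(probabilities[0])
    n_classes = len(probabilities[0][0])
    labels = []
    for i in range(n_samples):
        votes = [0] * n_classes
        for model in probabilities:
            row = model[i]
            votes[max(range(n_classes), key=row.__getitem__)] += 1
        labels.append(max(range(n_classes), key=votes.__getitem__))
    return labels


# Made-up probabilities for three models and two samples (illustration only).
model_a = [[0.90, 0.10], [0.45, 0.55]]
model_b = [[0.80, 0.20], [0.40, 0.60]]
model_c = [[0.70, 0.30], [0.95, 0.05]]

averaged, soft_labels = soft_vote([model_a, model_b, model_c])
hard_labels = hard_vote([model_a, model_b, model_c])
print("soft:", soft_labels)
print("hard:", hard_labels)
```

For the second sample, two of the three models lean slightly towards class 1, but the third is very confident about class 0. Soft voting lets that confidence outweigh the two weak votes, while hard voting follows the majority.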

In [5]:

# Note that we now have an extra model in the pipeline
atom.models

Out[5]:

['LR', 'Tree', 'LGB', 'Vote']

In [6]:

# The plot_pipeline method helps us visualize the ensemble
atom.plot_pipeline()

In [7]:

# The Vote model averages the scores of the models it contains
atom.vote

Out[7]:

Voting()

In [8]:

# We can use it like any other model to make predictions or plots
atom.vote.predict_proba_test[:10]

Out[8]:

	0	1
456	0.060290	0.939710
457	0.999984	0.000016
458	0.000018	0.999982
459	0.000046	0.999954
460	0.999990	0.000010
461	0.028359	0.971641
462	0.000027	0.999973
463	0.000224	0.999776
464	0.999975	0.000025
465	0.000016	0.999984
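
Each row of the table holds the probability of class 0 and class 1 for one test sample (in scikit-learn's copy of this dataset, class 0 is malignant and class 1 is benign). As a small sketch, the first five rows can be turned into class labels by hand. The 0.5 cut-off is an assumption about the usual default for a binary classifier, not something the output states.

```python
# [P(class 0), P(class 1)] for the first five test rows, copied from the table.
proba = {
    456: [0.060290, 0.939710],
    457: [0.999984, 0.000016],
    458: [0.000018, 0.999982],
    459: [0.000046, 0.999954],
    460: [0.999990, 0.000010],
}

THRESHOLD = 0.5  # assumed default cut-off on P(class 1)

# Each row is a probability distribution over the two classes.
row_sums = {row: sum(p) for row, p in proba.items()}

# Predict class 1 when its probability reaches the threshold.
labels = {row: int(p[1] >= THRESHOLD) for row, p in proba.items()}
print(labels)
```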

In [9]:

                
atom.vote.plot_threshold(metric=["auc", "recall", "accuracy"])
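
A threshold plot of this kind shows how a metric changes as the cut-off on the predicted probability moves. The sketch below recomputes recall and accuracy at three thresholds on made-up labels and probabilities. `recall_and_accuracy` is a hypothetical helper and the data is not atom output.

```python
def recall_and_accuracy(y_true, proba, threshold):
    """Score the binary predictions obtained by cutting `proba` at `threshold`."""
    y_pred = [int(p >= threshold) for p in proba]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = correct / len(y_true)
    return recall, accuracy


# Made-up labels and P(class 1) values (illustration only).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
proba = [0.95, 0.80, 0.55, 0.45, 0.30, 0.40, 0.10, 0.05]

curve = {t: recall_and_accuracy(y_true, proba, t) for t in (0.25, 0.5, 0.75)}
for threshold, (recall, accuracy) in curve.items():
    print(f"threshold={threshold}: recall={recall}, accuracy={accuracy}")
```

On this toy data, recall only falls as the threshold rises, while accuracy peaks at the middle threshold. This trade-off is what such a plot helps to spot.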

In [10]:

atom.plot_results(legend=None)

In [11]:

atom.delete("vote")

Deleting 1 models...
 --> Model Vote successfully deleted.

Stacking

In [12]:

# Just like Voting, we can create a Stacking model
atom.stacking(final_estimator="LDA")

Results for Stacking:
Fit ---------------------------------------------
Train evaluation --> accuracy: 0.9934
Test evaluation --> accuracy: 0.9823
Time elapsed: 0.636s
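
The idea behind stacking is that the base models' predictions become the input features of a final estimator. The pure-Python sketch below illustrates only that data flow. The two base models are fixed, hypothetical rules, and a nearest-centroid classifier stands in for the LDA final estimator, so this is not atom's implementation. scikit-learn's StackingClassifier additionally trains the final estimator on cross-validated predictions of fitted base models; the sketch skips that step because its base models are never fitted.

```python
from statistics import mean


# Two hypothetical base models that map one raw feature to P(class 1).
def base_model_a(x: float) -> float:
    return min(1.0, max(0.0, x / 10))


def base_model_b(x: float) -> float:
    return 1.0 if x > 4 else 0.0


def meta_features(xs):
    """Stack the base models' outputs column-wise: one column per model."""
    return [[base_model_a(x), base_model_b(x)] for x in xs]


def fit_centroids(rows, labels):
    """Fit the stand-in final estimator: one mean vector per class."""
    centroids = {}
    for cls in sorted(set(labels)):
        members = [row for row, y in zip(rows, labels) if y == cls]
        centroids[cls] = [mean(col) for col in zip(*members)]
    return centroids


def predict(centroids, rows):
    """Assign each row to the class whose centroid is closest."""
    def sq_dist(row, center):
        return sum((a - b) ** 2 for a, b in zip(row, center))

    return [
        min(centroids, key=lambda cls: sq_dist(row, centroids[cls]))
        for row in rows
    ]


# Made-up training data (illustration only).
x_train = [1.0, 2.0, 3.0, 6.0, 7.0, 9.0]
y_train = [0, 0, 0, 1, 1, 1]

# The final estimator never sees x_train, only the base models' predictions.
Z_train = meta_features(x_train)
final_estimator = fit_centroids(Z_train, y_train)

y_new = predict(final_estimator, meta_features([2.5, 8.0]))
print(y_new)
```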

In [13]:

# The final estimator uses the predictions of the underlying models
atom.stack.head()

Out[13]:

	mean radius	mean texture	mean perimeter	mean area	mean smoothness	mean compactness	mean concavity	mean concave points	mean symmetry	mean fractal dimension	...	worst texture	worst perimeter	worst area	worst smoothness	worst compactness	worst concavity	worst concave points	worst symmetry	worst fractal dimension	target
0	13.48	20.82	88.40	559.2	0.10160	0.12550	0.10630	0.05439	0.1720	0.06419	...	26.02	107.30	740.4	0.1610	0.42250	0.5030	0.22580	0.2807	0.10710	0
1	18.31	20.58	120.80	1052.0	0.10680	0.12480	0.15690	0.09451	0.1860	0.05941	...	26.20	142.20	1493.0	0.1492	0.25360	0.3759	0.15100	0.3074	0.07863	0
2	17.93	24.48	115.20	998.9	0.08855	0.07027	0.05699	0.04744	0.1538	0.05510	...	34.69	135.10	1320.0	0.1315	0.18060	0.2080	0.11360	0.2504	0.07948	0
3	15.13	29.81	96.71	719.5	0.08320	0.04605	0.04686	0.02739	0.1852	0.05294	...	36.91	110.10	931.4	0.1148	0.09866	0.1547	0.06575	0.3233	0.06165	0
4	8.95	15.76	58.74	245.2	0.09462	0.12430	0.09263	0.02308	0.1305	0.07163	...	17.07	63.34	270.0	0.1179	0.18790	0.1544	0.03846	0.1652	0.07722	1

5 rows × 31 columns

In [14]:

# Again, the model can be used for predictions or plots
atom.stack.predict(X)

Out[14]:

0      0
1      0
2      0
3      0
4      0
      ..
564    0
565    0
566    0
567    0
568    1
Name: target, Length: 569, dtype: int64

In [15]:

atom.stack.plot_shap_beeswarm(show=10)

Permutation explainer: 114it [00:42,  2.15it/s]