Ensembles

This example shows how to use atom's ensemble techniques, which combine several models, to improve the predictions on a dataset.

Import the breast cancer dataset from sklearn.datasets. This is a small, easy-to-train dataset whose goal is to predict whether a breast tumor is malignant or benign.

Load the data

In [1]:

# Import packages
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

In [2]:

# Load the data
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

Run the pipeline

In [3]:

# Initialize atom and train several models
atom = ATOMClassifier(X, y, verbose=2, random_state=1)
atom.run(models=["LR", "Tree", "LGB"], metric="accuracy")

<< ================== ATOM ================== >>
Algorithm task: binary classification.

Dataset stats ==================== >>
Shape: (569, 31)
Memory: 138.96 kB
Scaled: False
Outlier values: 169 (1.2%)
-------------------------------------
Train set size: 456
Test set size: 113
-------------------------------------
|   |     dataset |       train |        test |
| - | ----------- | ----------- | ----------- |
| 0 |   212 (1.0) |   170 (1.0) |    42 (1.0) |
| 1 |   357 (1.7) |   286 (1.7) |    71 (1.7) |


Training ========================= >>
Models: LR, Tree, LGB
Metric: accuracy


Results for Logistic Regression:
Fit ---------------------------------------------
Train evaluation --> accuracy: 0.9868
Test evaluation --> accuracy: 0.9912
Time elapsed: 0.016s
-------------------------------------------------
Total time: 0.016s


Results for Decision Tree:
Fit ---------------------------------------------
Train evaluation --> accuracy: 1.0
Test evaluation --> accuracy: 0.9646
Time elapsed: 0.016s
-------------------------------------------------
Total time: 0.016s


Results for LightGBM:
Fit ---------------------------------------------
Train evaluation --> accuracy: 1.0
Test evaluation --> accuracy: 0.9823
Time elapsed: 0.234s
-------------------------------------------------
Total time: 0.234s


Final results ==================== >>
Duration: 0.266s
-------------------------------------
Logistic Regression --> accuracy: 0.9912 !
Decision Tree       --> accuracy: 0.9646
LightGBM            --> accuracy: 0.9823

Voting

In [4]:

# Combine the models into a Voting model
atom.voting(voting="soft")


Results for Voting:
Fit ---------------------------------------------
Train evaluation --> accuracy: 1.0
Test evaluation --> accuracy: 0.9735
Time elapsed: 0.016s
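As a rough illustration of what `voting="soft"` changes, the sketch below contrasts hard voting (a majority vote over the class each model predicts) with soft voting (average the class probabilities, then pick the most probable class). It is a plain-Python sketch with unweighted averaging and made-up probabilities for a single sample, not atom's or scikit-learn's implementation; `hard_vote` and `soft_vote` are hypothetical helper names.

```python
def hard_vote(probas):
    """Majority vote over the class each model predicts (argmax of its probabilities).

    Ties go to the lowest class index.
    """
    n_classes = len(probas[0])
    votes = [max(range(n_classes), key=p.__getitem__) for p in probas]
    return max(range(n_classes), key=votes.count)


def soft_vote(probas):
    """Average the class probabilities across models, then pick the most probable class."""
    n_classes = len(probas[0])
    avg = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg


# Hypothetical probabilities [P(class 0), P(class 1)] from three models for one sample
probas = [
    [0.45, 0.55],  # model A: leans towards class 1
    [0.40, 0.60],  # model B: leans towards class 1
    [0.95, 0.05],  # model C: confident in class 0
]

print(hard_vote(probas))  # 1: two of the three models predict class 1
label, avg = soft_vote(probas)
print(label, avg)  # 0, with averaged probabilities of about [0.6, 0.4]
```

Soft voting lets one confident model outweigh two hesitant ones, which is why the two schemes can disagree on the same sample. It also requires every member to produce class probabilities.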

In [5]:

# Note that we now have an extra model in the pipeline
atom.models

Out[5]:

['LR', 'Tree', 'LGB', 'Vote']

In [6]:

# The plot_pipeline method helps us visualize the ensemble
atom.plot_pipeline()

In [7]:

# With soft voting, the Vote model averages the predicted probabilities of the models it contains
atom.vote

Out[7]:

Voting
 --> Estimator: VotingClassifier
 --> Evaluation: accuracy: 0.9735

In [8]:

# We can use it like any other model to make predictions or plots
atom.vote.predict_proba_test[:10]

Out[8]:

array([[4.47813955e-04, 9.99552186e-01],
       [9.92088283e-01, 7.91171695e-03],
       [3.39080139e-01, 6.60919861e-01],
       [2.85159297e-02, 9.71484070e-01],
       [9.99855077e-01, 1.44923265e-04],
       [3.71524281e-04, 9.99628476e-01],
       [1.20576580e-04, 9.99879423e-01],
       [1.28230320e-03, 9.98717697e-01],
       [9.99989407e-01, 1.05926032e-05],
       [8.72513041e-04, 9.99127487e-01]])

In [9]:

atom.vote.plot_threshold(metric=["auc", "recall", "accuracy"])
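`plot_threshold` plots metric scores against the decision threshold used to turn the probability of the positive class (the second column of `predict_proba_test`) into a label. The plain-Python sketch below shows that computation for accuracy and recall, using made-up labels and probabilities and a hypothetical `metrics_at_threshold` helper. It assumes a sample is labeled 1 when its probability is at least the threshold, which may differ from atom's exact rule.

```python
def metrics_at_threshold(y_true, p1, threshold):
    """Accuracy and recall of class 1 when predicting 1 for every p1 >= threshold."""
    y_pred = [int(p >= threshold) for p in p1]
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    recall = true_pos / sum(y_true)
    return accuracy, recall


# Hypothetical true labels and predicted probabilities of class 1
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
p1 = [0.95, 0.10, 0.65, 0.40, 0.45, 0.80, 0.35, 0.05]

for threshold in (0.3, 0.5, 0.7):
    accuracy, recall = metrics_at_threshold(y_true, p1, threshold)
    print(f"threshold={threshold}: accuracy={accuracy}, recall={recall}")
# threshold=0.3: accuracy=0.75, recall=1.0
# threshold=0.5: accuracy=0.875, recall=0.75
# threshold=0.7: accuracy=0.75, recall=0.5
```

Raising the threshold lowers recall while cutting false positives, so the plot helps pick the threshold that best balances the metrics you care about.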

In [10]:

atom.plot_results()

In [11]:

atom.vote.delete()

Model Vote successfully deleted.

Stacking

In [12]:

# Just like Voting, we can create a Stacking model
atom.stacking(final_estimator="LDA")


Results for Stacking:
Fit ---------------------------------------------
Train evaluation --> accuracy: 0.9934
Test evaluation --> accuracy: 0.9912
Time elapsed: 0.656s
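In stacking, the final estimator (here `LDA`) is trained on the predictions that the base models make for samples they did not see during fitting, typically obtained with cross-validation. The plain-Python sketch below shows that flow with out-of-fold predictions over `k` folds. Everything in it is hypothetical: the toy data, the threshold-based base models, and the nearest-centroid final estimator, which is a simple stand-in for LDA. It is not atom's or scikit-learn's implementation.

```python
from statistics import mean


def fit_threshold(X, y, j):
    """Hypothetical base model: cut feature j halfway between the two class means."""
    m0 = mean(x[j] for x, t in zip(X, y) if t == 0)
    m1 = mean(x[j] for x, t in zip(X, y) if t == 1)
    cut, class1_above = (m0 + m1) / 2, m1 > m0
    return lambda x: int((x[j] > cut) == class1_above)


def fit_centroid(Z, y):
    """Hypothetical final estimator: nearest class centroid (a stand-in for LDA)."""
    dims = range(len(Z[0]))
    centroids = {k: [mean(z[d] for z, t in zip(Z, y) if t == k) for d in dims] for k in (0, 1)}

    def predict(z):
        dist = {k: sum((z[d] - c[d]) ** 2 for d in dims) for k, c in centroids.items()}
        return min(dist, key=dist.get)

    return predict


def fit_stacking(X, y, base_fitters, final_fitter, k=3):
    """Train a stacked ensemble using out-of-fold predictions of the base models."""
    n = len(X)
    # 1) Out-of-fold meta-features: row i is predicted by base models that never saw sample i
    Z = [[0] * len(base_fitters) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        for m, fit in enumerate(base_fitters):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in range(fold, n, k):  # the held-out samples of this fold
                Z[i][m] = model(X[i])
    # 2) The final estimator learns how to combine the base models' predictions
    final = final_fitter(Z, y)
    # 3) Refit the base models on all data for use at prediction time
    bases = [fit(X, y) for fit in base_fitters]
    return lambda x: final([b(x) for b in bases])


# Hypothetical toy data with two features
X = [(1.0, 1.2), (1.5, 0.8), (2.0, 1.5), (1.2, 2.0),
     (4.0, 3.8), (3.5, 4.2), (4.2, 3.1), (3.8, 4.5)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
base_fitters = [lambda X, y: fit_threshold(X, y, 0), lambda X, y: fit_threshold(X, y, 1)]

stack = fit_stacking(X, y, base_fitters, fit_centroid)
print([stack(x) for x in X])  # [0, 0, 0, 0, 1, 1, 1, 1]
print(stack((1.1, 1.0)), stack((4.1, 4.0)))  # 0 1
```

Training the final estimator on out-of-fold predictions, rather than on predictions for samples the base models were fitted on, keeps it from learning to over-trust models that overfit, such as the Decision Tree with its train accuracy of 1.0.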

In [13]:

# The final estimator uses the predictions of the underlying models
atom.stack.head()

Out[13]:

	mean radius	mean texture	mean perimeter	mean area	mean smoothness	mean compactness	mean concavity	mean concave points	mean symmetry	mean fractal dimension	...	worst texture	worst perimeter	worst area	worst smoothness	worst compactness	worst concavity	worst concave points	worst symmetry	worst fractal dimension	target
0	14.22	23.12	94.37	609.9	0.10750	0.24130	0.19810	0.06618	0.2384	0.07542	...	37.18	106.40	762.4	0.1533	0.9327	0.84880	0.17720	0.5166	0.14460	0
1	18.46	18.52	121.10	1075.0	0.09874	0.10530	0.13350	0.08795	0.2132	0.06022	...	27.68	152.20	1603.0	0.1398	0.2089	0.31570	0.16420	0.3695	0.08579	0
2	13.40	20.52	88.64	556.7	0.11060	0.14690	0.14450	0.08172	0.2116	0.07325	...	29.66	113.30	844.4	0.1574	0.3856	0.51060	0.20510	0.3585	0.11090	0
3	14.71	21.59	95.55	656.9	0.11370	0.13650	0.12930	0.08123	0.2027	0.06758	...	30.70	115.70	985.5	0.1368	0.4290	0.35870	0.18340	0.3698	0.10940	0
4	11.43	17.31	73.66	398.0	0.10920	0.09486	0.02031	0.01861	0.1645	0.06562	...	26.76	82.66	503.0	0.1413	0.1792	0.07708	0.06402	0.2584	0.08096	1

5 rows × 31 columns

In [14]:

# Again, the model can be used for predictions or plots
atom.stack.predict(X)

Out[14]:

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0,
       0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
       0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,
       1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0,
       0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0,
       1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1,
       1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1,
       1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
       1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1,
       1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1])

In [15]:

atom.stack.beeswarm_plot(show=10)