Ensembles
This example shows how to use atom's ensemble techniques, voting and stacking, to combine several trained models into one model with the aim of improving predictions.
Import the breast cancer dataset from sklearn.datasets. This is a small, easy-to-train dataset whose goal is to predict whether a breast tumor is malignant (class 0) or benign (class 1).
Load the data
In [1]:
# Import packages
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier
In [2]:
# Load the data
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
Run the pipeline
In [3]:
# Initialize atom and train several models
atom = ATOMClassifier(X, y, verbose=2, random_state=1)
atom.run(models=["LR", "Tree", "LGB"], metric="accuracy")
<< ================== ATOM ================== >>
Algorithm task: binary classification.

Dataset stats ==================== >>
Shape: (569, 31)
Memory: 138.96 kB
Scaled: False
Outlier values: 169 (1.2%)
-------------------------------------
Train set size: 456
Test set size: 113
-------------------------------------
|   | dataset   | train     | test     |
| - | --------- | --------- | -------- |
| 0 | 212 (1.0) | 170 (1.0) | 42 (1.0) |
| 1 | 357 (1.7) | 286 (1.7) | 71 (1.7) |

Training ========================= >>
Models: LR, Tree, LGB
Metric: accuracy

Results for Logistic Regression:
Fit ---------------------------------------------
Train evaluation --> accuracy: 0.9868
Test evaluation --> accuracy: 0.9912
Time elapsed: 0.016s
-------------------------------------------------
Total time: 0.016s

Results for Decision Tree:
Fit ---------------------------------------------
Train evaluation --> accuracy: 1.0
Test evaluation --> accuracy: 0.9646
Time elapsed: 0.016s
-------------------------------------------------
Total time: 0.016s

Results for LightGBM:
Fit ---------------------------------------------
Train evaluation --> accuracy: 1.0
Test evaluation --> accuracy: 0.9823
Time elapsed: 0.234s
-------------------------------------------------
Total time: 0.234s

Final results ==================== >>
Duration: 0.266s
-------------------------------------
Logistic Regression --> accuracy: 0.9912 !
Decision Tree       --> accuracy: 0.9646
LightGBM            --> accuracy: 0.9823
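Accuracy is the fraction of correctly classified samples, so with 113 test samples every score in the log corresponds to a whole number of mistakes. The sketch below recovers those counts using only the set size printed above; the helper name `accuracy` is ours and is not part of atom, and the label vectors are synthetic because only the number of mismatches matters.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


TEST_SIZE = 113  # "Test set size" in the log above

# Misclassified test samples per model, inferred from the rounded scores in the log
n_errors = {"Logistic Regression": 1, "Decision Tree": 4, "LightGBM": 2}

for name, n_wrong in n_errors.items():
    # Synthetic label vectors: only the number of mismatches matters for accuracy
    y_true = [1] * TEST_SIZE
    y_pred = [0] * n_wrong + [1] * (TEST_SIZE - n_wrong)
    print(f"{name}: {accuracy(y_true, y_pred):.4f}")

# The values in parentheses in the class table look like counts relative to class 0
print(round(357 / 212, 1))
```

On a test set this small, one extra mistake moves accuracy by roughly 0.009 (1/113), so the gaps between these models come down to a handful of samples.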
Voting
In [4]:
# Combine the models into a Voting model
atom.voting(voting="soft")
Results for Voting:
Fit ---------------------------------------------
Train evaluation --> accuracy: 1.0
Test evaluation --> accuracy: 0.9735
Time elapsed: 0.016s
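With `voting="soft"`, the ensemble averages the predicted class probabilities of its members and picks the class with the highest mean; hard voting instead takes a majority vote over the predicted labels. The pure-Python sketch below contrasts the two rules for a single sample; the probabilities are made up for illustration and the function names are hypothetical, not atom or sklearn API.

```python
def soft_vote(probas):
    """Average per-class probabilities across models; return (winning class, averages)."""
    n_models = len(probas)
    avg = [sum(p[c] for p in probas) / n_models for c in range(len(probas[0]))]
    return max(range(len(avg)), key=avg.__getitem__), avg


def hard_vote(probas):
    """Each model votes for its most probable class; the majority wins (ties go to the lower index)."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in probas]
    counts = [votes.count(c) for c in range(len(probas[0]))]
    return max(range(len(counts)), key=counts.__getitem__)


# Hypothetical [P(class 0), P(class 1)] from three models for one sample
probas = [
    [0.45, 0.55],  # model A: weakly prefers class 1
    [0.40, 0.60],  # model B: weakly prefers class 1
    [0.95, 0.05],  # model C: strongly prefers class 0
]

label, avg = soft_vote(probas)
print("soft:", label, [round(a, 2) for a in avg])
print("hard:", hard_vote(probas))
```

Here soft voting picks class 0 because one confident model outweighs two hesitant ones, while hard voting picks class 1. This is why scikit-learn's documentation recommends soft voting mainly for well-calibrated classifiers.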
In [5]:
# Note that we now have an extra model in the pipeline
atom.models
Out[5]:
['LR', 'Tree', 'LGB', 'Vote']
In [6]:
# The plot_pipeline method helps us visualize the ensemble
atom.plot_pipeline()
In [7]:
# With voting="soft", the Vote model averages the predicted probabilities of the models it contains
atom.vote
Out[7]:
Voting --> Estimator: VotingClassifier --> Evaluation: accuracy: 0.9735
In [8]:
# We can use it like any other model to make predictions or plots
atom.vote.predict_proba_test[:10]
Out[8]:
array([[4.47813955e-04, 9.99552186e-01],
       [9.92088283e-01, 7.91171695e-03],
       [3.39080139e-01, 6.60919861e-01],
       [2.85159297e-02, 9.71484070e-01],
       [9.99855077e-01, 1.44923265e-04],
       [3.71524281e-04, 9.99628476e-01],
       [1.20576580e-04, 9.99879423e-01],
       [1.28230320e-03, 9.98717697e-01],
       [9.99989407e-01, 1.05926032e-05],
       [8.72513041e-04, 9.99127487e-01]])
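Each row of `predict_proba_test` holds [P(class 0), P(class 1)] for one test sample, and the two entries sum to 1. Assuming a decision threshold of 0.5 on P(class 1), which for two classes is what picking the most probable class amounts to, the first five rows above translate into labels as follows. The helper `to_label` is hypothetical and only illustrates the rule.

```python
# First five rows of atom.vote.predict_proba_test[:10], copied from the output above
rows = [
    [4.47813955e-04, 9.99552186e-01],
    [9.92088283e-01, 7.91171695e-03],
    [3.39080139e-01, 6.60919861e-01],
    [2.85159297e-02, 9.71484070e-01],
    [9.99855077e-01, 1.44923265e-04],
]


def to_label(p_row, threshold=0.5):
    """Predict class 1 (benign) when its probability reaches the threshold."""
    return int(p_row[1] >= threshold)


for p in rows:
    # Each row is a probability distribution over the two classes
    assert abs(sum(p) - 1.0) < 1e-6

print([to_label(p) for p in rows])
```

Raising the threshold changes only the borderline rows: the third sample (P(class 1) of about 0.66) would flip to class 0 at a threshold of 0.7.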
In [9]:
atom.vote.plot_threshold(metric=["auc", "recall", "accuracy"])
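`plot_threshold` shows how each metric responds as the decision threshold on P(class 1) moves between 0 and 1. The sketch below reproduces the idea for accuracy and recall on a tiny, hand-made set of labels and probabilities (hypothetical data, not from this dataset). AUC is left out because it depends only on how the probabilities rank the samples, so it does not change with the threshold.

```python
# Hypothetical true labels and predicted P(class 1) for eight samples
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
p_pos = [0.10, 0.20, 0.60, 0.40, 0.55, 0.70, 0.85, 0.95]


def scores_at(threshold):
    """Return (accuracy, recall) when predicting class 1 for p >= threshold."""
    y_pred = [int(p >= threshold) for p in p_pos]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    recall = tp / sum(y_true)  # fraction of actual positives that are found
    return accuracy, recall


for thr in (0.3, 0.5, 0.7):
    acc, rec = scores_at(thr)
    print(f"threshold={thr:.1f}  accuracy={acc:.3f}  recall={rec:.3f}")
```

Lowering the threshold can only keep or raise recall, because more samples are labeled positive; accuracy has no such guarantee, which is why it helps to inspect several metrics side by side.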
In [10]:
atom.plot_results()
In [11]:
atom.vote.delete()
Model Vote successfully deleted.
Stacking
In [12]:
# Just like Voting, we can create a Stacking model
atom.stacking(final_estimator="LDA")
Results for Stacking:
Fit ---------------------------------------------
Train evaluation --> accuracy: 0.9934
Test evaluation --> accuracy: 0.9912
Time elapsed: 0.656s
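To see the moving parts outside atom, the sketch below builds a comparable stack directly with scikit-learn's `StackingClassifier` and an LDA final estimator. It assumes that atom's Stacking model behaves like this sklearn class; LightGBM is left out to avoid the extra dependency, logistic regression is wrapped with a scaler so it converges, and the split is our own, so the score will not match atom's output exactly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# Base models (LightGBM omitted here to keep the example dependency-free)
estimators = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("tree", DecisionTreeClassifier(random_state=1)),
]

# The LDA final estimator is fitted on cross-validated predictions of the base models
stack = StackingClassifier(
    estimators=estimators,
    final_estimator=LinearDiscriminantAnalysis(),
    cv=5,
)
stack.fit(X_train, y_train)
print(f"Test accuracy: {accuracy_score(y_test, stack.predict(X_test)):.4f}")
```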
In [13]:
# The final estimator uses the predictions of the underlying models
atom.stack.head()
Out[13]:
| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.22 | 23.12 | 94.37 | 609.9 | 0.10750 | 0.24130 | 0.19810 | 0.06618 | 0.2384 | 0.07542 | ... | 37.18 | 106.40 | 762.4 | 0.1533 | 0.9327 | 0.84880 | 0.17720 | 0.5166 | 0.14460 | 0 |
| 1 | 18.46 | 18.52 | 121.10 | 1075.0 | 0.09874 | 0.10530 | 0.13350 | 0.08795 | 0.2132 | 0.06022 | ... | 27.68 | 152.20 | 1603.0 | 0.1398 | 0.2089 | 0.31570 | 0.16420 | 0.3695 | 0.08579 | 0 |
| 2 | 13.40 | 20.52 | 88.64 | 556.7 | 0.11060 | 0.14690 | 0.14450 | 0.08172 | 0.2116 | 0.07325 | ... | 29.66 | 113.30 | 844.4 | 0.1574 | 0.3856 | 0.51060 | 0.20510 | 0.3585 | 0.11090 | 0 |
| 3 | 14.71 | 21.59 | 95.55 | 656.9 | 0.11370 | 0.13650 | 0.12930 | 0.08123 | 0.2027 | 0.06758 | ... | 30.70 | 115.70 | 985.5 | 0.1368 | 0.4290 | 0.35870 | 0.18340 | 0.3698 | 0.10940 | 0 |
| 4 | 11.43 | 17.31 | 73.66 | 398.0 | 0.10920 | 0.09486 | 0.02031 | 0.01861 | 0.1645 | 0.06562 | ... | 26.76 | 82.66 | 503.0 | 0.1413 | 0.1792 | 0.07708 | 0.06402 | 0.2584 | 0.08096 | 1 |
5 rows × 31 columns
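The table above shows the input features, while the final estimator of a stack is fitted on predictions of the underlying models. One way to make that concrete is to build the meta-feature matrix by hand with scikit-learn's `cross_val_predict`: one column of out-of-fold P(class 1) per base model. This is our illustration of the general technique, not necessarily what atom does internally, and the base models are a reduced stand-in for LR, Tree and LGB.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

base_models = [
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    DecisionTreeClassifier(random_state=1),
]

# Out-of-fold P(class 1): each sample is scored by a model that did not see it during fitting
meta_features = np.column_stack([
    cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
    for model in base_models
])
print(meta_features.shape)  # (569, 2): one row per sample, one column per base model

# The final estimator only sees these predictions, not the 30 original features
final = LinearDiscriminantAnalysis().fit(meta_features, y)
print(f"Accuracy of the final estimator on the meta-features: {final.score(meta_features, y):.4f}")
```

Using out-of-fold predictions keeps the final estimator from learning on predictions that the base models made for their own training rows. For predicting new data, scikit-learn's `StackingClassifier` additionally refits each base model on the full training set, a step this manual version skips.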
In [14]:
# Again, the model can be used for predictions or plots
atom.stack.predict(X)
Out[14]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1])
In [15]:
atom.stack.beeswarm_plot(show=10)