Ensembles¶
This example shows how to use atom's ensemble techniques to improve predictions on a dataset combining several models.
Import the breast cancer dataset from sklearn.datasets. This is a small and easy to train dataset whose goal is to predict whether a patient has breast cancer or not.
Load the data¶
In [1]:
Copied!
# Import packages
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier
# Import packages
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier
In [2]:
Copied!
# Load the data
X, y = load_breast_cancer(return_X_y=True)
# Load the data
X, y = load_breast_cancer(return_X_y=True)
Run the pipeline¶
In [3]:
Copied!
# Initialize atom and train several models
atom = ATOMClassifier(X, y, verbose=2, random_state=1)
atom.run(models=["LR", "Tree", "LGB"], metric="accuracy")
# Initialize atom and train several models
atom = ATOMClassifier(X, y, verbose=2, random_state=1)
atom.run(models=["LR", "Tree", "LGB"], metric="accuracy")
<< ================== ATOM ================== >> Algorithm task: binary classification. Dataset stats ====================== >> Shape: (569, 31) Scaled: False Outlier values: 174 (1.2%) --------------------------------------- Train set size: 456 Test set size: 113 --------------------------------------- | | dataset | train | test | |---:|:----------|:----------|:---------| | 0 | 212 (1.0) | 167 (1.0) | 45 (1.0) | | 1 | 357 (1.7) | 289 (1.7) | 68 (1.5) | Training ===================================== >> Models: LR, Tree, LGB Metric: accuracy Results for Logistic Regression: Fit --------------------------------------------- Train evaluation --> accuracy: 0.9912 Test evaluation --> accuracy: 0.9646 Time elapsed: 0.031s ------------------------------------------------- Total time: 0.031s Results for Decision Tree: Fit --------------------------------------------- Train evaluation --> accuracy: 1.0 Test evaluation --> accuracy: 0.9469 Time elapsed: 0.000s ------------------------------------------------- Total time: 0.000s Results for LightGBM: Fit --------------------------------------------- Train evaluation --> accuracy: 1.0 Test evaluation --> accuracy: 0.9735 Time elapsed: 0.145s ------------------------------------------------- Total time: 0.146s Final results ========================= >> Duration: 0.178s ------------------------------------------ Logistic Regression --> accuracy: 0.9646 Decision Tree --> accuracy: 0.9469 LightGBM --> accuracy: 0.9735 !
Voting¶
In [4]:
Copied!
# Combine the models into a Voting model
atom.voting()
# Combine the models into a Voting model
atom.voting()
Voting added to the models!
In [5]:
Copied!
# Note that we now have an extra model in the pipeline
atom.models
# Note that we now have an extra model in the pipeline
atom.models
Out[5]:
['LR', 'Tree', 'LGB', 'Vote']
In [6]:
Copied!
# The Vote model averages the scores of the models it contains
atom.vote
# The Vote model averages the scores of the models it contains
atom.vote
Out[6]:
Voting --> Models: ['LR', 'Tree', 'LGB'] --> Weights: None --> Evaluation: accuracy: 0.9617
In [7]:
Copied!
# We can use it like any other model to make predictions or plots
atom.vote.predict_proba_test[:10]
# We can use it like any other model to make predictions or plots
atom.vote.predict_proba_test[:10]
Out[7]:
array([[4.19991299e-01, 5.80008701e-01], [6.48864108e-05, 9.99935114e-01], [4.27978923e-05, 9.99957202e-01], [9.99989219e-01, 1.07807315e-05], [3.05053115e-03, 9.96949469e-01], [9.98028247e-01, 1.97175259e-03], [3.49327649e-03, 9.96506724e-01], [2.49145998e-04, 9.99750854e-01], [2.43109249e-05, 9.99975689e-01], [3.27725335e-04, 9.99672275e-01]])
In [8]:
Copied!
atom.vote.plot_threshold(metric=["auc", "recall", "accuracy"])
atom.vote.plot_threshold(metric=["auc", "recall", "accuracy"])
In [9]:
Copied!
atom.plot_results()
atom.plot_results()
In [10]:
Copied!
atom.vote.delete()
atom.vote.delete()
Model deleted successfully!
Stacking¶
In [11]:
Copied!
# Just like Voting, we can create a Stacking model
atom.stacking(estimator="LDA")
# Just like Voting, we can create a Stacking model
atom.stacking(estimator="LDA")
Stacking added to the models!
In [12]:
Copied!
# The final estimator uses the predictions of the underlying models
atom.stack.head()
# The final estimator uses the predictions of the underlying models
atom.stack.head()
Out[12]:
predict_proba_LR | predict_proba_Tree | predict_proba_LGB | Target | |
---|---|---|---|---|
0 | 0.944629 | 1.0 | 0.998580 | 1 |
1 | 0.006326 | 0.0 | 0.000161 | 0 |
2 | 0.992223 | 1.0 | 0.999963 | 1 |
3 | 0.003557 | 0.0 | 0.001161 | 0 |
4 | 0.047350 | 0.0 | 0.004474 | 0 |
In [13]:
Copied!
# Again, the model can be used for predictions or plots
atom.stack.predict(X)
# Again, the model can be used for predictions or plots
atom.stack.predict(X)
Out[13]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1])
In [14]:
Copied!
atom.stack.beeswarm_plot()
atom.stack.beeswarm_plot()