Ensembles¶
This example shows how to use atom's ensemble techniques to improve predictions on a dataset combining several models.
Import the breast cancer dataset from sklearn.datasets. This is a small and easy to train dataset whose goal is to predict whether a patient has breast cancer or not.
Load the data¶
In [1]:
                Copied!
                
                
            # Import packages
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier
# Import packages
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier
    
        In [2]:
                Copied!
                
                
            # Load the data
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
# Load the data
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    
        Run the pipeline¶
In [3]:
                Copied!
                
                
            # Initialize atom and train several models
atom = ATOMClassifier(X, y, verbose=2, random_state=1)
atom.run(models=["LR", "Tree", "LGB"], metric="accuracy")
# Initialize atom and train several models
atom = ATOMClassifier(X, y, verbose=2, random_state=1)
atom.run(models=["LR", "Tree", "LGB"], metric="accuracy")
    
        << ================== ATOM ================== >> Algorithm task: binary classification. Dataset stats ==================== >> Shape: (569, 31) Memory: 138.96 kB Scaled: False Outlier values: 169 (1.2%) ------------------------------------- Train set size: 456 Test set size: 113 ------------------------------------- | | dataset | train | test | | - | ----------- | ----------- | ----------- | | 0 | 212 (1.0) | 170 (1.0) | 42 (1.0) | | 1 | 357 (1.7) | 286 (1.7) | 71 (1.7) | Training ========================= >> Models: LR, Tree, LGB Metric: accuracy Results for Logistic Regression: Fit --------------------------------------------- Train evaluation --> accuracy: 0.9868 Test evaluation --> accuracy: 0.9912 Time elapsed: 0.016s ------------------------------------------------- Total time: 0.016s Results for Decision Tree: Fit --------------------------------------------- Train evaluation --> accuracy: 1.0 Test evaluation --> accuracy: 0.9646 Time elapsed: 0.016s ------------------------------------------------- Total time: 0.016s Results for LightGBM: Fit --------------------------------------------- Train evaluation --> accuracy: 1.0 Test evaluation --> accuracy: 0.9823 Time elapsed: 0.234s ------------------------------------------------- Total time: 0.234s Final results ==================== >> Duration: 0.266s ------------------------------------- Logistic Regression --> accuracy: 0.9912 ! Decision Tree --> accuracy: 0.9646 LightGBM --> accuracy: 0.9823
Voting¶
In [4]:
                Copied!
                
                
            # Combine the models into a Voting model
atom.voting(voting="soft")
# Combine the models into a Voting model
atom.voting(voting="soft")
    
        Results for Voting: Fit --------------------------------------------- Train evaluation --> accuracy: 1.0 Test evaluation --> accuracy: 0.9735 Time elapsed: 0.016s
In [5]:
                Copied!
                
                
            # Note that we now have an extra model in the pipeline
atom.models
# Note that we now have an extra model in the pipeline
atom.models
    
        Out[5]:
['LR', 'Tree', 'LGB', 'Vote']
In [6]:
                Copied!
                
                
            # The plot_pipeline method helps us visualize the ensemble
atom.plot_pipeline()
# The plot_pipeline method helps us visualize the ensemble
atom.plot_pipeline()
    
        In [7]:
                Copied!
                
                
            # The Vote model averages the scores of the models it contains
atom.vote
# The Vote model averages the scores of the models it contains
atom.vote
    
        Out[7]:
Voting --> Estimator: VotingClassifier --> Evaluation: accuracy: 0.9735
In [8]:
                Copied!
                
                
            # We can use it like any other model to make predictions or plots
atom.vote.predict_proba_test[:10]
# We can use it like any other model to make predictions or plots
atom.vote.predict_proba_test[:10]
    
        Out[8]:
array([[4.47813955e-04, 9.99552186e-01],
       [9.92088283e-01, 7.91171695e-03],
       [3.39080139e-01, 6.60919861e-01],
       [2.85159297e-02, 9.71484070e-01],
       [9.99855077e-01, 1.44923265e-04],
       [3.71524281e-04, 9.99628476e-01],
       [1.20576580e-04, 9.99879423e-01],
       [1.28230320e-03, 9.98717697e-01],
       [9.99989407e-01, 1.05926032e-05],
       [8.72513041e-04, 9.99127487e-01]])
In [9]:
                Copied!
                
                
            atom.vote.plot_threshold(metric=["auc", "recall", "accuracy"])
atom.vote.plot_threshold(metric=["auc", "recall", "accuracy"])
    
        In [10]:
                Copied!
                
                
            atom.plot_results()
atom.plot_results()
    
        In [11]:
                Copied!
                
                
            atom.vote.delete()
atom.vote.delete()
    
        Model Vote successfully deleted.
Stacking¶
In [12]:
                Copied!
                
                
            # Just like Voting, we can create a Stacking model
atom.stacking(final_estimator="LDA")
# Just like Voting, we can create a Stacking model
atom.stacking(final_estimator="LDA")
    
        Results for Stacking: Fit --------------------------------------------- Train evaluation --> accuracy: 0.9934 Test evaluation --> accuracy: 0.9912 Time elapsed: 0.656s
In [13]:
                Copied!
                
                
            # The final estimator uses the predictions of the underlying models
atom.stack.head()
# The final estimator uses the predictions of the underlying models
atom.stack.head()
    
        Out[13]:
| mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | target | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.22 | 23.12 | 94.37 | 609.9 | 0.10750 | 0.24130 | 0.19810 | 0.06618 | 0.2384 | 0.07542 | ... | 37.18 | 106.40 | 762.4 | 0.1533 | 0.9327 | 0.84880 | 0.17720 | 0.5166 | 0.14460 | 0 | 
| 1 | 18.46 | 18.52 | 121.10 | 1075.0 | 0.09874 | 0.10530 | 0.13350 | 0.08795 | 0.2132 | 0.06022 | ... | 27.68 | 152.20 | 1603.0 | 0.1398 | 0.2089 | 0.31570 | 0.16420 | 0.3695 | 0.08579 | 0 | 
| 2 | 13.40 | 20.52 | 88.64 | 556.7 | 0.11060 | 0.14690 | 0.14450 | 0.08172 | 0.2116 | 0.07325 | ... | 29.66 | 113.30 | 844.4 | 0.1574 | 0.3856 | 0.51060 | 0.20510 | 0.3585 | 0.11090 | 0 | 
| 3 | 14.71 | 21.59 | 95.55 | 656.9 | 0.11370 | 0.13650 | 0.12930 | 0.08123 | 0.2027 | 0.06758 | ... | 30.70 | 115.70 | 985.5 | 0.1368 | 0.4290 | 0.35870 | 0.18340 | 0.3698 | 0.10940 | 0 | 
| 4 | 11.43 | 17.31 | 73.66 | 398.0 | 0.10920 | 0.09486 | 0.02031 | 0.01861 | 0.1645 | 0.06562 | ... | 26.76 | 82.66 | 503.0 | 0.1413 | 0.1792 | 0.07708 | 0.06402 | 0.2584 | 0.08096 | 1 | 
5 rows × 31 columns
In [14]:
                Copied!
                
                
            # Again, the model can be used for predictions or plots
atom.stack.predict(X)
# Again, the model can be used for predictions or plots
atom.stack.predict(X)
    
        Out[14]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0,
       0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
       0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,
       1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0,
       0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0,
       1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1,
       1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1,
       1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
       1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1,
       1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1])
In [15]:
                Copied!
                
                
            atom.stack.beeswarm_plot(show=10)
atom.stack.beeswarm_plot(show=10)