Early stopping
This example shows how to use early stopping to reduce the time it takes to run a pipeline. This option is only available for models that allow in-training evaluation (XGBoost, LightGBM and CatBoost).
Import the breast cancer dataset from sklearn.datasets. This is a small and easy-to-train dataset where the goal is to predict whether a patient has breast cancer or not.
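For context, here is a minimal sketch of what "in-training evaluation" means, using LightGBM's own scikit-learn API directly rather than ATOM (the split, the stopping_rounds value and the callback usage below are illustrative assumptions, not ATOM's internals):

# Sketch (assumed setup, outside ATOM): LightGBM evaluates a validation set
# after every boosting round and stops when the metric hasn't improved for
# `stopping_rounds` consecutive rounds.
from lightgbm import LGBMClassifier, early_stopping
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=1)

model = LGBMClassifier(n_estimators=500, random_state=1)
model.fit(
    X_train,
    y_train,
    eval_set=[(X_valid, y_valid)],
    callbacks=[early_stopping(stopping_rounds=50)],
)
print(f"Stopped at iteration {model.best_iteration_} of 500")

ATOM wires this mechanism up for you; the rest of this example shows how.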
Load the data¶
In [1]:
# Import packages
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier
In [2]:
# Load the data
X, y = load_breast_cancer(return_X_y=True)
Run the pipeline
In [3]:
# Initialize atom
atom = ATOMClassifier(X, y, n_jobs=2, verbose=2, warnings=False, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: binary classification.
Parallel processing with 2 cores.

Dataset stats ==================== >>
Shape: (569, 31)
Memory: 138.96 kB
Scaled: False
Outlier values: 169 (1.2%)
-------------------------------------
Train set size: 456
Test set size: 113
-------------------------------------
|   |     dataset |       train |        test |
| - | ----------- | ----------- | ----------- |
| 0 |   212 (1.0) |   170 (1.0) |    42 (1.0) |
| 1 |   357 (1.7) |   286 (1.7) |    71 (1.7) |
In [4]:
# Train the models using early stopping. An early stopping value of 0.1 means
# that training stops if the model didn't improve in the last 10% of its iterations
atom.run(
models="LGB",
metric="ap",
n_calls=7,
n_initial_points=3,
bo_params={"early_stopping": 0.1},
)
Training ========================= >>
Models: LGB
Metric: average_precision


Running BO for LightGBM...
| call            | n_estimators | learning_rate | max_depth | num_leaves | min_child_weight | min_child_samples | subsample | colsample_bytree | reg_alpha | reg_lambda | average_precision | best_average_precision | early_stopping | time   | total_time |
| ---------------- | ------------ | ------------- | --------- | ---------- | ---------------- | ----------------- | --------- | ---------------- | --------- | ---------- | ----------------- | ---------------------- | -------------- | ------- | ---------- |
| Initial point 1 | 499 | 0.733  | 1 | 40 | 0.001  | 18 | 0.7 | 0.8 | 100  | 10   | 0.6264 | 0.6264 | 50/499 | 0.086s | 0.121s |
| Initial point 2 | 170 | 0.112  | 4 | 25 | 0.1    | 28 | 0.7 | 0.7 | 100  | 10   | 0.6264 | 0.6264 | 18/170 | 0.045s | 0.658s |
| Initial point 3 | 364 | 0.4032 | 1 | 30 | 10     | 27 | 0.9 | 0.6 | 0    | 1    | 0.9923 | 0.9923 | 49/364 | 0.065s | 0.882s |
| Iteration 4     | 280 | 0.4959 | 2 | 30 | 0.1    | 19 | 0.9 | 0.5 | 0    | 0.1  | 0.9994 | 0.9994 | 94/280 | 0.027s | 2.810s |
| Iteration 5     | 81  | 0.81   | 4 | 30 | 0.0001 | 10 | 0.9 | 0.4 | 0    | 0    | 0.9956 | 0.9994 | 22/81  | 0.050s | 3.264s |
| Iteration 6     | 72  | 0.1255 | 4 | 21 | 0.0001 | 14 | 0.9 | 0.4 | 0.01 | 0.01 | 0.998  | 0.9994 | 72/72  | 0.062s | 4.114s |
| Iteration 7     | 69  | 0.1692 | 6 | 26 | 10     | 14 | 0.5 | 0.8 | 0.01 | 1    | 0.9805 | 0.9994 | 25/69  | 0.059s | 4.847s |

Bayesian Optimization ---------------------------
Best call --> Iteration 4
Best parameters --> {'n_estimators': 280, 'learning_rate': 0.4959, 'max_depth': 2, 'num_leaves': 30, 'min_child_weight': 0.1, 'min_child_samples': 19, 'subsample': 0.9, 'colsample_bytree': 0.5, 'reg_alpha': 0, 'reg_lambda': 0.1}
Best evaluation --> average_precision: 0.9994
Time elapsed: 5.687s
Fit ---------------------------------------------
Early stop at iteration 73 of 280.
Train evaluation --> average_precision: 1.0
Test evaluation --> average_precision: 0.9974
Time elapsed: 0.078s
-------------------------------------------------
Total time: 5.768s


Final results ==================== >>
Duration: 5.771s
-------------------------------------
LightGBM --> average_precision: 0.9974
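The early_stopping column above reads stopped_iteration/total_iterations. As an assumed illustration of how a fractional setting could translate to a patience in rounds (a hypothetical helper, not part of ATOM's API; the library's internal conversion may differ):

# Hypothetical helper: convert the early_stopping setting to a patience
# expressed in boosting rounds.
def patience(early_stopping, n_estimators):
    if early_stopping < 1:  # interpret as a fraction of the total iterations
        return max(1, int(early_stopping * n_estimators))
    return int(early_stopping)  # interpret as an absolute number of rounds

patience(0.1, 280)  # 28: Iteration 4 may run 28 rounds without improvement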
Analyze the results
In [5]:
# Plot the evaluation on the train and test set during training
# Note that the metric is provided by the estimator's package, not ATOM!
atom.lgb.plot_evals(title="LightGBM's evaluation curve", figsize=(11, 9))
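The curves drawn by plot_evals come from LightGBM itself. If you ever need the raw values outside ATOM, the estimator's native API exposes them; a hedged sketch, reusing the standalone model fitted in the first example and assuming LightGBM's default binary_logloss metric:

# Sketch: plot the in-training evaluation curve from LightGBM's native
# evals_result_ attribute (filled by fit(..., eval_set=...)).
import matplotlib.pyplot as plt

for set_name, metrics in model.evals_result_.items():
    plt.plot(metrics["binary_logloss"], label=set_name)
plt.xlabel("Boosting iteration")
plt.ylabel("binary_logloss")
plt.legend()
plt.show()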