Successive halving
This example shows how to compare multiple tree-based models using successive halving.
Import the boston dataset from sklearn.datasets. This is a small and easy-to-train dataset whose goal is to predict house prices.
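Successive halving fits all models on a small fraction of the training set, keeps only the best half after every run, and increases the fraction accordingly until a single model is trained on the full set. A minimal sketch of the schedule used in this example (pure Python, not part of ATOM's API):

# Each run keeps the best half of the models and trains on a
# fraction of the training set equal to 1 / n_models.
models = ["Tree", "Bag", "ET", "RF", "LGB", "CatB"]

n, run = len(models), 0
while n >= 1:
    print(f"Run {run}: {n} model(s), training fraction = {1 / n:.0%}")
    n //= 2
    run += 1
# Run 0: 6 model(s), training fraction = 17%
# Run 1: 3 model(s), training fraction = 33%
# Run 2: 1 model(s), training fraction = 100%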
Load the data
In [1]:
from sklearn.datasets import load_boston
from atom import ATOMRegressor
In [2]:
# Load the data
X, y = load_boston(return_X_y=True)
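Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2. On a recent scikit-learn version, a hedged alternative is to fetch the same housing data from OpenML (this assumes an internet connection, and some columns may come back as categorical and need casting to numeric):

# Alternative for scikit-learn >= 1.2, where load_boston is no longer available.
# Fetches the boston housing data from OpenML instead.
from sklearn.datasets import fetch_openml

boston = fetch_openml(name="boston", version=1, as_frame=True)
X, y = boston.data, boston.target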
Run the pipeline
In [3]:
atom = ATOMRegressor(X, y, verbose=2, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ====================== >>
Shape: (506, 14)
Scaled: False
Outlier values: 82 (1.4%)
---------------------------------------
Train set size: 405
Test set size: 101
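The stats above can also be checked directly on the trainer. A minimal sketch, assuming the standard atom.train and atom.test data attributes (which include the target column):

# Sanity check of the reported split sizes
print(atom.train.shape)  # expected (405, 14)
print(atom.test.shape)   # expected (101, 14)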
In [4]:
# Compare tree-based models via successive halving
atom.successive_halving(
    models=["Tree", "Bag", "ET", "RF", "LGB", "CatB"],
    metric="mae",
    n_bootstrap=5,
)
Training ===================================== >>
Metric: neg_mean_absolute_error


Run: 0 ================================ >>
Models: Tree6, Bag6, ET6, RF6, LGB6, CatB6
Size of training set: 405 (17%)
Size of test set: 101


Results for Decision Tree:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.0
Test evaluation --> neg_mean_absolute_error: -3.3257
Time elapsed: 0.005s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -4.3307 ± 0.525
Time elapsed: 0.023s
-------------------------------------------------
Total time: 0.028s


Results for Bagging Regressor:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -1.3054
Test evaluation --> neg_mean_absolute_error: -2.695
Time elapsed: 0.020s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -3.0957 ± 0.2677
Time elapsed: 0.081s
-------------------------------------------------
Total time: 0.101s


Results for Extra-Trees:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.0
Test evaluation --> neg_mean_absolute_error: -2.1541
Time elapsed: 0.084s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -2.5554 ± 0.1708
Time elapsed: 0.353s
-------------------------------------------------
Total time: 0.436s


Results for Random Forest:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -1.1509
Test evaluation --> neg_mean_absolute_error: -2.4143
Time elapsed: 0.109s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -2.9574 ± 0.2253
Time elapsed: 0.508s
-------------------------------------------------
Total time: 0.617s


Results for LightGBM:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -3.3965
Test evaluation --> neg_mean_absolute_error: -4.4873
Time elapsed: 0.026s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -4.8485 ± 0.2679
Time elapsed: 0.057s
-------------------------------------------------
Total time: 0.083s


Results for CatBoost:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.0806
Test evaluation --> neg_mean_absolute_error: -2.3991
Time elapsed: 1.252s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -2.9193 ± 0.2604
Time elapsed: 3.901s
-------------------------------------------------
Total time: 5.153s


Final results ========================= >>
Duration: 6.419s
------------------------------------------
Decision Tree     --> neg_mean_absolute_error: -4.3307 ± 0.525 ~
Bagging Regressor --> neg_mean_absolute_error: -3.0957 ± 0.2677 ~
Extra-Trees       --> neg_mean_absolute_error: -2.5554 ± 0.1708 ~ !
Random Forest     --> neg_mean_absolute_error: -2.9574 ± 0.2253 ~
LightGBM          --> neg_mean_absolute_error: -4.8485 ± 0.2679 ~
CatBoost          --> neg_mean_absolute_error: -2.9193 ± 0.2604 ~


Run: 1 ================================ >>
Models: ET3, RF3, CatB3
Size of training set: 405 (33%)
Size of test set: 101


Results for Extra-Trees:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.0
Test evaluation --> neg_mean_absolute_error: -2.2361
Time elapsed: 0.094s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -2.6016 ± 0.289
Time elapsed: 0.404s
-------------------------------------------------
Total time: 0.499s


Results for Random Forest:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.982
Test evaluation --> neg_mean_absolute_error: -2.5055
Time elapsed: 0.126s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -2.7619 ± 0.1947
Time elapsed: 0.566s
-------------------------------------------------
Total time: 0.692s


Results for CatBoost:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.2835
Test evaluation --> neg_mean_absolute_error: -2.42
Time elapsed: 1.740s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -2.5595 ± 0.2768
Time elapsed: 6.480s
-------------------------------------------------
Total time: 8.220s


Final results ========================= >>
Duration: 9.411s
------------------------------------------
Extra-Trees   --> neg_mean_absolute_error: -2.6016 ± 0.289 ~
Random Forest --> neg_mean_absolute_error: -2.7619 ± 0.1947 ~
CatBoost      --> neg_mean_absolute_error: -2.5595 ± 0.2768 ~ !


Run: 2 ================================ >>
Models: CatB1
Size of training set: 405 (100%)
Size of test set: 101


Results for CatBoost:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.3978
Test evaluation --> neg_mean_absolute_error: -1.8776
Time elapsed: 3.180s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -2.0515 ± 0.0902
Time elapsed: 14.450s
-------------------------------------------------
Total time: 17.630s


Final results ========================= >>
Duration: 17.631s
------------------------------------------
CatBoost --> neg_mean_absolute_error: -2.0515 ± 0.0902 ~
Analyze results
In [5]:
# The results dataframe is now multi-indexed, where frac is the
# fraction of the training set used to fit the model. The model
# names end with the number of models fitted during that run
atom.results
Out[5]:
| frac | model | metric_train | metric_test | time_fit | mean_bootstrap | std_bootstrap | time_bootstrap | time |
|------|-------|--------------|-------------|----------|----------------|---------------|----------------|------|
| 0.17 | Bag6  | -1.305373e+00 | -2.695050 | 0.020s | -3.095663 | 0.267668 | 0.081s | 0.101s |
|      | CatB6 | -8.055503e-02 | -2.399073 | 1.252s | -2.919304 | 0.260378 | 3.901s | 5.153s |
|      | ET6   | -2.256238e-14 | -2.154089 | 0.084s | -2.555434 | 0.170823 | 0.353s | 0.436s |
|      | LGB6  | -3.396511e+00 | -4.487270 | 0.026s | -4.848536 | 0.267874 | 0.057s | 0.083s |
|      | RF6   | -1.150866e+00 | -2.414297 | 0.109s | -2.957400 | 0.225311 | 0.508s | 0.617s |
|      | Tree6 | -0.000000e+00 | -3.325743 | 0.005s | -4.330693 | 0.525026 | 0.023s | 0.028s |
| 0.33 | CatB3 | -2.835499e-01 | -2.420032 | 1.740s | -2.559497 | 0.276791 | 6.480s | 8.220s |
|      | ET3   | -2.315185e-14 | -2.236079 | 0.094s | -2.601648 | 0.289034 | 0.404s | 0.499s |
|      | RF3   | -9.819778e-01 | -2.505465 | 0.126s | -2.761887 | 0.194678 | 0.566s | 0.692s |
| 1.00 | CatB1 | -3.977985e-01 | -1.877590 | 3.180s | -2.051462 | 0.090227 | 14.450s | 17.630s |
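Since atom.results behaves like a regular pandas DataFrame with a (frac, model) MultiIndex, it can be sliced with standard pandas tools; for example (a sketch, assuming nothing beyond plain pandas indexing):

# Select all models from the run trained on 33% of the training set
atom.results.loc[0.33]

# Select a single model by name on the "model" index level
atom.results.xs("CatB3", level="model")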
In [6]:
# Plot the successive halving's results
atom.plot_successive_halving()
In [7]:
# Use an acronym to call all the models with the same estimator
atom.plot_errors(models=["CatB"])
In [8]:
# Use the number to call the models from the same run
atom.plot_errors(models="3")
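Full model names can presumably be mixed as well, for example to follow the same estimator across the three runs:

# Compare CatBoost fitted on 17%, 33% and 100% of the training set
atom.plot_errors(models=["CatB6", "CatB3", "CatB1"])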