Example: Successive halving
This example shows how to compare multiple tree-based models using successive halving.
We use the California housing dataset from sklearn.datasets: a small, easy-to-fit dataset where the goal is to predict house prices.
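Before running it, it helps to see the schedule successive halving follows: each run fits the surviving models on a 1/n fraction of the training set (n being the number of models still in the race) and keeps the best half for the next run, until one model remains. A minimal sketch of that schedule (not ATOM's internal implementation):

```python
# Sketch of the successive-halving schedule: with n surviving models,
# each is fitted on a 1/n fraction of the training set, and the best
# half advances to the next run.
def halving_schedule(n_models):
    schedule = []
    n = n_models
    while n >= 1:
        schedule.append((n, round(1 / n, 2)))
        if n == 1:
            break
        n //= 2  # keep the best half
    return schedule

print(halving_schedule(6))  # [(6, 0.17), (3, 0.33), (1, 1.0)]
```

With six models this gives runs on 17%, 33%, and 100% of the training set, which matches the output below.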
Load the data
In [1]:
from sklearn.datasets import fetch_california_housing
from atom import ATOMRegressor
In [2]:
# Load the data
X, y = fetch_california_housing(return_X_y=True)
Run the pipeline
In [3]:
atom = ATOMRegressor(X, y, verbose=2, random_state=1)
<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Regression.

Dataset stats ==================== >>
Shape: (20640, 9)
Train set size: 16512
Test set size: 4128
-------------------------------------
Memory: 1.49 MB
Scaled: False
Outlier values: 786 (0.5%)
In [4]:
# Compare tree-based models via successive halving
atom.successive_halving(
    models=["Tree", "Bag", "ET", "RF", "LGB", "CatB"],
    metric="mae",
    n_bootstrap=5,
)
Training ========================= >>
Metric: mae


Run: 0 =========================== >>
Models: Tree6, Bag6, ET6, RF6, LGB6, CatB6
Size of training set: 16512 (17%)
Size of test set: 4128


Results for DecisionTree:
Fit ---------------------------------------------
Train evaluation --> mae: -0.0
Test evaluation --> mae: -0.5394
Time elapsed: 0.110s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.576 ± 0.0119
Time elapsed: 0.361s
-------------------------------------------------
Time: 0.472s


Results for Bagging:
Fit ---------------------------------------------
Train evaluation --> mae: -0.1715
Test evaluation --> mae: -0.4308
Time elapsed: 0.302s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.435 ± 0.0059
Time elapsed: 1.129s
-------------------------------------------------
Time: 1.431s


Results for ExtraTrees:
Fit ---------------------------------------------
Train evaluation --> mae: -0.0
Test evaluation --> mae: -0.3977
Time elapsed: 0.980s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.4059 ± 0.0028
Time elapsed: 4.264s
-------------------------------------------------
Time: 5.243s


Results for RandomForest:
Fit ---------------------------------------------
Train evaluation --> mae: -0.1508
Test evaluation --> mae: -0.4053
Time elapsed: 2.362s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.4162 ± 0.0031
Time elapsed: 10.267s
-------------------------------------------------
Time: 12.629s


Results for LightGBM:
Fit ---------------------------------------------
Train evaluation --> mae: -0.2031
Test evaluation --> mae: -0.3594
Time elapsed: 0.430s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.3673 ± 0.0016
Time elapsed: 0.788s
-------------------------------------------------
Time: 1.218s


Results for CatBoost:
Fit ---------------------------------------------
Train evaluation --> mae: -0.1639
Test evaluation --> mae: -0.3453
Time elapsed: 6.837s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.3543 ± 0.0025
Time elapsed: 01m:14s
-------------------------------------------------
Time: 01m:21s


Final results ==================== >>
Total time: 01m:42s
-------------------------------------
DecisionTree --> mae: -0.576 ± 0.0119 ~
Bagging --> mae: -0.435 ± 0.0059 ~
ExtraTrees --> mae: -0.4059 ± 0.0028 ~
RandomForest --> mae: -0.4162 ± 0.0031 ~
LightGBM --> mae: -0.3673 ± 0.0016 ~
CatBoost --> mae: -0.3543 ± 0.0025 ~ !


Run: 1 =========================== >>
Models: ET3, LGB3, CatB3
Size of training set: 16512 (33%)
Size of test set: 4128


Results for ExtraTrees:
Fit ---------------------------------------------
Train evaluation --> mae: -0.0
Test evaluation --> mae: -0.3739
Time elapsed: 1.755s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.3841 ± 0.0027
Time elapsed: 6.913s
-------------------------------------------------
Time: 8.668s


Results for LightGBM:
Fit ---------------------------------------------
Train evaluation --> mae: -0.2327
Test evaluation --> mae: -0.3356
Time elapsed: 0.484s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.345 ± 0.0037
Time elapsed: 0.942s
-------------------------------------------------
Time: 1.425s


Results for CatBoost:
Fit ---------------------------------------------
Train evaluation --> mae: -0.1896
Test evaluation --> mae: -0.3276
Time elapsed: 5.944s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.3337 ± 0.0017
Time elapsed: 27.655s
-------------------------------------------------
Time: 33.599s


Final results ==================== >>
Total time: 43.699s
-------------------------------------
ExtraTrees --> mae: -0.3841 ± 0.0027 ~
LightGBM --> mae: -0.345 ± 0.0037 ~
CatBoost --> mae: -0.3337 ± 0.0017 ~ !


Run: 2 =========================== >>
Models: CatB1
Size of training set: 16512 (100%)
Size of test set: 4128


Results for CatBoost:
Fit ---------------------------------------------
Train evaluation --> mae: -0.2231
Test evaluation --> mae: -0.3008
Time elapsed: 8.146s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.3079 ± 0.0029
Time elapsed: 40.905s
-------------------------------------------------
Time: 49.051s


Final results ==================== >>
Total time: 49.053s
-------------------------------------
CatBoost --> mae: -0.3079 ± 0.0029 ~
Analyze the results
In [5]:
# The results dataframe is now multi-indexed, where frac is the
# fraction of the training set used to fit the model. The model
# names end with the number of models fitted during that run
atom.results
Out[5]:
| frac | model | mae_train | mae_test | time_fit | mae_bootstrap | time_bootstrap | time |
|---|---|---|---|---|---|---|---|
| 0.17 | Bag6 | -0.2017 | -0.4327 | 0.302116 | -0.434981 | 1.129138 | 1.431254 |
| | CatB6 | -0.2043 | -0.3560 | 6.837450 | -0.354266 | 73.713478 | 80.550928 |
| | ET6 | -0.0694 | -0.4077 | 0.979613 | -0.405855 | 4.263598 | 5.243211 |
| | LGB6 | -0.2202 | -0.3676 | 0.430176 | -0.367271 | 0.788259 | 1.218435 |
| | RF6 | -0.1851 | -0.4165 | 2.362080 | -0.416217 | 10.267119 | 12.629199 |
| | Tree6 | -0.1039 | -0.5897 | 0.110449 | -0.575962 | 0.361372 | 0.471821 |
| 0.33 | CatB3 | -0.2249 | -0.3341 | 5.943560 | -0.333670 | 27.655384 | 33.598944 |
| | ET3 | -0.0935 | -0.3879 | 1.755022 | -0.384081 | 6.913167 | 8.668189 |
| | LGB3 | -0.2489 | -0.3405 | 0.483790 | -0.344951 | 0.941663 | 1.425453 |
| 1.00 | CatB1 | -0.2231 | -0.3008 | 8.146296 | -0.307908 | 40.904606 | 49.050902 |
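Since `atom.results` is a pandas DataFrame, standard MultiIndex selection applies to it. A small sketch with a hand-built stand-in frame (the names and a few numbers mirror rows above, but this is synthetic, not pulled from `atom`):

```python
import pandas as pd

# Synthetic stand-in for atom.results: rows indexed by (frac, model).
idx = pd.MultiIndex.from_tuples(
    [(0.17, "LGB6"), (0.17, "CatB6"), (0.33, "CatB3"), (1.00, "CatB1")],
    names=["frac", "model"],
)
results = pd.DataFrame(
    {"mae_test": [-0.3676, -0.3560, -0.3341, -0.3008]}, index=idx
)

# Select all models from one run by its training fraction...
print(results.loc[0.17])

# ...or take a cross-section on the frac level to isolate a run.
print(results.xs(1.00, level="frac"))
```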
In [6]:
# Plot the successive halving's results
atom.plot_successive_halving()
In [7]:
# Use regex to select all the models with the same estimator...
atom.plot_errors(models=["CatB.*"])
In [8]:
# ...or to select the models from the same run
atom.plot_errors(models=".*3")
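The two calls above rely on the models argument being interpreted as regex patterns over the model names. A minimal sketch of that kind of name filtering (an illustration, not ATOM's internals):

```python
import re

# Model names produced by the three runs above.
names = ["Tree6", "Bag6", "ET6", "RF6", "LGB6", "CatB6",
         "ET3", "LGB3", "CatB3", "CatB1"]

def select(pattern, names):
    # Keep the names that fully match the regex pattern.
    return [n for n in names if re.fullmatch(pattern, n)]

print(select("CatB.*", names))  # ['CatB6', 'CatB3', 'CatB1']
print(select(".*3", names))     # ['ET3', 'LGB3', 'CatB3']
```

The first pattern follows one estimator across runs; the second gathers every model from the run that fitted three models.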