Example: Successive halving
This example shows how to compare multiple tree-based models using successive halving.
We use the California housing dataset from sklearn.datasets: a small, easy-to-fit dataset where the goal is to predict house prices.
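Before running it, it helps to see the schedule successive halving follows: each run fits the surviving models on a 1/n fraction of the training set (n being the number of models still in the race) and keeps the best half for the next run, until one model remains. A minimal sketch of that schedule (not ATOM's internal implementation):

```python
# Sketch of the successive-halving schedule: with n surviving models,
# each is fitted on a 1/n fraction of the training set, and the best
# half advances to the next run.
def halving_schedule(n_models):
    schedule = []
    n = n_models
    while n >= 1:
        schedule.append((n, round(1 / n, 2)))
        if n == 1:
            break
        n //= 2  # keep the best half
    return schedule

print(halving_schedule(6))  # [(6, 0.17), (3, 0.33), (1, 1.0)]
```

With six models this gives runs on 17%, 33%, and 100% of the training set, which matches the output below.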
Load the data
In [1]:
from sklearn.datasets import fetch_california_housing
from atom import ATOMRegressor
In [2]:
# Load the data
X, y = fetch_california_housing(return_X_y=True)
Run the pipeline
In [3]:
atom = ATOMRegressor(X, y, verbose=2, random_state=1)
<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Regression.

Dataset stats ==================== >>
Shape: (20640, 9)
Train set size: 16512
Test set size: 4128
-------------------------------------
Memory: 1.49 MB
Scaled: False
Outlier values: 786 (0.5%)
In [4]:
# Compare tree-based models via successive halving
atom.successive_halving(
    models=["Tree", "Bag", "ET", "RF", "LGB", "CatB"],
    metric="mae",
    n_bootstrap=5,
)
Training ========================= >>
Metric: mae


Run: 0 =========================== >>
Models: Tree6, Bag6, ET6, RF6, LGB6, CatB6
Size of training set: 16512 (17%)
Size of test set: 4128


Results for DecisionTree:
Fit ---------------------------------------------
Train evaluation --> mae: -0.0
Test evaluation --> mae: -0.5394
Time elapsed: 0.110s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.576 ± 0.0119
Time elapsed: 0.361s
-------------------------------------------------
Time: 0.472s


Results for Bagging:
Fit ---------------------------------------------
Train evaluation --> mae: -0.1715
Test evaluation --> mae: -0.4308
Time elapsed: 0.302s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.435 ± 0.0059
Time elapsed: 1.129s
-------------------------------------------------
Time: 1.431s


Results for ExtraTrees:
Fit ---------------------------------------------
Train evaluation --> mae: -0.0
Test evaluation --> mae: -0.3977
Time elapsed: 0.980s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.4059 ± 0.0028
Time elapsed: 4.264s
-------------------------------------------------
Time: 5.243s


Results for RandomForest:
Fit ---------------------------------------------
Train evaluation --> mae: -0.1508
Test evaluation --> mae: -0.4053
Time elapsed: 2.362s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.4162 ± 0.0031
Time elapsed: 10.267s
-------------------------------------------------
Time: 12.629s


Results for LightGBM:
Fit ---------------------------------------------
Train evaluation --> mae: -0.2031
Test evaluation --> mae: -0.3594
Time elapsed: 0.430s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.3673 ± 0.0016
Time elapsed: 0.788s
-------------------------------------------------
Time: 1.218s


Results for CatBoost:
Fit ---------------------------------------------
Train evaluation --> mae: -0.1639
Test evaluation --> mae: -0.3453
Time elapsed: 6.837s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.3543 ± 0.0025
Time elapsed: 01m:14s
-------------------------------------------------
Time: 01m:21s


Final results ==================== >>
Total time: 01m:42s
-------------------------------------
DecisionTree --> mae: -0.576 ± 0.0119 ~
Bagging --> mae: -0.435 ± 0.0059 ~
ExtraTrees --> mae: -0.4059 ± 0.0028 ~
RandomForest --> mae: -0.4162 ± 0.0031 ~
LightGBM --> mae: -0.3673 ± 0.0016 ~
CatBoost --> mae: -0.3543 ± 0.0025 ~ !


Run: 1 =========================== >>
Models: ET3, LGB3, CatB3
Size of training set: 16512 (33%)
Size of test set: 4128


Results for ExtraTrees:
Fit ---------------------------------------------
Train evaluation --> mae: -0.0
Test evaluation --> mae: -0.3739
Time elapsed: 1.755s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.3841 ± 0.0027
Time elapsed: 6.913s
-------------------------------------------------
Time: 8.668s


Results for LightGBM:
Fit ---------------------------------------------
Train evaluation --> mae: -0.2327
Test evaluation --> mae: -0.3356
Time elapsed: 0.484s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.345 ± 0.0037
Time elapsed: 0.942s
-------------------------------------------------
Time: 1.425s


Results for CatBoost:
Fit ---------------------------------------------
Train evaluation --> mae: -0.1896
Test evaluation --> mae: -0.3276
Time elapsed: 5.944s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.3337 ± 0.0017
Time elapsed: 27.655s
-------------------------------------------------
Time: 33.599s


Final results ==================== >>
Total time: 43.699s
-------------------------------------
ExtraTrees --> mae: -0.3841 ± 0.0027 ~
LightGBM --> mae: -0.345 ± 0.0037 ~
CatBoost --> mae: -0.3337 ± 0.0017 ~ !


Run: 2 =========================== >>
Models: CatB1
Size of training set: 16512 (100%)
Size of test set: 4128


Results for CatBoost:
Fit ---------------------------------------------
Train evaluation --> mae: -0.2231
Test evaluation --> mae: -0.3008
Time elapsed: 8.146s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.3079 ± 0.0029
Time elapsed: 40.905s
-------------------------------------------------
Time: 49.051s


Final results ==================== >>
Total time: 49.053s
-------------------------------------
CatBoost --> mae: -0.3079 ± 0.0029 ~
Analyze the results
In [5]:
# The results dataframe is now multi-indexed, where frac is the
# fraction of the training set used to fit the model. The model
# names end with the number of models fitted during that run
atom.results
Out[5]:
| frac | model | mae_train | mae_test | time_fit | mae_bootstrap | time_bootstrap | time |
|---|---|---|---|---|---|---|---|
| 0.17 | Bag6 | -0.2017 | -0.4327 | 0.302116 | -0.434981 | 1.129138 | 1.431254 |
| | CatB6 | -0.2043 | -0.3560 | 6.837450 | -0.354266 | 73.713478 | 80.550928 |
| | ET6 | -0.0694 | -0.4077 | 0.979613 | -0.405855 | 4.263598 | 5.243211 |
| | LGB6 | -0.2202 | -0.3676 | 0.430176 | -0.367271 | 0.788259 | 1.218435 |
| | RF6 | -0.1851 | -0.4165 | 2.362080 | -0.416217 | 10.267119 | 12.629199 |
| | Tree6 | -0.1039 | -0.5897 | 0.110449 | -0.575962 | 0.361372 | 0.471821 |
| 0.33 | CatB3 | -0.2249 | -0.3341 | 5.943560 | -0.333670 | 27.655384 | 33.598944 |
| | ET3 | -0.0935 | -0.3879 | 1.755022 | -0.384081 | 6.913167 | 8.668189 |
| | LGB3 | -0.2489 | -0.3405 | 0.483790 | -0.344951 | 0.941663 | 1.425453 |
| 1.00 | CatB1 | -0.2231 | -0.3008 | 8.146296 | -0.307908 | 40.904606 | 49.050902 |
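Since `atom.results` is a pandas DataFrame, standard MultiIndex selection applies to it. A small sketch with a hand-built stand-in frame (the names and a few numbers mirror rows above, but this is synthetic, not pulled from `atom`):

```python
import pandas as pd

# Synthetic stand-in for atom.results: rows indexed by (frac, model).
idx = pd.MultiIndex.from_tuples(
    [(0.17, "LGB6"), (0.17, "CatB6"), (0.33, "CatB3"), (1.00, "CatB1")],
    names=["frac", "model"],
)
results = pd.DataFrame(
    {"mae_test": [-0.3676, -0.3560, -0.3341, -0.3008]}, index=idx
)

# Select all models from one run by its training fraction...
print(results.loc[0.17])

# ...or take a cross-section on the frac level to isolate a run.
print(results.xs(1.00, level="frac"))
```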
In [6]:
# Plot the successive halving's results
atom.plot_successive_halving()
In [7]:
# Use regex to select all the models with the same estimator...
atom.plot_errors(models=["CatB.*"])
In [8]:
# ...or to select the models from the same run
atom.plot_errors(models=".*3")
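The two calls above rely on the models argument being interpreted as regex patterns over the model names. A minimal sketch of that kind of name filtering (an illustration, not ATOM's internals):

```python
import re

# Model names produced by the three runs above.
names = ["Tree6", "Bag6", "ET6", "RF6", "LGB6", "CatB6",
         "ET3", "LGB3", "CatB3", "CatB1"]

def select(pattern, names):
    # Keep the names that fully match the regex pattern.
    return [n for n in names if re.fullmatch(pattern, n)]

print(select("CatB.*", names))  # ['CatB6', 'CatB3', 'CatB1']
print(select(".*3", names))     # ['ET3', 'LGB3', 'CatB3']
```

The first pattern follows one estimator across runs; the second gathers every model from the run that fitted three models.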