Successive halving
This example shows how to compare multiple tree-based models using successive halving.
Import the boston dataset from sklearn.datasets. This is a small and easy-to-train dataset whose goal is to predict house prices.
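Successive halving fits all models on a small fraction of the training set, keeps only the best half after every run, and increases the fraction accordingly until a single model is trained on the full set. A minimal sketch of the schedule used in this example (pure Python, not part of ATOM's API):

# Each run keeps the best half of the models and trains on a
# fraction of the training set equal to 1 / n_models.
models = ["Tree", "Bag", "ET", "RF", "LGB", "CatB"]

n, run = len(models), 0
while n >= 1:
    print(f"Run {run}: {n} model(s), training fraction = {1 / n:.0%}")
    n //= 2
    run += 1
# Run 0: 6 model(s), training fraction = 17%
# Run 1: 3 model(s), training fraction = 33%
# Run 2: 1 model(s), training fraction = 100%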
Load the data
In [1]:
from sklearn.datasets import load_boston
from atom import ATOMRegressor
In [2]:
# Load the data
X, y = load_boston(return_X_y=True)
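Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2. On a recent scikit-learn version, a hedged alternative is to fetch the same housing data from OpenML (this assumes an internet connection, and some columns may come back as categorical and need casting to numeric):

# Alternative for scikit-learn >= 1.2, where load_boston is no longer available.
# Fetches the boston housing data from OpenML instead.
from sklearn.datasets import fetch_openml

boston = fetch_openml(name="boston", version=1, as_frame=True)
X, y = boston.data, boston.target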
Run the pipeline
In [3]:
atom = ATOMRegressor(X, y, verbose=2, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ====================== >>
Shape: (506, 14)
Scaled: False
Outlier values: 82 (1.4%)
---------------------------------------
Train set size: 405
Test set size: 101
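The stats above can also be checked directly on the trainer. A minimal sketch, assuming the standard atom.train and atom.test data attributes (which include the target column):

# Sanity check of the reported split sizes
print(atom.train.shape)  # expected (405, 14)
print(atom.test.shape)   # expected (101, 14)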
In [4]:
# Compare tree-based models via successive halving
atom.successive_halving(
    models=["Tree", "Bag", "ET", "RF", "LGB", "CatB"],
    metric="mae",
    n_bootstrap=5,
)
Training ===================================== >>
Metric: neg_mean_absolute_error


Run: 0 ================================ >>
Models: Tree6, Bag6, ET6, RF6, LGB6, CatB6
Size of training set: 405 (17%)
Size of test set: 101


Results for Decision Tree:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.0
Test evaluation --> neg_mean_absolute_error: -3.3257
Time elapsed: 0.005s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -4.3307 ± 0.525
Time elapsed: 0.023s
-------------------------------------------------
Total time: 0.028s


Results for Bagging Regressor:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -1.3054
Test evaluation --> neg_mean_absolute_error: -2.695
Time elapsed: 0.020s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -3.0957 ± 0.2677
Time elapsed: 0.081s
-------------------------------------------------
Total time: 0.101s


Results for Extra-Trees:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.0
Test evaluation --> neg_mean_absolute_error: -2.1541
Time elapsed: 0.084s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -2.5554 ± 0.1708
Time elapsed: 0.353s
-------------------------------------------------
Total time: 0.436s


Results for Random Forest:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -1.1509
Test evaluation --> neg_mean_absolute_error: -2.4143
Time elapsed: 0.109s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -2.9574 ± 0.2253
Time elapsed: 0.508s
-------------------------------------------------
Total time: 0.617s


Results for LightGBM:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -3.3965
Test evaluation --> neg_mean_absolute_error: -4.4873
Time elapsed: 0.026s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -4.8485 ± 0.2679
Time elapsed: 0.057s
-------------------------------------------------
Total time: 0.083s


Results for CatBoost:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.0806
Test evaluation --> neg_mean_absolute_error: -2.3991
Time elapsed: 1.252s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -2.9193 ± 0.2604
Time elapsed: 3.901s
-------------------------------------------------
Total time: 5.153s


Final results ========================= >>
Duration: 6.419s
------------------------------------------
Decision Tree     --> neg_mean_absolute_error: -4.3307 ± 0.525 ~
Bagging Regressor --> neg_mean_absolute_error: -3.0957 ± 0.2677 ~
Extra-Trees       --> neg_mean_absolute_error: -2.5554 ± 0.1708 ~ !
Random Forest     --> neg_mean_absolute_error: -2.9574 ± 0.2253 ~
LightGBM          --> neg_mean_absolute_error: -4.8485 ± 0.2679 ~
CatBoost          --> neg_mean_absolute_error: -2.9193 ± 0.2604 ~


Run: 1 ================================ >>
Models: ET3, RF3, CatB3
Size of training set: 405 (33%)
Size of test set: 101


Results for Extra-Trees:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.0
Test evaluation --> neg_mean_absolute_error: -2.2361
Time elapsed: 0.094s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -2.6016 ± 0.289
Time elapsed: 0.404s
-------------------------------------------------
Total time: 0.499s


Results for Random Forest:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.982
Test evaluation --> neg_mean_absolute_error: -2.5055
Time elapsed: 0.126s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -2.7619 ± 0.1947
Time elapsed: 0.566s
-------------------------------------------------
Total time: 0.692s


Results for CatBoost:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.2835
Test evaluation --> neg_mean_absolute_error: -2.42
Time elapsed: 1.740s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -2.5595 ± 0.2768
Time elapsed: 6.480s
-------------------------------------------------
Total time: 8.220s


Final results ========================= >>
Duration: 9.411s
------------------------------------------
Extra-Trees   --> neg_mean_absolute_error: -2.6016 ± 0.289 ~
Random Forest --> neg_mean_absolute_error: -2.7619 ± 0.1947 ~
CatBoost      --> neg_mean_absolute_error: -2.5595 ± 0.2768 ~ !


Run: 2 ================================ >>
Models: CatB1
Size of training set: 405 (100%)
Size of test set: 101


Results for CatBoost:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.3978
Test evaluation --> neg_mean_absolute_error: -1.8776
Time elapsed: 3.180s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -2.0515 ± 0.0902
Time elapsed: 14.450s
-------------------------------------------------
Total time: 17.630s


Final results ========================= >>
Duration: 17.631s
------------------------------------------
CatBoost --> neg_mean_absolute_error: -2.0515 ± 0.0902 ~
Analyze results
In [5]:
# The results dataframe is now multi-indexed, where frac is the
# fraction of the training set used to fit the model. The model
# names end with the number of models fitted during that run
atom.results
Out[5]:
| frac | model | metric_train | metric_test | time_fit | mean_bootstrap | std_bootstrap | time_bootstrap | time |
|------|-------|--------------|-------------|----------|----------------|---------------|----------------|------|
| 0.17 | Bag6  | -1.305373e+00 | -2.695050 | 0.020s | -3.095663 | 0.267668 | 0.081s | 0.101s |
|      | CatB6 | -8.055503e-02 | -2.399073 | 1.252s | -2.919304 | 0.260378 | 3.901s | 5.153s |
|      | ET6   | -2.256238e-14 | -2.154089 | 0.084s | -2.555434 | 0.170823 | 0.353s | 0.436s |
|      | LGB6  | -3.396511e+00 | -4.487270 | 0.026s | -4.848536 | 0.267874 | 0.057s | 0.083s |
|      | RF6   | -1.150866e+00 | -2.414297 | 0.109s | -2.957400 | 0.225311 | 0.508s | 0.617s |
|      | Tree6 | -0.000000e+00 | -3.325743 | 0.005s | -4.330693 | 0.525026 | 0.023s | 0.028s |
| 0.33 | CatB3 | -2.835499e-01 | -2.420032 | 1.740s | -2.559497 | 0.276791 | 6.480s | 8.220s |
|      | ET3   | -2.315185e-14 | -2.236079 | 0.094s | -2.601648 | 0.289034 | 0.404s | 0.499s |
|      | RF3   | -9.819778e-01 | -2.505465 | 0.126s | -2.761887 | 0.194678 | 0.566s | 0.692s |
| 1.00 | CatB1 | -3.977985e-01 | -1.877590 | 3.180s | -2.051462 | 0.090227 | 14.450s | 17.630s |
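Since atom.results behaves like a regular pandas DataFrame with a (frac, model) MultiIndex, it can be sliced with standard pandas tools; for example (a sketch, assuming nothing beyond plain pandas indexing):

# Select all models from the run trained on 33% of the training set
atom.results.loc[0.33]

# Select a single model by name on the "model" index level
atom.results.xs("CatB3", level="model")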
In [6]:
# Plot the successive halving's results
atom.plot_successive_halving()
In [7]:
# Use an acronym to call all the models with the same estimator
atom.plot_errors(models=["CatB"])
In [8]:
# Use the number to call the models from the same run
atom.plot_errors(models="3")
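Full model names can presumably be mixed as well, for example to follow the same estimator across the three runs:

# Compare CatBoost fitted on 17%, 33% and 100% of the training set
atom.plot_errors(models=["CatB6", "CatB3", "CatB1"])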