Successive halving

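This example shows how to compare multiple tree-based models using successive halving. Every model is first trained on a small fraction of the training set, only the best-scoring half advances to the next run, and the fraction of data grows as the number of models shrinks.

Import the California housing dataset from sklearn.datasets. It is a small, quick-to-train dataset whose goal is to predict house prices.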

Load the data

In [1]:

from sklearn.datasets import fetch_california_housing
from atom import ATOMRegressor

In [2]:

# Load the data
X, y = fetch_california_housing(return_X_y=True)

Run the pipeline

In [3]:

atom = ATOMRegressor(X, y, verbose=2, random_state=1)

<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ==================== >>
Shape: (20640, 9)
Scaled: False
Outlier values: 811 (0.5%)
-------------------------------------
Train set size: 16512
Test set size: 4128
-------------------------------------

In [4]:

# Compare tree-based models via successive halving
atom.successive_halving(
    models=["Tree", "Bag", "ET", "RF", "LGB", "CatB"],
    metric="mae",
    n_bootstrap=5,
)

Training ========================= >>
Metric: neg_mean_absolute_error


Run: 0 ================================ >>
Models: Tree6, Bag6, ET6, RF6, LGB6, CatB6
Size of training set: 16512 (17%)
Size of test set: 4128


Results for Decision Tree:         
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.0
Test evaluation --> neg_mean_absolute_error: -0.564
Time elapsed: 0.027s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.5626 ± 0.0194
Time elapsed: 0.329s
-------------------------------------------------
Total time: 0.357s


Results for Bagging Regressor:         
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.1716
Test evaluation --> neg_mean_absolute_error: -0.4253
Time elapsed: 0.151s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.4284 ± 0.0042
Time elapsed: 0.782s
-------------------------------------------------
Total time: 0.933s


Results for Extra-Trees:         
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.0
Test evaluation --> neg_mean_absolute_error: -0.3859
Time elapsed: 0.613s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.3921 ± 0.0015
Time elapsed: 2.845s
-------------------------------------------------
Total time: 3.460s


Results for Random Forest:         
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.1491
Test evaluation --> neg_mean_absolute_error: -0.3998
Time elapsed: 1.154s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.4074 ± 0.003
Time elapsed: 5.683s
-------------------------------------------------
Total time: 6.840s


Results for LightGBM:         
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.2019
Test evaluation --> neg_mean_absolute_error: -0.35
Time elapsed: 0.184s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.3602 ± 0.0031
Time elapsed: 0.760s
-------------------------------------------------
Total time: 0.945s


Results for CatBoost:         
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.1685
Test evaluation --> neg_mean_absolute_error: -0.3356
Time elapsed: 3.658s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.3443 ± 0.0018
Time elapsed: 17.332s
-------------------------------------------------
Total time: 20.990s


Final results ==================== >>
Duration: 33.525s
-------------------------------------
Decision Tree     --> neg_mean_absolute_error: -0.5626 ± 0.0194 ~
Bagging Regressor --> neg_mean_absolute_error: -0.4284 ± 0.0042 ~
Extra-Trees       --> neg_mean_absolute_error: -0.3921 ± 0.0015 ~
Random Forest     --> neg_mean_absolute_error: -0.4074 ± 0.003 ~
LightGBM          --> neg_mean_absolute_error: -0.3602 ± 0.0031 ~
CatBoost          --> neg_mean_absolute_error: -0.3443 ± 0.0018 ~ !


Run: 1 ================================ >>
Models: ET3, LGB3, CatB3
Size of training set: 16512 (33%)
Size of test set: 4128


Results for Extra-Trees:         
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.0
Test evaluation --> neg_mean_absolute_error: -0.3527
Time elapsed: 1.189s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.3674 ± 0.0014
Time elapsed: 5.216s
-------------------------------------------------
Total time: 6.410s


Results for LightGBM:         
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.235
Test evaluation --> neg_mean_absolute_error: -0.326
Time elapsed: 0.237s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.3353 ± 0.0018
Time elapsed: 1.038s
-------------------------------------------------
Total time: 1.275s


Results for CatBoost:         
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.1992
Test evaluation --> neg_mean_absolute_error: -0.3121
Time elapsed: 3.881s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.3229 ± 0.0008
Time elapsed: 19.523s
-------------------------------------------------
Total time: 23.405s


Final results ==================== >>
Duration: 31.090s
-------------------------------------
Extra-Trees --> neg_mean_absolute_error: -0.3674 ± 0.0014 ~
LightGBM    --> neg_mean_absolute_error: -0.3353 ± 0.0018 ~
CatBoost    --> neg_mean_absolute_error: -0.3229 ± 0.0008 ~ !


Run: 2 ================================ >>
Models: CatB1
Size of training set: 16512 (100%)
Size of test set: 4128


Results for CatBoost:         
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.2325
Test evaluation --> neg_mean_absolute_error: -0.2914
Time elapsed: 6.060s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.2995 ± 0.001
Time elapsed: 31.057s
-------------------------------------------------
Total time: 37.118s


Final results ==================== >>
Duration: 37.120s
-------------------------------------
CatBoost --> neg_mean_absolute_error: -0.2995 ± 0.001 ~
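
The Bootstrap lines in the log report a mean ± standard deviation over the n_bootstrap=5 evaluations. The sketch below illustrates that idea, assuming each bootstrap round resamples the training set with replacement, refits the model and scores it on the unchanged test set. The MeanRegressor class, the toy target values and the use of the population standard deviation are assumptions made for illustration, not ATOM's implementation.

```python
import random
import statistics


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


class MeanRegressor:
    """Toy stand-in for a real estimator: always predicts the training mean."""

    def fit(self, y_train):
        self.mean_ = statistics.fmean(y_train)
        return self

    def predict(self, n_rows):
        return [self.mean_] * n_rows


def bootstrap_scores(y_train, y_test, n_bootstrap=5, seed=1):
    """Refit on resampled training data and score each fit on the test set."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_bootstrap):
        # Draw a sample of the same size as the training set, with replacement
        sample = rng.choices(y_train, k=len(y_train))
        model = MeanRegressor().fit(sample)
        mae = mean_absolute_error(y_test, model.predict(len(y_test)))
        # Negate the error so that a greater score is always better
        scores.append(-mae)
    return scores


# Hypothetical target values, only used to make the sketch runnable
y_train = [1.2, 0.8, 2.5, 3.1, 1.9, 2.2, 0.7, 4.0]
y_test = [1.0, 2.0, 3.0]

scores = bootstrap_scores(y_train, y_test)
print(f"{statistics.fmean(scores):.4f} ± {statistics.pstdev(scores):.4f}")
```

The negation mirrors the neg_mean_absolute_error metric shown in the log: the error is reported with a minus sign so that greater is always better.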

Analyze results

In [5]:

# The results dataframe now has a multi-index, where frac is the
# fraction of the training set used to fit the model. The model
# names end with the number of models fitted during that run
atom.results

Out[5]:

frac  model  metric_train   metric_test  time_fit  mean_bootstrap  std_bootstrap  time_bootstrap  time
0.17  Bag6   -1.716132e-01  -0.425329    0.151s    -0.428422       0.004202       0.782s          0.933s
      CatB6  -1.685300e-01  -0.335640    3.658s    -0.344348       0.001841       17.332s         20.990s
      ET6    -2.506363e-15  -0.385944    0.613s    -0.392057       0.001472       2.845s          3.460s
      LGB6   -2.019156e-01  -0.349968    0.184s    -0.360198       0.003051       0.760s          0.945s
      RF6    -1.491169e-01  -0.399763    1.154s    -0.407387       0.003003       5.683s          6.840s
      Tree6  -3.743775e-17  -0.563991    0.027s    -0.562617       0.019377       0.329s          0.357s
0.33  CatB3  -1.992167e-01  -0.312150    3.881s    -0.322906       0.000789       19.523s         23.405s
      ET3    -2.500448e-15  -0.352708    1.189s    -0.367404       0.001401       5.216s          6.410s
      LGB3   -2.350471e-01  -0.325999    0.237s    -0.335340       0.001817       1.038s          1.275s
1.00  CatB1  -2.325115e-01  -0.291440    6.060s    -0.299480       0.000968       31.057s         37.118s
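
The frac values and the shrinking set of models in the table follow a simple pattern. The sketch below reproduces it under two assumptions read off the output above: a run with n remaining models uses 1/n of the training set, and the best half of the models (rounded down, ranked here by mean_bootstrap) advances to the next run. halving_schedule and survivors are hypothetical helpers written for this illustration, not part of ATOM.

```python
def halving_schedule(n_models):
    """Return one (n_models, fraction_of_training_set) pair per run."""
    schedule = []
    while n_models >= 1:
        schedule.append((n_models, round(1 / n_models, 2)))
        if n_models == 1:
            break
        n_models //= 2  # only the best half advances to the next run
    return schedule


def survivors(scores, n_keep):
    """Return the n_keep model names with the highest (least negative) score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep]


# mean_bootstrap values copied from the results table
run0 = {
    "Tree": -0.562617,
    "Bag": -0.428422,
    "ET": -0.392057,
    "RF": -0.407387,
    "LGB": -0.360198,
    "CatB": -0.344348,
}
run1 = {"ET": -0.367404, "LGB": -0.335340, "CatB": -0.322906}

print(halving_schedule(6))  # [(6, 0.17), (3, 0.33), (1, 1.0)]
print(survivors(run0, 3))   # ['CatB', 'LGB', 'ET']
print(survivors(run1, 1))   # ['CatB']
```

Ranking by metric_test instead of mean_bootstrap would select the same models in this example.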

In [6]:

# Plot the results of the successive halving runs
atom.plot_successive_halving()

In [7]:

# Use an acronym to select all models that share the same estimator
atom.plot_errors(models=["CatB"])

In [8]:

# Use the number to select all models from the same run
atom.plot_errors(models="3")
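
The two calls above rely on the naming scheme of the models: an acronym followed by the number of models in that run. As a rough illustration of how such a selection could work, the sketch below splits each name into its acronym and its trailing number and matches a query against either part. split_name and select are hypothetical helpers, not ATOM's implementation.

```python
import re


def split_name(name):
    """Split a model name like 'CatB3' into ('CatB', '3')."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
    if match is None:
        raise ValueError(f"Unexpected model name: {name!r}")
    return match.group(1), match.group(2)


def select(names, query):
    """Return the names matching a full name, an acronym or a run number."""
    selected = []
    for name in names:
        acronym, run_tag = split_name(name)
        if query in (name, acronym, run_tag):
            selected.append(name)
    return selected


# Model names as they appear in the training log
names = [
    "Tree6", "Bag6", "ET6", "RF6", "LGB6", "CatB6",
    "ET3", "LGB3", "CatB3",
    "CatB1",
]

print(select(names, "CatB"))  # ['CatB6', 'CatB3', 'CatB1']
print(select(names, "3"))     # ['ET3', 'LGB3', 'CatB3']
```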