Successive halving¶

This example shows how to compare multiple tree-based models using successive halving.

Import the california housing dataset from sklearn.datasets. This is a small and easy to train dataset whose goal is to predict house prices.

Load the data¶

In [9]:

            
                Copied!
                
from sklearn.datasets import fetch_california_housing
from atom import ATOMRegressor
from sklearn.datasets import fetch_california_housing
from atom import ATOMRegressor

In [10]:

            
                Copied!
                
# Load the data
X, y = fetch_california_housing(return_X_y=True)
# Load the data
X, y = fetch_california_housing(return_X_y=True)

Run the pipeline¶

In [11]:

            
                Copied!
                
atom = ATOMRegressor(X, y, verbose=2, random_state=1)
atom = ATOMRegressor(X, y, verbose=2, random_state=1)

<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ==================== >>
Shape: (20640, 9)
Scaled: False
Outlier values: 811 (0.5%)
-------------------------------------
Train set size: 16512
Test set size: 4128

In [12]:

            
                Copied!
                
                    
                    
                
                

        
# Compare tree-based models via successive halving
atom.successive_halving(
    models=["Tree", "Bag", "ET", "RF", "LGB", "CatB"],
    metric="mae",
    n_bootstrap=5,
)
# Compare tree-based models via successive halving
atom.successive_halving(
    models=["Tree", "Bag", "ET", "RF", "LGB", "CatB"],
    metric="mae",
    n_bootstrap=5,
)


Run: 0 ================================ >>
Models: Tree6, Bag6, ET6, RF6, LGB6, CatB6
Size of training set: 16512 (17%)
Size of test set: 4128

Training ========================= >>
Metric: neg_mean_absolute_error


Results for Decision Tree:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.0
Test evaluation --> neg_mean_absolute_error: -0.564
Time elapsed: 0.031s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.5626 ± 0.0194
Time elapsed: 0.266s
-------------------------------------------------
Total time: 0.297s


Results for Bagging:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.1716
Test evaluation --> neg_mean_absolute_error: -0.4253
Time elapsed: 0.141s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.4284 ± 0.0042
Time elapsed: 1.016s
-------------------------------------------------
Total time: 1.156s


Results for Extra-Trees:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.0
Test evaluation --> neg_mean_absolute_error: -0.3859
Time elapsed: 0.810s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.3921 ± 0.0015
Time elapsed: 3.329s
-------------------------------------------------
Total time: 4.138s


Results for Random Forest:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.1491
Test evaluation --> neg_mean_absolute_error: -0.3998
Time elapsed: 1.377s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.4074 ± 0.003
Time elapsed: 6.079s
-------------------------------------------------
Total time: 7.456s


Results for LightGBM:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.2019
Test evaluation --> neg_mean_absolute_error: -0.35
Time elapsed: 0.297s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.3602 ± 0.0031
Time elapsed: 0.926s
-------------------------------------------------
Total time: 1.223s


Results for CatBoost:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.1604
Test evaluation --> neg_mean_absolute_error: -0.3335
Time elapsed: 4.033s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.3439 ± 0.0016
Time elapsed: 17.500s
-------------------------------------------------
Total time: 21.533s


Final results ==================== >>
Duration: 35.804s
-------------------------------------
Decision Tree --> neg_mean_absolute_error: -0.5626 ± 0.0194 ~
Bagging       --> neg_mean_absolute_error: -0.4284 ± 0.0042 ~
Extra-Trees   --> neg_mean_absolute_error: -0.3921 ± 0.0015 ~
Random Forest --> neg_mean_absolute_error: -0.4074 ± 0.003 ~
LightGBM      --> neg_mean_absolute_error: -0.3602 ± 0.0031 ~
CatBoost      --> neg_mean_absolute_error: -0.3439 ± 0.0016 ~ !


Run: 1 ================================ >>
Models: ET3, LGB3, CatB3
Size of training set: 16512 (33%)
Size of test set: 4128

Training ========================= >>
Metric: neg_mean_absolute_error


Results for Extra-Trees:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.0
Test evaluation --> neg_mean_absolute_error: -0.3527
Time elapsed: 1.129s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.3674 ± 0.0014
Time elapsed: 4.874s
-------------------------------------------------
Total time: 6.010s


Results for LightGBM:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.235
Test evaluation --> neg_mean_absolute_error: -0.326
Time elapsed: 0.195s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.3353 ± 0.0018
Time elapsed: 0.852s
-------------------------------------------------
Total time: 1.048s


Results for CatBoost:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.1945
Test evaluation --> neg_mean_absolute_error: -0.314
Time elapsed: 3.538s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.3232 ± 0.0003
Time elapsed: 18.490s
-------------------------------------------------
Total time: 22.027s


Final results ==================== >>
Duration: 29.086s
-------------------------------------
Extra-Trees --> neg_mean_absolute_error: -0.3674 ± 0.0014 ~
LightGBM    --> neg_mean_absolute_error: -0.3353 ± 0.0018 ~
CatBoost    --> neg_mean_absolute_error: -0.3232 ± 0.0003 ~ !


Run: 2 ================================ >>
Models: CatB1
Size of training set: 16512 (100%)
Size of test set: 4128

Training ========================= >>
Metric: neg_mean_absolute_error


Results for CatBoost:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.2317
Test evaluation --> neg_mean_absolute_error: -0.2911
Time elapsed: 5.688s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.2989 ± 0.001
Time elapsed: 26.767s
-------------------------------------------------
Total time: 32.456s


Final results ==================== >>
Duration: 32.456s
-------------------------------------
CatBoost --> neg_mean_absolute_error: -0.2989 ± 0.001 ~

Analyze results¶

In [13]:

            
                Copied!
                
# The results is now multi-index, where frac is the fraction
# of the training set used to fit the model. The model names
# end with the number of models fitted during that run
atom.results
# The results is now multi-index, where frac is the fraction
# of the training set used to fit the model. The model names
# end with the number of models fitted during that run
atom.results

Out[13]:

		metric_train	metric_test	time_fit	mean_bootstrap	std_bootstrap	time_bootstrap	time
frac	model
0.17	Bag6	-1.716132e-01	-0.425329	0.141s	-0.428422	0.004202	1.016s	1.156s
	CatB6	-2.316594e-01	-0.291079	4.033s	-0.343854	0.001604	17.500s	21.533s
	ET6	-2.500448e-15	-0.352708	0.810s	-0.392057	0.001472	3.329s	4.138s
	LGB6	-2.350471e-01	-0.325999	0.297s	-0.360198	0.003051	0.926s	1.223s
	RF6	-1.491169e-01	-0.399763	1.377s	-0.407387	0.003003	6.079s	7.456s
	Tree6	-3.743775e-17	-0.563991	0.031s	-0.562617	0.019377	0.266s	0.297s
0.33	CatB3	-2.316594e-01	-0.291079	3.538s	-0.323212	0.000260	18.490s	22.027s
	ET3	-2.500448e-15	-0.352708	1.129s	-0.367404	0.001401	4.874s	6.010s
	LGB3	-2.350471e-01	-0.325999	0.195s	-0.335340	0.001817	0.852s	1.048s
1.00	CatB1	-2.316594e-01	-0.291079	5.688s	-0.298891	0.001028	26.767s	32.456s

In [14]:

            
                Copied!
                
# Plot the successive halving's results
atom.plot_successive_halving()
# Plot the successive halving's results
atom.plot_successive_halving()

In [15]:

            
                Copied!
                
# Use an acronym to call all the models with the same estimator
atom.plot_errors(models=["CatB"])
# Use an acronym to call all the models with the same estimator
atom.plot_errors(models=["CatB"])

In [16]:

            
                Copied!
                
# Use the number to call the models from the same run
atom.plot_errors(models="3")
# Use the number to call the models from the same run
atom.plot_errors(models="3")