# Example: Successive halving
-----------------------------

This example shows how to compare multiple tree-based models using successive halving.

Import the California housing dataset from [sklearn.datasets](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html).
This is a small, easy-to-train dataset whose goal is to predict house prices.

## Load the data

In [1]:
from sklearn.datasets import fetch_california_housing
from atom import ATOMRegressor

In [2]:
# Load the data
X, y = fetch_california_housing(return_X_y=True)

## Run the pipeline

In [3]:
atom = ATOMRegressor(X, y, verbose=2, random_state=1)


Algorithm task: Regression.

Shape: (20640, 9)
Train set size: 16512
Test set size: 4128
-------------------------------------
Memory: 1.49 MB
Scaled: False
Outlier values: 786 (0.5%)



In [4]:
# Compare tree-based models via successive halving
atom.successive_halving(
    models=["Tree", "Bag", "ET", "RF", "LGB", "CatB"],
    metric="mae",
    n_bootstrap=5,
)


Metric: mae


Models: Tree6, Bag6, ET6, RF6, LGB6, CatB6
Size of training set: 16512 (17%)
Size of test set: 4128


Results for DecisionTree:
Fit ---------------------------------------------
Train evaluation --> mae: -0.0
Test evaluation --> mae: -0.5394
Time elapsed: 0.110s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.576 ± 0.0119
Time elapsed: 0.361s
-------------------------------------------------
Time: 0.472s


Results for Bagging:
Fit ---------------------------------------------
Train evaluation --> mae: -0.1715
Test evaluation --> mae: -0.4308
Time elapsed: 0.302s
Bootstrap ---------------------------------------
Evaluation --> mae: -0.435 ± 0.0059
Time elapsed: 1.129s
-------------------------------------------------
Time: 1.431s


Results for ExtraTrees:
Fit ---------------------------------------------
Train evaluation --> mae: -0.0
Test evaluation --> mae: -0.3977
Time elapsed: 0.980s
Bootstrap ---------------------------------------
Evaluation

## Analyze the results

In [5]:
# The results are now multi-indexed, where frac is the fraction
# of the training set used to fit the model. The model names
# end with the number of models fitted during that run
atom.results

frac,model,mae_train,mae_test,time_fit,mae_bootstrap,time_bootstrap,time
0.17,Bag6,-0.2017,-0.4327,0.302116,-0.434981,1.129138,1.431254
0.17,CatB6,-0.2043,-0.356,6.83745,-0.354266,73.713478,80.550928
0.17,ET6,-0.0694,-0.4077,0.979613,-0.405855,4.263598,5.243211
0.17,LGB6,-0.2202,-0.3676,0.430176,-0.367271,0.788259,1.218435
0.17,RF6,-0.1851,-0.4165,2.36208,-0.416217,10.267119,12.629199
0.17,Tree6,-0.1039,-0.5897,0.110449,-0.575962,0.361372,0.471821
0.33,CatB3,-0.2249,-0.3341,5.94356,-0.33367,27.655384,33.598944
0.33,ET3,-0.0935,-0.3879,1.755022,-0.384081,6.913167,8.668189
0.33,LGB3,-0.2489,-0.3405,0.48379,-0.344951,0.941663,1.425453
1.0,CatB1,-0.2231,-0.3008,8.146296,-0.307908,40.904606,49.050902
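
The `frac` values in the table follow a simple pattern: each run keeps roughly the best half of the models and trains them on a larger share of the training set. The sketch below reproduces that schedule in plain Python. It assumes the fraction is 1 divided by the number of models still in the race and that the number of models is halved (rounded down) after every run. This assumption is inferred from the table above, and `halving_schedule` is a hypothetical helper, not part of the ATOM API.

```python
def halving_schedule(n_models: int) -> list[tuple[int, float]]:
    """Return (models remaining, training fraction) for each run.

    Assumption: fraction = 1 / models remaining, and the number of
    models is halved (floor division) after every run.
    """
    schedule = []
    while n_models >= 1:
        schedule.append((n_models, round(1 / n_models, 2)))
        if n_models == 1:
            break  # a single winner remains, trained on the full set
        n_models //= 2
    return schedule


# Six candidate models, as in the example above
schedule = halving_schedule(6)
for n, frac in schedule:
    print(f"{n} model(s) trained on {frac:.0%} of the training set")
```

For six starting models this gives three runs with 6, 3 and 1 models, matching the `frac` levels 0.17, 0.33 and 1.0 in the results.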


In [6]:
# Plot the successive halving results
atom.plot_successive_halving()

In [7]:
# Use regex to select all the models with the same estimator...
atom.plot_errors(models=["CatB.*"])

In [8]:
# ...or to select the models from the same run
atom.plot_errors(models=".*3")
atom.plot_errors(models=".*3")
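
To see which models the two patterns pick out, the sketch below applies them to the model names from the results table using Python's standard `re` module. It assumes each pattern must match the whole model name (`re.fullmatch`); how ATOM resolves the patterns internally is not shown in this example, and `select` is a hypothetical helper used only for illustration.

```python
import re

# Model names as they appear in the results table
model_names = [
    "Bag6", "CatB6", "ET6", "LGB6", "RF6", "Tree6",
    "CatB3", "ET3", "LGB3",
    "CatB1",
]


def select(pattern: str, names: list[str]) -> list[str]:
    """Return the names that match the pattern in full (assumption)."""
    return [name for name in names if re.fullmatch(pattern, name)]


same_estimator = select("CatB.*", model_names)  # every CatBoost run
same_run = select(".*3", model_names)  # every model from the 3-model run

print(same_estimator)
print(same_run)
```

Under this assumption, the first pattern selects `CatB6`, `CatB3` and `CatB1`, and the second selects `CatB3`, `ET3` and `LGB3`.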