# Example: Successive halving
-----------------------------

This example shows how to compare multiple tree-based models using successive halving.

Import the California housing dataset from [sklearn.datasets](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html).
This is a small, quick-to-train dataset where the goal is to predict house prices.
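
As a rough sketch of what successive halving does: every run trains the remaining models on a share of the training set, keeps the better-scoring half, and repeats with more data per model. The snippet below assumes that n surviving models each train on 1/n of the training set and that the count is halved (rounded down) after every run. This is consistent with the fractions reported in this example's results, but it is not taken from ATOM's source code, and `halving_schedule` is a hypothetical helper.

```python
def halving_schedule(n_models):
    """List (n_models, frac) per run, halving the candidates each time."""
    schedule = []
    while n_models >= 1:
        # Assumption: n surviving models each train on 1/n of the training set
        schedule.append((n_models, round(1 / n_models, 2)))
        if n_models == 1:
            break
        n_models //= 2  # keep the better half, rounded down
    return schedule


schedule = halving_schedule(6)
for n, frac in schedule:
    print(f"{n} model(s) trained on {frac:.0%} of the training set")
```

With six candidates, this gives three runs: six models, then three, then one.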

## Load the data

In [1]:
from sklearn.datasets import fetch_california_housing
from atom import ATOMRegressor



In [2]:
# Load the data
X, y = fetch_california_housing(return_X_y=True)

## Run the pipeline

In [3]:
atom = ATOMRegressor(X, y, verbose=2, random_state=1)

Algorithm task: regression.

Shape: (20640, 9)
Train set size: 16512
Test set size: 4128
-------------------------------------
Memory: 1.49 MB
Scaled: False
Outlier values: 786 (0.5%)



In [4]:
# Compare tree-based models via successive halving
atom.successive_halving(
    models=["Tree", "Bag", "ET", "RF", "LGB", "CatB"],
    metric="mae",
    n_bootstrap=5,
)


Metric: neg_mean_absolute_error


Models: Tree6, Bag6, ET6, RF6, LGB6, CatB6
Size of training set: 16512 (17%)
Size of test set: 4128


Results for DecisionTree:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.0
Test evaluation --> neg_mean_absolute_error: -0.5538
Time elapsed: 0.038s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.5542 ± 0.0284
Time elapsed: 0.279s
-------------------------------------------------
Total time: 0.317s


Results for Bagging:
Fit ---------------------------------------------
Train evaluation --> neg_mean_absolute_error: -0.1648
Test evaluation --> neg_mean_absolute_error: -0.42
Time elapsed: 0.213s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_absolute_error: -0.4289 ± 0.0055
Time elapsed: 1.064s
-------------------------------------------------
Total time: 1.278s


Results for ExtraTrees:
Fit ---------------------------------------

## Analyze the results

In [5]:
# The results are now multi-indexed: frac is the fraction
# of the training set used to fit the model, and each model
# name ends with the number of models fitted during that run
atom.results

| frac | model | score_train | score_test | time_fit | score_bootstrap | time_bootstrap | time |
|------|-------|-------------|------------|----------|-----------------|----------------|------|
| 0.17 | Bag6 | -0.1648 | -0.42 | 0.213194 | -0.428947 | 1.06439 | 1.277584 |
| 0.17 | CatB6 | -0.1591 | -0.3433 | 4.4757 | -0.348576 | 20.322169 | 24.797869 |
| 0.17 | ET6 | -0.0 | -0.3958 | 0.740051 | -0.40153 | 3.456043 | 4.196094 |
| 0.17 | LGB6 | -0.2022 | -0.3599 | 0.308791 | -0.360678 | 0.744363 | 1.053154 |
| 0.17 | RF6 | -0.145 | -0.4003 | 2.033849 | -0.405193 | 9.483478 | 11.517327 |
| 0.17 | Tree6 | -0.0 | -0.5538 | 0.038035 | -0.554181 | 0.278868 | 0.316903 |
| 0.33 | CatB3 | -0.1876 | -0.3176 | 4.643221 | -0.329575 | 21.53707 | 26.180291 |
| 0.33 | ET3 | -0.0 | -0.3691 | 1.422265 | -0.382764 | 6.06718 | 7.489445 |
| 0.33 | LGB3 | -0.2367 | -0.3342 | 0.344359 | -0.345083 | 0.915865 | 1.260224 |
| 1.0 | CatB1 | -0.2229 | -0.2986 | 6.611475 | -0.309112 | 32.723607 | 39.335082 |


In [6]:
# Plot the successive halving results
atom.plot_successive_halving()
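
The elimination behind this plot can also be replayed by hand from the results table above. The sketch below assumes that each run keeps the better-scoring half of its models (rounded down, at least one) and that models are ranked on `score_test`. ATOM's actual selection criterion may differ (it may, for instance, rank on the bootstrap score), although both columns give the same ordering in this table. The `replay` helper is hypothetical; the scores are copied from the table.

```python
# score_test values copied from the results table (higher is better)
score_test = {
    0.17: {"Tree": -0.5538, "Bag": -0.42, "ET": -0.3958,
           "RF": -0.4003, "LGB": -0.3599, "CatB": -0.3433},
    0.33: {"CatB": -0.3176, "ET": -0.3691, "LGB": -0.3342},
    1.0: {"CatB": -0.2986},
}


def replay(scores_by_frac):
    """Return (frac, models kept) for every run, best model first."""
    kept_per_run = []
    for frac in sorted(scores_by_frac):
        scores = scores_by_frac[frac]
        ranked = sorted(scores, key=scores.get, reverse=True)
        # Assumption: keep the better half, rounded down, but never zero
        kept_per_run.append((frac, ranked[: max(1, len(ranked) // 2)]))
    return kept_per_run


kept = replay(score_test)
for frac, models in kept:
    print(f"frac={frac}: keep {models}")
```

Under these assumptions, the models kept after each run are exactly the ones that appear in the next run of the table: CatB, LGB and ET after the first run, and CatB alone after the second.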

In [7]:
# Use regex to call all the models with the same estimator...
atom.plot_errors(models=["CatB.*"])

In [8]:
# ...or to call the models from the same run
atom.plot_errors(models=".*3")
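
To see which models the two patterns above pick up, the matching can be imitated with Python's `re` module on the model names from the results table. This is only an illustration: it assumes the pattern has to match the whole model name (`re.fullmatch`), which may not be exactly how ATOM resolves the `models` parameter, and `select` is a hypothetical helper.

```python
import re

# Model names as they appear in the results table
names = ["Tree6", "Bag6", "ET6", "RF6", "LGB6", "CatB6",
         "CatB3", "ET3", "LGB3", "CatB1"]


def select(pattern, names):
    """Return the names that match the pattern in full (assumed semantics)."""
    return [name for name in names if re.fullmatch(pattern, name)]


same_estimator = select("CatB.*", names)  # every CatBoost model, across runs
same_run = select(".*3", names)           # every model from the 3-model run

print(same_estimator)
print(same_run)
```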