Multi-metric runs¶
This example shows how to evaluate an atom's pipeline on more than one metric.
Import the abalone dataset from https://archive.ics.uci.edu/ml/datasets/Abalone. The goal of this dataset is to predict the age (expressed as the number of rings) of abalone shells from physical measurements.
Load the data¶
In [1]:
# Import packages
import pandas as pd
from atom import ATOMRegressor
In [2]:
# Load data
X = pd.read_csv("./datasets/abalone.csv")
# Let's have a look
X.head()
Out[2]:
|   | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
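Before modelling, it can help to glance at the target's distribution. This quick check uses plain pandas on the Rings column (the regression target) and doesn't involve atom:

# Summary statistics of the regression target
X["Rings"].describe()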
Run the pipeline¶
In [3]:
atom = ATOMRegressor(X, n_jobs=1, verbose=2, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Memory: 509.72 kB
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 187 (0.6%)
-------------------------------------
Train set size: 3342
Test set size: 835
In [4]:
atom.encode()
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
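To verify the transformation, the processed data can be inspected directly. A minimal sketch, assuming the dataset attribute holds the encoded data (Sex should now be replaced by three one-hot columns):

# Inspect the first rows of the encoded dataset
atom.dataset.head()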
In [5]:
# For every step of the BO, both metrics are calculated,
# but only the first is used for optimization!
atom.run(
    models=["lsvm", "hGBM"],
    metric=("r2", "rmse"),
    n_calls=10,
    n_initial_points=4,
    n_bootstrap=6,
)
Training ========================= >>
Models: lSVM, hGBM
Metric: r2, neg_root_mean_squared_error


Running BO for Linear SVM...

| call | loss | C | dual | r2 | best_r2 | neg_root_mean_squared_error | best_neg_root_mean_squared_error | time | total_time |
|---|---|---|---|---|---|---|---|---|---|
| Initial point 1 | squared_epsilon_insen.. | 46.0031 | True | 0.4704 | 0.4704 | -2.5505 | -2.5505 | 0.117s | 0.132s |
| Initial point 2 | squared_epsilon_insen.. | 0.0152 | True | 0.3273 | 0.4704 | -2.5345 | -2.5345 | 0.031s | 0.163s |
| Initial point 3 | epsilon_insensitive | 2.2322 | True | 0.4221 | 0.4704 | -2.687 | -2.5345 | 0.078s | 0.242s |
| Initial point 4 | squared_epsilon_insen.. | 0.0368 | False | 0.5469 | 0.5469 | -2.2654 | -2.2654 | 0.031s | 0.273s |
| Iteration 5 | epsilon_insensitive | 0.0414 | True | 0.3922 | 0.5469 | -2.4295 | -2.2654 | 0.031s | 0.476s |
| Iteration 6 | squared_epsilon_insen.. | 0.0036 | False | 0.3266 | 0.5469 | -2.6868 | -2.2654 | 0.047s | 0.726s |
| Iteration 7 | squared_epsilon_insen.. | 0.0368 | False | 0.5469 | 0.5469 | -2.2654 | -2.2654 | 0.000s | 0.898s |
| Iteration 8 | squared_epsilon_insen.. | 0.0408 | False | 0.4781 | 0.5469 | -2.2434 | -2.2434 | 0.047s | 1.210s |
| Iteration 9 | squared_epsilon_insen.. | 0.033 | False | 0.4635 | 0.5469 | -2.376 | -2.2434 | 0.047s | 1.585s |
| Iteration 10 | squared_epsilon_insen.. | 0.0369 | True | 0.4594 | 0.5469 | -2.3761 | -2.2434 | 0.047s | 1.960s |

Bayesian Optimization ---------------------------
Best call --> Initial point 4
Best parameters --> {'loss': 'squared_epsilon_insensitive', 'C': 0.0368, 'dual': False}
Best evaluation --> r2: 0.5469   neg_root_mean_squared_error: -2.2654
Time elapsed: 2.179s
Fit ---------------------------------------------
Train evaluation --> r2: 0.4648   neg_root_mean_squared_error: -2.3752
Test evaluation --> r2: 0.4328   neg_root_mean_squared_error: -2.3574
Time elapsed: 0.016s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.4318 ± 0.0046   neg_root_mean_squared_error: -2.3595 ± 0.0096
Time elapsed: 0.094s
-------------------------------------------------
Total time: 2.289s


Running BO for HistGBM...

| call | loss | learning_rate | max_iter | max_leaf_nodes | max_depth | min_samples_leaf | l2_regularization | r2 | best_r2 | neg_root_mean_squared_error | best_neg_root_mean_squared_error | time | total_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Initial point 1 | poisson | 0.733 | 73 | 50 | 4 | 18 | 0.4 | 0.5137 | 0.5137 | -2.4441 | -2.4441 | 0.141s | 0.156s |
| Initial point 2 | absolute_.. | 0.7432 | 425 | 23 | 8 | 19 | 0.2 | 0.4734 | 0.5137 | -2.2425 | -2.2425 | 1.656s | 1.828s |
| Initial point 3 | absolute_.. | 0.6729 | 234 | 27 | 15 | 26 | 0.7 | 0.4905 | 0.5137 | -2.523 | -2.2425 | 1.281s | 3.110s |
| Initial point 4 | poisson | 0.0153 | 264 | 45 | 13 | 27 | 0.3 | 0.6013 | 0.6013 | -2.1251 | -2.1251 | 1.781s | 4.891s |
| Iteration 5 | poisson | 0.01 | 28 | 50 | 12 | 15 | 0.6 | 0.2368 | 0.6013 | -2.7223 | -2.1251 | 0.250s | 5.407s |
| Iteration 6 | poisson | 0.0274 | 210 | 45 | 9 | 28 | 0.0 | 0.5329 | 0.6013 | -2.2378 | -2.1251 | 1.063s | 6.782s |
| Iteration 7 | squared_e.. | 0.01 | 453 | 17 | 11 | 25 | 0.4 | 0.5496 | 0.6013 | -2.2567 | -2.1251 | 1.156s | 8.235s |
| Iteration 8 | absolute_.. | 0.0102 | 333 | 34 | 14 | 30 | 0.7 | 0.5373 | 0.6013 | -2.1123 | -2.1123 | 2.078s | 10.626s |
| Iteration 9 | poisson | 0.0138 | 147 | 50 | 16 | 27 | 0.2 | 0.5534 | 0.6013 | -2.1679 | -2.1123 | 1.000s | 11.938s |
| Iteration 10 | poisson | 1.0 | 36 | 26 | 6 | 10 | 0.9 | 0.3135 | 0.6013 | -2.6775 | -2.1123 | 0.141s | 12.298s |

Bayesian Optimization ---------------------------
Best call --> Initial point 4
Best parameters --> {'loss': 'poisson', 'learning_rate': 0.0153, 'max_iter': 264, 'max_leaf_nodes': 45, 'max_depth': 13, 'min_samples_leaf': 27, 'l2_regularization': 0.3}
Best evaluation --> r2: 0.6013   neg_root_mean_squared_error: -2.1251
Time elapsed: 12.720s
Fit ---------------------------------------------
Train evaluation --> r2: 0.7301   neg_root_mean_squared_error: -1.6867
Test evaluation --> r2: 0.5224   neg_root_mean_squared_error: -2.1632
Time elapsed: 1.609s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.5064 ± 0.0163   neg_root_mean_squared_error: -2.1989 ± 0.0364
Time elapsed: 10.424s
-------------------------------------------------
Total time: 24.753s


Final results ==================== >>
Duration: 27.042s
-------------------------------------
Linear SVM --> r2: 0.4318 ± 0.0046   neg_root_mean_squared_error: -2.3595 ± 0.0096 ~
HistGBM --> r2: 0.5064 ± 0.0163   neg_root_mean_squared_error: -2.1989 ± 0.0364 !
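Note that the metric parameter isn't limited to the predefined acronyms used above. As a sketch, assuming atom accepts sklearn scorer objects (the neg_root_mean_squared_error naming in the output follows sklearn's scorer conventions), a custom scorer could be mixed in:

from sklearn.metrics import make_scorer, mean_absolute_error

# Hypothetical run mixing a named metric with a custom sklearn scorer;
# greater_is_better=False negates the error so that higher is still better
atom.run(
    models="hGBM",
    metric=("r2", make_scorer(mean_absolute_error, greater_is_better=False)),
)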
In [6]:
# Check the robustness of the pipeline using cross-validation
atom.winner.cross_validate()
Applying cross-validation...
Out[6]:
|   | train_r2 | test_r2 | train_neg_root_mean_squared_error | test_neg_root_mean_squared_error | time (s) |
|---|---|---|---|---|---|
| 0 | 0.727249 | 0.545513 | -1.675553 | -2.214354 | 1.739589 |
| 1 | 0.719495 | 0.569580 | -1.695233 | -2.173604 | 1.797012 |
| 2 | 0.732069 | 0.514090 | -1.663940 | -2.272483 | 1.640750 |
| 3 | 0.719680 | 0.581855 | -1.719713 | -2.020461 | 1.616995 |
| 4 | 0.730103 | 0.522438 | -1.686724 | -2.163208 | 1.656376 |
| mean | 0.725719 | 0.546695 | -1.688233 | -2.168822 | 1.690144 |
| std | 0.005236 | 0.026126 | 0.018938 | 0.083527 | 0.067522 |
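The per-fold scores can be summarized further with plain pandas. A sketch, assuming the dataframe shown above is the method's return value (cv_results is a hypothetical variable name):

cv_results = atom.winner.cross_validate()

# Mean and standard deviation of the test r2 over the five folds
# (iloc[:5] skips the aggregated mean/std rows at the bottom)
print(cv_results["test_r2"].iloc[:5].agg(["mean", "std"]))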
Analyze the results¶
In [7]:
# The columns in the results dataframe contain a list of
# scores, one for each metric (in the same order as called)
atom.results[["metric_bo", "metric_train", "metric_test"]]
Out[7]:
|   | metric_bo | metric_train | metric_test |
|---|---|---|---|
| lSVM | [0.5468896602938744, -2.265434858096626] | [0.4648239922283617, -2.37516348542533] | [0.43283219743814105, -2.3574323170845477] |
| hGBM | [0.6012801528863336, -2.125120297852416] | [0.7301031935292892, -1.686724150354464] | [0.5224380449509453, -2.1632081439643134] |
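Because every cell holds a list with one score per metric, a single metric can be extracted with plain pandas. For example, to pull the test r2 (the first metric, index 0) for all models:

# Extract the r2 score from every list in the metric_test column
atom.results["metric_test"].apply(lambda scores: scores[0])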
In [8]:
# Some plots allow us to choose the metric we want to show
with atom.canvas():
    atom.plot_bo(metric="r2", title="BO performance for r2")
    atom.plot_bo(metric="rmse", title="BO performance for Root Mean Squared Error")
In [9]:
atom.plot_results(metric="rmse")
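The same canvas trick from above also works here. A sketch drawing the model comparison once per metric:

# One figure with the final results for each of the two metrics
with atom.canvas():
    atom.plot_results(metric="r2")
    atom.plot_results(metric="rmse")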