Multi-metric runs
This example shows how to evaluate an atom's pipeline on multiple metrics.
Import the abalone dataset from https://archive.ics.uci.edu/ml/datasets/Abalone. This is a small and easy-to-train dataset whose goal is to predict the age of abalone shells (measured in number of rings) from physical measurements.
Load the data
In [1]:
# Import packages
import pandas as pd
from atom import ATOMRegressor
In [2]:
# Load data
X = pd.read_csv("./datasets/abalone.csv")
# Let's have a look
X.head()
Out[2]:
| | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
Run the pipeline
In [3]:
atom = ATOMRegressor(X, n_jobs=1, verbose=2, warnings=False, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Memory: 509.72 kB
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 187 (0.6%)
-------------------------------------
Train set size: 3342
Test set size: 835
In [4]:
atom.encode()
Fitting Encoder...
Encoding categorical columns...
--> OneHot-encoding feature Sex. Contains 3 classes.
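To verify the transformation, we can have a quick look at the encoded data. A minimal sketch, assuming atom's `dataset` attribute returns the data in its current (transformed) state:

# Inspect the transformed dataset; the Sex column
# should now be one-hot encoded into three columns.
atom.dataset.head()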
In [5]:
# For every step of the BO, both metrics are calculated,
# but only the first is used for optimization!
atom.run(
models=["lsvm", "hGBM"],
metric=("r2", "mse"),
n_calls=10,
n_initial_points=4,
n_bootstrap=6,
)
Training ========================= >>
Models: lSVM, hGBM
Metric: r2, neg_mean_squared_error

Running BO for Linear-SVM...

| call | loss | C | dual | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Initial point 1 | squared_epsilon_insen.. | 46.0031 | True | 0.4704 | 0.4704 | -6.505 | -6.505 | 0.119s | 0.127s |
| Initial point 2 | squared_epsilon_insen.. | 0.0152 | True | 0.3273 | 0.4704 | -6.4239 | -6.4239 | 0.047s | 0.395s |
| Initial point 3 | epsilon_insensitive | 2.2322 | True | 0.4221 | 0.4704 | -7.2202 | -6.4239 | 0.064s | 0.535s |
| Initial point 4 | squared_epsilon_insen.. | 0.0368 | False | 0.5469 | 0.5469 | -5.1322 | -5.1322 | 0.038s | 0.650s |
| Iteration 5 | epsilon_insensitive | 0.0414 | True | 0.3922 | 0.5469 | -5.9024 | -5.1322 | 0.035s | 0.917s |
| Iteration 6 | squared_epsilon_insen.. | 0.0036 | False | 0.3266 | 0.5469 | -7.2191 | -5.1322 | 0.038s | 1.230s |
| Iteration 7 | squared_epsilon_insen.. | 0.0368 | False | 0.5469 | 0.5469 | -5.1322 | -5.1322 | 0.000s | 1.459s |
| Iteration 8 | squared_epsilon_insen.. | 0.0408 | False | 0.4781 | 0.5469 | -5.0326 | -5.0326 | 0.036s | 1.809s |
| Iteration 9 | squared_epsilon_insen.. | 0.033 | False | 0.4635 | 0.5469 | -5.6456 | -5.0326 | 0.037s | 2.173s |
| Iteration 10 | squared_epsilon_insen.. | 0.0369 | True | 0.4594 | 0.5469 | -5.6458 | -5.0326 | 0.042s | 2.623s |

Bayesian Optimization ---------------------------
Best call --> Initial point 4
Best parameters --> {'loss': 'squared_epsilon_insensitive', 'C': 0.0368, 'dual': False}
Best evaluation --> r2: 0.5469   neg_mean_squared_error: -5.1322
Time elapsed: 2.924s
Fit ---------------------------------------------
Train evaluation --> r2: 0.4648   neg_mean_squared_error: -5.6414
Test evaluation --> r2: 0.4328   neg_mean_squared_error: -5.5575
Time elapsed: 0.018s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.4318 ± 0.0046   neg_mean_squared_error: -5.5675 ± 0.0452
Time elapsed: 0.090s
-------------------------------------------------
Total time: 3.032s

Running BO for HistGBM...

| call | loss | learning_rate | max_iter | max_leaf_nodes | max_depth | min_samples_leaf | l2_regularization | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Initial point 1 | poisson | 0.733 | 73 | 50 | 2 | 18 | 0.4 | 0.5773 | 0.5773 | -5.1926 | -5.1926 | 0.089s | 0.102s |
| Initial point 2 | absolute_.. | 0.7432 | 425 | 23 | 5 | 19 | 0.2 | 0.4857 | 0.5773 | -4.9114 | -4.9114 | 0.920s | 1.129s |
| Initial point 3 | absolute_.. | 0.6729 | 234 | 27 | 9 | 26 | 0.7 | 0.4833 | 0.5773 | -6.4547 | -4.9114 | 1.007s | 2.240s |
| Initial point 4 | poisson | 0.0153 | 264 | 45 | 8 | 27 | 0.3 | 0.6013 | 0.6013 | -4.5159 | -4.5159 | 1.594s | 3.934s |
| Iteration 5 | squared_e.. | 0.0719 | 410 | 34 | 6 | 29 | 0.1 | 0.4733 | 0.6013 | -5.1143 | -4.5159 | 0.948s | 5.175s |
| Iteration 6 | poisson | 0.01 | 500 | 10 | None | 10 | 1.0 | 0.5466 | 0.6013 | -4.8608 | -4.5159 | 1.012s | 6.688s |
| Iteration 7 | poisson | 0.01 | 201 | 50 | 8 | 24 | 1.0 | 0.5438 | 0.6013 | -5.1586 | -4.5159 | 1.639s | 8.761s |
| Iteration 8 | poisson | 0.01 | 500 | 10 | None | 10 | 0.0 | 0.5378 | 0.6013 | -4.4574 | -4.4574 | 0.969s | 10.346s |
| Iteration 9 | poisson | 0.0106 | 324 | 42 | 9 | 30 | 0.0 | 0.5672 | 0.6013 | -4.5537 | -4.4574 | 2.739s | 13.464s |
| Iteration 10 | poisson | 1.0 | 500 | 50 | None | 10 | 0.0 | 0.1422 | 0.6013 | -8.9582 | -4.4574 | 3.951s | 17.772s |

Bayesian Optimization ---------------------------
Best call --> Initial point 4
Best parameters --> {'loss': 'poisson', 'learning_rate': 0.0153, 'max_iter': 264, 'max_leaf_nodes': 45, 'max_depth': 8, 'min_samples_leaf': 27, 'l2_regularization': 0.3}
Best evaluation --> r2: 0.6013   neg_mean_squared_error: -4.5159
Time elapsed: 18.180s
Fit ---------------------------------------------
Train evaluation --> r2: 0.7132   neg_mean_squared_error: -3.023
Test evaluation --> r2: 0.5224   neg_mean_squared_error: -4.6794
Time elapsed: 1.915s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.509 ± 0.014   neg_mean_squared_error: -4.8115 ± 0.1372
Time elapsed: 9.698s
-------------------------------------------------
Total time: 29.795s

Final results ==================== >>
Duration: 32.827s
-------------------------------------
Linear-SVM --> r2: 0.4318 ± 0.0046   neg_mean_squared_error: -5.5675 ± 0.0452
HistGBM --> r2: 0.509 ± 0.014   neg_mean_squared_error: -4.8115 ± 0.1372 ~ !
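In the final results, the winning model (marked with !) is chosen on the first metric only; the ~ flags a possible difference between train and test scores. A minimal sketch for inspecting the winner, assuming this atom-ml version exposes the `winner` attribute and the per-model `metric_test` attribute shown in the results below:

# Hedged sketch: inspect the best-performing model
atom.winner              # model that scored best on r2 (the main metric)
atom.winner.metric_test  # [r2, neg_mean_squared_error] on the test set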
Analyze the results
In [6]:
# The columns in the results dataframe contain a list of
# scores, one for each metric (in the same order as called)
atom.results[["metric_bo", "metric_train", "metric_test"]]
Out[6]:
| | metric_bo | metric_train | metric_test |
| --- | --- | --- | --- |
| lSVM | [0.5468896602938744, -5.13219509627928] | [0.4648239922283617, -5.6414015824978] | [0.43283219743814105, -5.5574871296346195] |
| hGBM | [0.6012979195596735, -4.515935044517652] | [0.7132250948243765, -3.0229539074717753] | [0.522445526933692, -4.679396160673255] |
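Since every cell holds a list of scores, it can be convenient to expand them into one column per metric. A minimal sketch using plain pandas; the column names below are chosen by hand to match the metrics passed to run:

# Hedged sketch: expand the list-valued metric_test column
# into one column per metric, keeping the model index.
scores = pd.DataFrame(
    atom.results["metric_test"].tolist(),
    index=atom.results.index,
    columns=["r2", "neg_mean_squared_error"],
)
scores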
In [7]:
# Some plots allow us to choose the metric we want to show
with atom.canvas():
    atom.plot_bo(metric="r2", title="BO performance for r2")
    atom.plot_bo(metric="mse", title="BO performance for Mean Squared Error")
In [8]:
atom.plot_results(metric="mse")
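Metrics that were not part of the run can still be computed afterwards from a model's predictions. A minimal sketch with plain sklearn, assuming atom exposes the usual `X_test`/`y_test` data attributes:

# Hedged sketch: score the winner on a metric
# that was not included in the run.
from sklearn.metrics import mean_absolute_error

y_pred = atom.winner.predict(atom.X_test)  # test-set predictions of the best model
mean_absolute_error(atom.y_test, y_pred)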