Multi-metric runs¶
This example shows how to evaluate an atom's pipeline on multiple metrics.
Import the abalone dataset. This is a small and easy-to-train dataset whose goal is to predict the number of rings (a proxy for age) of abalone shells from physical measurements.
Load the data¶
In [1]:
# Import packages
import pandas as pd
from atom import ATOMRegressor
In [2]:
# Load data
X = pd.read_csv("./datasets/abalone.csv")
# Let's have a look
X.head()
Out[2]:
|   | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|-----|--------|----------|--------|--------------|----------------|----------------|--------------|-------|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
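Note that ATOM uses the last column of the dataset (Rings) as the target by default. A minimal sketch of making the target explicit, assuming your ATOM version accepts a column name for the y parameter:

# Hypothetical alternative: name the target column explicitly
atom = ATOMRegressor(X, y="Rings", n_jobs=1, verbose=2, warnings=False, random_state=1)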
Run the pipeline¶
In [3]:
atom = ATOMRegressor(X, n_jobs=1, verbose=2, warnings=False, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Memory: 509.72 kB
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 187 (0.6%)
-------------------------------------
Train set size: 3342
Test set size: 835
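The split itself can be inspected through atom's data attributes. A quick check, assuming the train/test properties shown in the ATOM documentation:

# Verify the 80/20 split reported above (expected: (3342, 9) and (835, 9))
print(atom.train.shape)
print(atom.test.shape)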
In [4]:
atom.encode()
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
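Sex is one-hot encoded because it has only three classes. For higher-cardinality features, encode exposes other strategies; a hedged sketch, assuming the strategy and max_onehot parameters of atom.encode:

# Hypothetical variant: target-based encoding for features with many classes
# (shown as an alternative call, not meant to re-encode the already-fitted atom)
atom.encode(strategy="Target", max_onehot=5)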
In [5]:
# For every step of the BO, both metrics are calculated,
# but only the first is used for optimization!
atom.run(
    models=["lsvm", "hGBM"],
    metric=("r2", "mse"),
    n_calls=10,
    n_initial_points=4,
    n_bootstrap=6,
)
Training ========================= >>
Models: lSVM, hGBM
Metric: r2, neg_mean_squared_error


Running BO for Linear SVM...
| call | loss | C | dual | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Initial point 1 | squared_epsilon_insen.. | 46.0031 | True | 0.4704 | 0.4704 | -6.505 | -6.505 | 0.118s | 0.126s |
| Initial point 2 | squared_epsilon_insen.. | 0.0152 | True | 0.3273 | 0.4704 | -6.4239 | -6.4239 | 0.038s | 0.386s |
| Initial point 3 | epsilon_insensitive | 2.2322 | True | 0.4221 | 0.4704 | -7.2202 | -6.4239 | 0.064s | 0.528s |
| Initial point 4 | squared_epsilon_insen.. | 0.0368 | False | 0.5469 | 0.5469 | -5.1322 | -5.1322 | 0.039s | 0.643s |
| Iteration 5 | epsilon_insensitive | 0.0414 | True | 0.3922 | 0.5469 | -5.9024 | -5.1322 | 0.035s | 0.923s |
| Iteration 6 | squared_epsilon_insen.. | 0.0036 | False | 0.3266 | 0.5469 | -7.2191 | -5.1322 | 0.037s | 1.235s |
| Iteration 7 | squared_epsilon_insen.. | 0.0368 | False | 0.5469 | 0.5469 | -5.1322 | -5.1322 | 0.001s | 1.475s |
| Iteration 8 | squared_epsilon_insen.. | 0.0408 | False | 0.4781 | 0.5469 | -5.0326 | -5.0326 | 0.037s | 1.831s |
| Iteration 9 | squared_epsilon_insen.. | 0.033 | False | 0.4635 | 0.5469 | -5.6456 | -5.0326 | 0.042s | 2.187s |
| Iteration 10 | squared_epsilon_insen.. | 0.0369 | True | 0.4594 | 0.5469 | -5.6458 | -5.0326 | 0.040s | 2.719s |

Bayesian Optimization ---------------------------
Best call --> Initial point 4
Best parameters --> {'loss': 'squared_epsilon_insensitive', 'C': 0.0368, 'dual': False}
Best evaluation --> r2: 0.5469   neg_mean_squared_error: -5.1322
Time elapsed: 3.015s
Fit ---------------------------------------------
Train evaluation --> r2: 0.4648   neg_mean_squared_error: -5.6414
Test evaluation --> r2: 0.4328   neg_mean_squared_error: -5.5575
Time elapsed: 0.018s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.4318 ± 0.0046   neg_mean_squared_error: -5.5675 ± 0.0452
Time elapsed: 0.094s
-------------------------------------------------
Total time: 3.127s


Running BO for HistGBM...
| call | loss | learning_rate | max_iter | max_leaf_nodes | max_depth | min_samples_leaf | l2_regularization | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Initial point 1 | poisson | 0.733 | 73 | 50 | 4 | 18 | 0.4 | 0.5137 | 0.5137 | -5.9736 | -5.9736 | 0.155s | 0.169s |
| Initial point 2 | absolute_.. | 0.7432 | 425 | 23 | 8 | 19 | 0.2 | 0.4734 | 0.5137 | -5.0286 | -5.0286 | 1.677s | 1.945s |
| Initial point 3 | absolute_.. | 0.6729 | 234 | 27 | 15 | 26 | 0.7 | 0.4905 | 0.5137 | -6.3654 | -5.0286 | 1.645s | 3.692s |
| Initial point 4 | poisson | 0.0153 | 264 | 45 | 13 | 27 | 0.3 | 0.6013 | 0.6013 | -4.5161 | -4.5161 | 1.748s | 5.547s |
| Iteration 5 | poisson | 0.01 | 28 | 50 | 12 | 15 | 0.6 | 0.2368 | 0.6013 | -7.4112 | -4.5161 | 0.248s | 6.132s |
| Iteration 6 | poisson | 0.0274 | 210 | 45 | 9 | 28 | 0.0 | 0.5329 | 0.6013 | -5.0078 | -4.5161 | 1.057s | 7.565s |
| Iteration 7 | squared_e.. | 0.01 | 453 | 17 | 11 | 25 | 0.4 | 0.5496 | 0.6013 | -5.0927 | -4.5161 | 2.402s | 10.570s |
| Iteration 8 | absolute_.. | 0.0102 | 333 | 34 | 14 | 30 | 0.7 | 0.5373 | 0.6013 | -4.4619 | -4.4619 | 2.312s | 13.269s |
| Iteration 9 | poisson | 0.0138 | 147 | 50 | 16 | 27 | 0.2 | 0.5534 | 0.6013 | -4.6996 | -4.4619 | 1.132s | 14.799s |
| Iteration 10 | poisson | 1.0 | 36 | 26 | 6 | 10 | 0.9 | 0.3135 | 0.6013 | -7.1688 | -4.4619 | 0.155s | 15.258s |

Bayesian Optimization ---------------------------
Best call --> Initial point 4
Best parameters --> {'loss': 'poisson', 'learning_rate': 0.0153, 'max_iter': 264, 'max_leaf_nodes': 45, 'max_depth': 13, 'min_samples_leaf': 27, 'l2_regularization': 0.3}
Best evaluation --> r2: 0.6013   neg_mean_squared_error: -4.5161
Time elapsed: 15.797s
Fit ---------------------------------------------
Train evaluation --> r2: 0.7301   neg_mean_squared_error: -2.845
Test evaluation --> r2: 0.5224   neg_mean_squared_error: -4.6795
Time elapsed: 2.484s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.5064 ± 0.0163   neg_mean_squared_error: -4.8365 ± 0.16
Time elapsed: 11.562s
-------------------------------------------------
Total time: 29.844s


Final results ==================== >>
Duration: 32.971s
-------------------------------------
Linear SVM --> r2: 0.4318 ± 0.0046   neg_mean_squared_error: -5.5675 ± 0.0452
HistGBM --> r2: 0.5064 ± 0.0163   neg_mean_squared_error: -4.8365 ± 0.16 ~ !
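Metric names are not the only option: the metric parameter should also accept callables and sklearn scorers. A minimal sketch, assuming scorer support in atom.run (the mean_absolute_error scorer is our illustration, not part of the run above):

from sklearn.metrics import make_scorer, mean_absolute_error

# Sketch: combine a named metric with a custom scorer;
# the first metric still drives the optimization
atom.run(
    models=["lsvm"],
    metric=("r2", make_scorer(mean_absolute_error, greater_is_better=False)),
    n_calls=10,
    n_initial_points=4,
)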
Analyze the results¶
In [6]:
# The columns in the results dataframe contain a list of
# scores, one for each metric (in the same order as called)
atom.results[["metric_bo", "metric_train", "metric_test"]]
Out[6]:
|      | metric_bo | metric_train | metric_test |
|------|-----------|--------------|-------------|
| lSVM | [0.5468896602938744, -5.13219509627928] | [0.4648239922283617, -5.6414015824978] | [0.43283219743814105, -5.5574871296346195] |
| hGBM | [0.6012801528863336, -4.516136280344341] | [0.7301031935292892, -2.845038359388989] | [0.5224380449509453, -4.67946947411353] |
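Because every cell holds one score per metric, plain pandas can expand the lists into separate columns. A small sketch (the column names are ours, following the order in which the metrics were passed):

# Expand the test scores into one column per metric
pd.DataFrame(
    atom.results["metric_test"].tolist(),
    index=atom.results.index,
    columns=["r2", "neg_mean_squared_error"],
)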
In [7]:
# Some plots allow us to choose the metric we want to show
with atom.canvas():
    atom.plot_bo(metric="r2", title="BO performance for r2")
    atom.plot_bo(metric="mse", title="BO performance for Mean Squared Error")
In [8]:
atom.plot_results(metric="mse")
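Finally, the best model can be retrieved directly. A sketch assuming the winner shortcut and per-model metric attributes (check these names against your ATOM version):

# The winner is selected on the first (main) metric, r2 in this example
print(atom.winner.name)         # hGBM, per the final results above
print(atom.winner.metric_test)  # one test score per metric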