Multi-metric runs¶
This example shows how to evaluate a pipeline on multiple metrics with ATOM.
The data used is the abalone dataset, loaded from a local CSV file. The goal is to predict the number of rings of an abalone shell (a proxy for its age) from physical measurements.
Load the data¶
In [9]:
# Import packages
import pandas as pd
from atom import ATOMRegressor
In [10]:
# Load data
X = pd.read_csv("./datasets/abalone.csv")
# Let's have a look
X.head()
Out[10]:
| | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
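Note that only the feature matrix `X` is passed to ATOM below; by default the last column (Rings here) is treated as the target. A minimal pandas sketch of that split, using a few hard-coded rows instead of the CSV so it runs standalone:

```python
import pandas as pd

# A few rows of the abalone data, hard-coded so the sketch
# runs without the CSV file (columns abbreviated).
X = pd.DataFrame(
    {
        "Sex": ["M", "M", "F"],
        "Length": [0.455, 0.350, 0.530],
        "Rings": [15, 7, 9],
    }
)

# The last column acts as the target; everything else is a feature.
features, target = X.iloc[:, :-1], X.iloc[:, -1]
print(target.tolist())  # the Rings values
```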
Run the pipeline¶
In [11]:
atom = ATOMRegressor(X, n_jobs=1, verbose=2, warnings=False, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 182 (0.6%)
-------------------------------------
Train set size: 3342
Test set size: 835
In [12]:
atom.encode()
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
In [13]:
# For every step of the BO, both metrics are calculated,
# but only the first is used for optimization!
atom.run(
models=["lsvm", "hGBM"],
metric=("r2", "mse"),
n_calls=10,
n_initial_points=4,
n_bootstrap=6,
)
Training ========================= >>
Models: lSVM, hGBM
Metric: r2, neg_mean_squared_error


Running BO for Linear-SVM...

| call | loss | C | dual | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
|---|---|---|---|---|---|---|---|---|---|
| Initial point 1 | squared_epsilon_insen.. | 46.0031 | True | 0.4003 | 0.4003 | -7.2241 | -7.2241 | 0.109s | 0.125s |
| Initial point 2 | squared_epsilon_insen.. | 0.0152 | True | 0.398 | 0.4003 | -6.4967 | -6.4967 | 0.031s | 0.313s |
| Initial point 3 | epsilon_insensitive | 2.2322 | True | 0.4425 | 0.4425 | -6.0495 | -6.0495 | 0.094s | 0.484s |
| Initial point 4 | squared_epsilon_insen.. | 0.0368 | False | 0.445 | 0.445 | -5.9247 | -5.9247 | 0.031s | 0.594s |
| Iteration 5 | squared_epsilon_insen.. | 99.8148 | False | 0.4315 | 0.445 | -5.7147 | -5.7147 | 0.031s | 0.875s |
| Iteration 6 | epsilon_insensitive | 0.001 | True | -4.8849 | 0.445 | -64.4077 | -5.7147 | 0.031s | 1.250s |
| Iteration 7 | epsilon_insensitive | 3.3191 | True | 0.4124 | 0.445 | -6.6678 | -5.7147 | 0.063s | 1.875s |
| Iteration 8 | squared_epsilon_insen.. | 0.0718 | False | 0.3431 | 0.445 | -6.5193 | -5.7147 | 0.031s | 2.235s |
| Iteration 9 | squared_epsilon_insen.. | 0.0278 | False | 0.4614 | 0.4614 | -5.1863 | -5.1863 | 0.047s | 2.594s |
| Iteration 10 | epsilon_insensitive | 100.0 | True | 0.3695 | 0.4614 | -7.5606 | -5.1863 | 0.156s | 2.985s |

Bayesian Optimization ---------------------------
Best call --> Iteration 9
Best parameters --> {'loss': 'squared_epsilon_insensitive', 'C': 0.0278, 'dual': False}
Best evaluation --> r2: 0.4614   neg_mean_squared_error: -5.1863
Time elapsed: 3.349s
Fit ---------------------------------------------
Train evaluation --> r2: 0.4595   neg_mean_squared_error: -5.7024
Test evaluation --> r2: 0.453   neg_mean_squared_error: -5.3403
Time elapsed: 0.016s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.4498 ± 0.0036   neg_mean_squared_error: -5.371 ± 0.0352
Time elapsed: 0.094s
-------------------------------------------------
Total time: 3.459s


Running BO for HistGBM...

| call | loss | learning_rate | max_iter | max_leaf_nodes | max_depth | min_samples_leaf | l2_regularization | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Initial point 1 | poisson | 0.733 | 73 | 50 | 2 | 18 | 0.4 | 0.5283 | 0.5283 | -5.6814 | -5.6814 | 0.078s | 0.094s |
| Initial point 2 | absolute_.. | 0.7432 | 425 | 23 | 5 | 19 | 0.2 | 0.4851 | 0.5283 | -5.557 | -5.557 | 1.605s | 1.792s |
| Initial point 3 | absolute_.. | 0.6729 | 234 | 27 | 9 | 26 | 0.7 | 0.4877 | 0.5283 | -5.5583 | -5.557 | 1.625s | 3.527s |
| Initial point 4 | poisson | 0.0153 | 264 | 45 | 8 | 27 | 0.3 | 0.5336 | 0.5336 | -4.9791 | -4.9791 | 2.411s | 6.047s |
| Iteration 5 | squared_e.. | 0.0719 | 410 | 34 | 6 | 29 | 0.1 | 0.5085 | 0.5336 | -4.9404 | -4.9404 | 1.359s | 7.704s |
| Iteration 6 | poisson | 0.0102 | 467 | 11 | 1 | 21 | 0.9 | 0.4046 | 0.5336 | -6.5166 | -4.9404 | 0.203s | 8.219s |
| Iteration 7 | poisson | 1.0 | 10 | 50 | 10 | 10 | 0.0 | 0.3163 | 0.5336 | -7.7589 | -4.9404 | 0.094s | 8.711s |
| Iteration 8 | squared_e.. | 0.01 | 61 | 13 | 10 | 16 | 0.2 | 0.3426 | 0.5336 | -6.525 | -4.9404 | 0.156s | 9.273s |
| Iteration 9 | poisson | 0.01 | 289 | 50 | 7 | 28 | 0.0 | 0.5531 | 0.5531 | -4.3032 | -4.3032 | 2.141s | 11.836s |
| Iteration 10 | squared_e.. | 0.01 | 10 | 10 | None | 30 | 0.0 | 0.0745 | 0.5531 | -11.0979 | -4.3032 | 0.047s | 12.305s |

Bayesian Optimization ---------------------------
Best call --> Iteration 9
Best parameters --> {'loss': 'poisson', 'learning_rate': 0.01, 'max_iter': 289, 'max_leaf_nodes': 50, 'max_depth': 7, 'min_samples_leaf': 28, 'l2_regularization': 0.0}
Best evaluation --> r2: 0.5531   neg_mean_squared_error: -4.3032
Time elapsed: 12.852s
Fit ---------------------------------------------
Train evaluation --> r2: 0.6662   neg_mean_squared_error: -3.5218
Test evaluation --> r2: 0.5726   neg_mean_squared_error: -4.1724
Time elapsed: 2.089s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.5538 ± 0.0083   neg_mean_squared_error: -4.3561 ± 0.0806
Time elapsed: 10.323s
-------------------------------------------------
Total time: 25.264s


Final results ==================== >>
Duration: 28.722s
-------------------------------------
Linear-SVM --> r2: 0.4498 ± 0.0036   neg_mean_squared_error: -5.371 ± 0.0352
HistGBM --> r2: 0.5538 ± 0.0083   neg_mean_squared_error: -4.3561 ± 0.0806 !
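A note on the sign convention: scorers follow sklearn's greater-is-better rule, which is why the mean squared error is reported as `neg_mean_squared_error` with negative values. A minimal sketch of both metrics in plain Python (independent of ATOM, with made-up targets and predictions):

```python
# Compute r2 and (negated) mean squared error by hand to show
# why the MSE appears as "neg_mean_squared_error" in the log.
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [15, 7, 9, 10, 7]
y_pred = [12, 8, 9, 11, 6]

print(r2(y_true, y_pred))    # higher is better (r2 ≈ 0.722 here)
print(-mse(y_true, y_pred))  # negated so that higher is better too
```

Because both scorers now increase with model quality, the BO can maximize either one; only the first metric in the tuple drives the optimization, the second is merely tracked.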
Analyze the results¶
In [14]:
# The columns in the results dataframe contain a list of
# scores, one for each metric (in the same order as called)
atom.results[["metric_bo", "metric_train", "metric_test"]]
Out[14]:
| | metric_bo | metric_train | metric_test |
|---|---|---|---|
| lSVM | [0.4613660476456929, -5.186254956043416] | [0.4594925871132435, -5.702426270752693] | [0.4529905982859944, -5.340344775710873] |
| hGBM | [0.553083077887542, -4.3031544746009] | [0.6661886024220871, -3.521755368457413] | [0.5726269450224242, -4.172358746079403] |
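Since each cell holds a list with one score per metric, it can be handy to split such a column into one column per metric. A minimal pandas sketch, using a hand-built stand-in frame with the test scores above (rounded) rather than rerunning the pipeline:

```python
import pandas as pd

# Hypothetical stand-in for atom.results: each cell is a list
# with one score per metric, in the order passed to run().
results = pd.DataFrame(
    {"metric_test": [[0.4530, -5.3403], [0.5726, -4.1724]]},
    index=["lSVM", "hGBM"],
)

# Expand the list column into one column per metric.
scores = pd.DataFrame(
    results["metric_test"].tolist(),
    index=results.index,
    columns=["r2", "neg_mean_squared_error"],
)
print(scores)
```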
In [15]:
# Some plots allow us to choose the metric we want to show
with atom.canvas():
atom.plot_bo(metric="r2", title="BO performance for r2")
atom.plot_bo(metric="mse", title="BO performance for Mean Squared Error")
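The canvas context manager draws the plots that follow on one shared figure. The same layout can be sketched with plain matplotlib subplots (hypothetical best-score data, just to show the two-panel arrangement):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Two side-by-side panels on one shared canvas, one per metric.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
calls = range(1, 11)
best_r2 = [0.40, 0.40, 0.44, 0.45, 0.45, 0.45, 0.45, 0.45, 0.46, 0.46]
best_mse = [-7.2, -6.5, -6.0, -5.9, -5.7, -5.7, -5.7, -5.7, -5.2, -5.2]
ax1.plot(calls, best_r2)
ax1.set_title("BO performance for r2")
ax2.plot(calls, best_mse)
ax2.set_title("BO performance for Mean Squared Error")
fig.savefig("bo_performance.png")
```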
In [16]:
atom.plot_results(metric="mse")