# Example: Multi-metric runs
----------------------------

This example shows how to evaluate an atom's pipeline on multiple metrics.

Load the abalone dataset from a local CSV file. This is a small and easy to train dataset whose goal is to predict the number of rings of an abalone (the `Rings` column) from its physical measurements.

## Load the data

In [1]:
# Import packages
import pandas as pd
from atom import ATOMRegressor

In [2]:
# Load data
X = pd.read_csv("./datasets/abalone.csv")

# Let's have a look
X.head()

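|   | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
| - | --- | ------ | -------- | ------ | ------------ | -------------- | -------------- | ------------ | ----- |
| 0 | M   | 0.455  | 0.365    | 0.095  | 0.514        | 0.2245         | 0.101          | 0.15         | 15    |
| 1 | M   | 0.35   | 0.265    | 0.09   | 0.2255       | 0.0995         | 0.0485         | 0.07         | 7     |
| 2 | F   | 0.53   | 0.42     | 0.135  | 0.677        | 0.2565         | 0.1415         | 0.21         | 9     |
| 3 | M   | 0.44   | 0.365    | 0.125  | 0.516        | 0.2155         | 0.114          | 0.155        | 10    |
| 4 | I   | 0.33   | 0.255    | 0.08   | 0.205        | 0.0895         | 0.0395         | 0.055        | 7     |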


## Run the pipeline

In [3]:
atom = ATOMRegressor(X, n_jobs=1, verbose=2, random_state=1)


Algorithm task: Regression.

Shape: (4177, 9)
Train set size: 3342
Test set size: 835
-------------------------------------
Memory: 300.88 kB
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 189 (0.6%)

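The numbers in this summary can be cross-checked with a little arithmetic. The sketch below uses only the values printed above. That the hold-out corresponds to a test fraction of about 20%, and that one of the nine columns is the target, are assumptions read off the figures, not something the output states explicitly.

```python
n_rows, n_columns = 4177, 9  # shape reported in the summary
train_size, test_size = 3342, 835

# Train and test sets together cover the whole dataset.
assert train_size + test_size == n_rows

# Fraction of rows held out for testing.
test_fraction = test_size / n_rows
print(f"test fraction: {test_fraction:.3f}")

# Assumption: one column (Rings) is the target, leaving 8 features,
# of which 1 (Sex) is categorical.
n_features = n_columns - 1
print(f"categorical share: {1 / n_features:.1%}")
```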


In [4]:
atom.encode()

Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.

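As a rough illustration of what one-hot encoding does to `Sex`, the sketch below expands the five values from the data preview into one indicator column per class. It uses plain Python instead of atom's encoder, and the `Sex_<class>` column names are an assumption made for illustration; atom may name the new columns differently.

```python
# Values of the Sex column in the five preview rows.
sex = ["M", "M", "F", "M", "I"]

# One indicator column per class, in order of first appearance.
classes = list(dict.fromkeys(sex))

# Hypothetical column names, chosen for this example only.
encoded = {f"Sex_{c}": [int(value == c) for value in sex] for c in classes}

for name, column in encoded.items():
    print(name, column)
```

Every row has exactly one 1 across the three new columns, so the single categorical feature is replaced by three numeric ones.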

In [5]:
# For every trial of the hyperparameter tuning, both metrics are calculated,
# but only the first is used for optimization!
atom.run(
    models=["lsvm", "hGBM"],
    metric=("r2", "rmse"),
    n_trials=10,
    n_bootstrap=6,
)


Models: lSVM, hGBM
Metric: r2, rmse


Running hyperparameter tuning for LinearSVM...
| trial |                    loss |       C |    dual |      r2 | best_r2 |    rmse | best_rmse | time_trial | time_ht |    state |
| ----- | ----------------------- | ------- | ------- | ------- | ------- | ------- | --------- | ---------- | ------- | -------- |
| 0     | squared_epsilon_insen.. |   0.001 |    True |  0.2887 |  0.2887 | -2.6528 |   -2.6528 |     0.054s |  0.054s | COMPLETE |
| 1     | squared_epsilon_insen.. |  0.0534 |   False |  0.4507 |  0.4507 | -2.3314 |   -2.3314 |     0.047s |  0.101s | COMPLETE |
| 2     | squared_epsilon_insen.. |  0.0105 |    True |   0.451 |   0.451 | -2.3307 |   -2.3307 |     0.056s |  0.157s | COMPLETE |
| 3     |     epsilon_insensitive |  0.6215 |    True |  0.4266 |   0.451 | -2.3818 |   -2.3307 |     0.059s |  0.216s | COMPLETE |
| 4     | squared_epsilon_insen.. |  0.0369 |   False |  0.4509 |   0.451 | -2.3308 |   -2.3307 |     0.049s |  0.265s | C

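The `rmse` values in the trial table are negative because scores are reported so that a higher value is always better, which means error metrics such as RMSE are negated. The sketch below reproduces that convention in plain Python. The helpers `r2` and `neg_rmse` are hypothetical names written for this example and are not atom functions; the targets are the `Rings` values from the data preview and the predictions are made up.

```python
import math

def r2(y_true, y_pred):
    """Coefficient of determination (higher is better)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def neg_rmse(y_true, y_pred):
    """Root mean squared error, negated so that higher is better."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return -math.sqrt(mse)

y_true = [15, 7, 9, 10, 7]     # Rings values from the preview rows
good = [14, 8, 9, 11, 7]       # made-up predictions close to the targets
bad = [10, 10, 10, 10, 10]     # made-up constant predictions

# With both metrics, the better predictions get the higher score.
for name, y_pred in [("good", good), ("bad", bad)]:
    print(name, round(r2(y_true, y_pred), 4), round(neg_rmse(y_true, y_pred), 4))
```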
In [6]:
# Check the robustness of the pipeline using cross-validation
atom.winner.cross_validate()

Applying cross-validation...


|      | train_r2 | test_r2  | train_rmse | test_rmse | time     |
| ---- | -------- | -------- | ---------- | --------- | -------- |
| 0    | 0.719472 | 0.535855 | -1.704209  | -2.211607 | 0.135122 |
| 1    | 0.72676  | 0.539374 | -1.682744  | -2.198686 | 0.135123 |
| 2    | 0.723168 | 0.515653 | -1.71659   | -2.132145 | 0.14313  |
| 3    | 0.701827 | 0.596559 | -1.73811   | -2.148098 | 0.116105 |
| 4    | 0.718167 | 0.542704 | -1.717794  | -2.147347 | 0.148133 |
| mean | 0.717879 | 0.546029 | -1.711889  | -2.167577 | 0.135523 |

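The `mean` row is the plain average of the five folds, which can be verified by hand. The sketch below copies the per-fold test scores from the output above and recomputes the averages with the standard library.

```python
from statistics import mean

# Per-fold test scores copied from the cross-validation output.
test_r2 = [0.535855, 0.539374, 0.515653, 0.596559, 0.542704]
test_rmse = [-2.211607, -2.198686, -2.132145, -2.148098, -2.147347]

print(round(mean(test_r2), 6))    # 0.546029
print(round(mean(test_rmse), 6))  # -2.167577
```

Both values match the `mean` row, so the cross-validated test score on the main metric is an R2 of roughly 0.55.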

## Analyze the results

In [7]:
# The results dataframe contains one set of columns for each metric
atom.results.data[["r2_ht", "r2_train", "r2_test", "rmse_ht", "rmse_train", "rmse_test"]]

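|      | r2_ht    | r2_train | r2_test | rmse_ht   | rmse_train | rmse_test |
| ---- | -------- | -------- | ------- | --------- | ---------- | --------- |
| lSVM | 0.451203 | 0.458    | 0.4565  | -2.330239 | -2.3823    | -2.3411   |
| hGBM | 0.556021 | 0.7182   | 0.5427  | -2.095926 | -1.7178    | -2.1473   |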

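One way to read this table is to compare the models per metric and to look at the gap between train and test scores. The sketch below does this in plain Python on values copied from the table. It is only an illustration of the comparison and does not claim to be how atom selects its winner.

```python
# Train and test scores copied from the results table.
results = {
    "lSVM": {"r2_train": 0.458, "r2_test": 0.4565,
             "rmse_train": -2.3823, "rmse_test": -2.3411},
    "hGBM": {"r2_train": 0.7182, "r2_test": 0.5427,
             "rmse_train": -1.7178, "rmse_test": -2.1473},
}

for metric in ("r2", "rmse"):
    # Scores are "higher is better", so the best model has the maximum.
    best = max(results, key=lambda m: results[m][f"{metric}_test"])
    print(f"best on {metric}_test: {best}")
    for model, scores in results.items():
        gap = scores[f"{metric}_train"] - scores[f"{metric}_test"]
        print(f"  {model}: train - test = {gap:+.4f}")
```

Both metrics rank hGBM first on the test set, but its train-test gap is much larger than that of lSVM, which suggests it fits the training data more closely.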

In [8]:
# Some plots allow us to choose the metric we want to show
with atom.canvas():
    atom.plot_trials(metric="r2", title="Hyperparameter tuning performance for R2")
    atom.plot_trials(metric="rmse", title="Hyperparameter tuning performance for RMSE")

In [9]:
atom.plot_results(metric="r2")