Example: Multi-metric runs

This example shows how to evaluate an atom's pipeline on multiple metrics.

Import the abalone dataset from the UCI Machine Learning Repository. This is a small and easy to train dataset whose goal is to predict the rings (age) of abalone shells from physical measurements.
Load the data
In [1]:
# Import packages
import pandas as pd
from atom import ATOMRegressor
In [2]:
# Load data
X = pd.read_csv("docs_source/examples/datasets/abalone.csv")
# Let's have a look
X.head()
Out[2]:
| | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
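ATOMRegressor treats the last column (Rings here) as the target when no y is given. An optional pandas check of the target's distribution, not part of the original notebook:

# Optional check (plain pandas): Rings is the target column predicted below
X["Rings"].describe()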
Run the pipeline
In [3]:
atom = ATOMRegressor(X, n_jobs=1, verbose=2, random_state=1)
<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Train set size: 3342
Test set size: 835
-------------------------------------
Memory: 300.88 kB
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 189 (0.6%)
In [4]:
atom.encode()
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
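As an optional sanity check (using atom's standard dataset attribute), the Sex column should now appear as one-hot encoded columns:

# Optional check: Sex should be replaced by its one-hot encoded counterparts
atom.dataset.head()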
In [5]:
# For every trial of the hyperparameter tuning, both metrics
# are calculated, but only the first is used for optimization!
atom.run(
models=["lsvm", "hGBM"],
metric=("r2", "rmse"),
n_trials=10,
n_bootstrap=6,
)
Training ========================= >>
Models: lSVM, hGBM
Metric: r2, rmse


Running hyperparameter tuning for LinearSVM...

| trial | loss | C | dual | r2 | best_r2 | rmse | best_rmse | time_trial | time_ht | state |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | squared_epsilon_insen.. | 0.001 | True | 0.2887 | 0.2887 | -2.6528 | -2.6528 | 0.081s | 0.081s | COMPLETE |
| 1 | squared_epsilon_insen.. | 0.0534 | False | 0.3862 | 0.3862 | -2.5926 | -2.5926 | 0.078s | 0.159s | COMPLETE |
| 2 | squared_epsilon_insen.. | 0.0105 | True | 0.433 | 0.433 | -2.4084 | -2.4084 | 0.070s | 0.229s | COMPLETE |
| 3 | epsilon_insensitive | 0.6215 | True | 0.4022 | 0.433 | -2.5251 | -2.4084 | 0.064s | 0.293s | COMPLETE |
| 4 | squared_epsilon_insen.. | 0.0369 | False | 0.4057 | 0.433 | -2.5477 | -2.4084 | 0.058s | 0.351s | COMPLETE |
| 5 | epsilon_insensitive | 0.0016 | True | -1.5344 | 0.433 | -5.0102 | -2.4084 | 0.052s | 0.403s | COMPLETE |
| 6 | squared_epsilon_insen.. | 61.5811 | False | 0.4354 | 0.4354 | -2.3845 | -2.3845 | 0.047s | 0.450s | COMPLETE |
| 7 | squared_epsilon_insen.. | 14.898 | False | 0.4925 | 0.4925 | -2.2628 | -2.2628 | 0.050s | 0.500s | COMPLETE |
| 8 | epsilon_insensitive | 0.0252 | True | 0.3695 | 0.4925 | -2.6178 | -2.2628 | 0.048s | 0.548s | COMPLETE |
| 9 | squared_epsilon_insen.. | 0.0294 | True | 0.4767 | 0.4925 | -2.3896 | -2.2628 | 0.055s | 0.604s | COMPLETE |

Hyperparameter tuning ---------------------------
Best trial --> 7
Best parameters:
 --> loss: squared_epsilon_insensitive
 --> C: 14.898
 --> dual: False
Best evaluation --> r2: 0.4925   rmse: -2.2628
Time elapsed: 0.604s
Fit ---------------------------------------------
Train evaluation --> r2: 0.4592   rmse: -2.3795
Test evaluation --> r2: 0.4584   rmse: -2.3369
Time elapsed: 0.112s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.4577 ± 0.002   rmse: -2.3384 ± 0.0043
Time elapsed: 0.131s
-------------------------------------------------
Time: 0.847s


Running hyperparameter tuning for HistGradientBoosting...

| trial | loss | quantile | learning_rate | max_iter | max_leaf_nodes | max_depth | min_samples_leaf | l2_regularization | r2 | best_r2 | rmse | best_rmse | time_trial | time_ht | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | absolut.. | 0.1 | 0.0236 | 180 | 26 | 12 | 11 | 0.0 | 0.5373 | 0.5373 | -2.1398 | -2.1398 | 0.933s | 0.933s | COMPLETE |
| 1 | gamma | 0.5 | 0.242 | 160 | 38 | 3 | 20 | 0.0 | 0.574 | 0.574 | -2.1598 | -2.1398 | 0.166s | 1.099s | COMPLETE |
| 2 | quantile | 0.4 | 0.2448 | 210 | 12 | 3 | 25 | 0.3 | 0.4714 | 0.574 | -2.3253 | -2.1398 | 0.474s | 1.573s | COMPLETE |
| 3 | quantile | 0.6 | 0.017 | 480 | 28 | 16 | 13 | 0.1 | 0.5712 | 0.574 | -2.1385 | -2.1385 | 3.667s | 5.241s | COMPLETE |
| 4 | squared.. | 1.0 | 0.2649 | 70 | 10 | 10 | 28 | 0.8 | 0.5561 | 0.574 | -2.2019 | -2.1385 | 0.167s | 5.408s | COMPLETE |
| 5 | squared.. | 0.1 | 0.0283 | 360 | 32 | 9 | 11 | 0.5 | 0.5464 | 0.574 | -2.1197 | -2.1197 | 1.312s | 6.720s | COMPLETE |
| 6 | quantile | 0.4 | 0.1264 | 380 | 37 | 12 | 29 | 1.0 | 0.4417 | 0.574 | -2.3712 | -2.1197 | 3.378s | 10.098s | COMPLETE |
| 7 | gamma | 0.6 | 0.678 | 330 | 25 | 6 | 12 | 0.8 | 0.4299 | 0.574 | -2.3984 | -2.1197 | 0.807s | 10.904s | COMPLETE |
| 8 | absolut.. | 0.9 | 0.0831 | 280 | 42 | 16 | 10 | 1.0 | 0.5242 | 0.574 | -2.2742 | -2.1197 | 2.341s | 13.246s | COMPLETE |
| 9 | absolut.. | 0.6 | 0.0373 | 300 | 40 | 13 | 17 | 0.8 | 0.5685 | 0.574 | -2.17 | -2.1197 | 2.032s | 15.277s | COMPLETE |

Hyperparameter tuning ---------------------------
Best trial --> 5
Best parameters:
 --> loss: squared_error
 --> quantile: 0.1
 --> learning_rate: 0.0283
 --> max_iter: 360
 --> max_leaf_nodes: 32
 --> max_depth: 9
 --> min_samples_leaf: 11
 --> l2_regularization: 0.5
Best evaluation --> r2: 0.5464   rmse: -2.1197
Time elapsed: 15.277s
Fit ---------------------------------------------
Train evaluation --> r2: 0.7959   rmse: -1.4619
Test evaluation --> r2: 0.5479   rmse: -2.135
Time elapsed: 1.311s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.5259 ± 0.0154   rmse: -2.1862 ± 0.0353
Time elapsed: 4.073s
-------------------------------------------------
Time: 20.661s


Final results ==================== >>
Total time: 22.907s
-------------------------------------
LinearSVM            --> r2: 0.4577 ± 0.002    rmse: -2.3384 ± 0.0043
HistGradientBoosting --> r2: 0.5259 ± 0.0154   rmse: -2.1862 ± 0.0353 ~ !
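The metric parameter isn't limited to predefined names. A sketch (not part of the original run) of how a custom sklearn scorer could be used instead; mape_scorer is a hypothetical name, and mixing a metric name with a scorer object is assumed to be supported:

# Sketch only: a custom scorer built with sklearn's make_scorer
from sklearn.metrics import make_scorer, mean_absolute_percentage_error

# Hypothetical scorer; greater_is_better=False negates the score, like rmse above
mape_scorer = make_scorer(mean_absolute_percentage_error, greater_is_better=False)
# atom.run(models="hGBM", metric=("r2", mape_scorer), n_trials=10)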
In [6]:
# Check the robustness of the pipeline using cross-validation
atom.winner.cross_validate()
Applying cross-validation...
Out[6]:
| | train_r2 | test_r2 | train_rmse | test_rmse | time (s) |
|---|---|---|---|---|---|
| 0 | 0.796038 | 0.542475 | -1.453147 | -2.195779 | 0.640582 |
| 1 | 0.794954 | 0.540335 | -1.457709 | -2.196391 | 0.644585 |
| 2 | 0.790722 | 0.505952 | -1.492522 | -2.153392 | 0.672611 |
| 3 | 0.785317 | 0.580703 | -1.474827 | -2.189902 | 0.684622 |
| 4 | 0.795872 | 0.547938 | -1.461929 | -2.135022 | 0.741673 |
| mean | 0.792581 | 0.543480 | -1.468027 | -2.174097 | 0.676815 |
| std | 0.004114 | 0.023769 | 0.014222 | 0.025189 | 0.036433 |
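Since cross_validate returns the pandas DataFrame shown above, the train-test gap can be quantified directly; a minimal sketch, assuming the mean/std index rows shown:

# Sketch: measure the train-test r2 gap across the cross-validation folds
cv = atom.winner.cross_validate()
print(cv.loc["mean", "train_r2"] - cv.loc["mean", "test_r2"])  # ~0.25 here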
Analyze the results
In [7]:
# The results dataframe contains a column for every metric
atom.results[["r2_ht", "r2_train", "r2_test", "rmse_ht", "rmse_train", "rmse_test"]]
Out[7]:
| | r2_ht | r2_train | r2_test | rmse_ht | rmse_train | rmse_test |
|---|---|---|---|---|---|---|
| lSVM | 0.492530 | 0.4583 | 0.4552 | -2.262754 | -2.3815 | -2.3439 |
| hGBM | 0.546368 | 0.7959 | 0.5479 | -2.119672 | -1.4619 | -2.1350 |
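Because atom.results is a regular pandas DataFrame, the models can also be ranked on any of these columns, e.g. by test RMSE (the scores are negated, so higher means better):

# Rank models on the second metric; ascending=False puts the best first
atom.results.sort_values("rmse_test", ascending=False)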
In [8]:
# Some plots allow us to choose the metric we want to show
with atom.canvas():
    atom.plot_trials(metric="r2", title="Hyperparameter tuning performance for R2")
    atom.plot_trials(metric="rmse", title="Hyperparameter tuning performance for RMSE")
In [9]:
atom.plot_results(metric="r2")
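The same comparison can be drawn for the second metric:

atom.plot_results(metric="rmse")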