Example: Multi-metric runs
This example shows how to evaluate an atom's pipeline on multiple metrics.
Import the abalone dataset from the data folder. This is a small and easy-to-train regression dataset whose goal is to predict the rings (age) of abalone shells from physical measurements.
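The CSV is read from the repository's data folder below. If you don't have a local copy, a similar route is to fetch the data from OpenML; a minimal, hypothetical sketch (it assumes the classic UCI dataset is published there under the name "abalone"):

# Hypothetical alternative: fetch the abalone data from OpenML
# (assumes the name "abalone" resolves to the classic UCI dataset)
from sklearn.datasets import fetch_openml

abalone = fetch_openml(name="abalone", version=1, as_frame=True)
X = abalone.frame  # features and target in a single dataframe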
Load the data
In [1]:
# Import packages
import pandas as pd
from atom import ATOMRegressor
In [2]:
# Load data
X = pd.read_csv("docs_source/examples/datasets/abalone.csv")

# Let's have a look
X.head()
Out[2]:
 | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
---|---|---|---|---|---|---|---|---|---|
0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
Run the pipeline
In [3]:
atom = ATOMRegressor(X, n_jobs=1, verbose=2, random_state=1)
<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Train set size: 3342
Test set size: 835
-------------------------------------
Memory: 300.88 kB
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 189 (0.6%)
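ATOM used the last column of X (Rings) as the target and, per the stats above, held out roughly 20% of the rows as test set. Both choices can be made explicit at initialization; the call below is an illustrative sketch, not part of the original run:

# Illustrative: name the target column and the split size explicitly
atom = ATOMRegressor(X, y="Rings", test_size=0.2, n_jobs=1, verbose=2, random_state=1)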
In [4]:
atom.encode()
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
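The default settings one-hot encode low-cardinality features such as Sex. The encoding strategy and the one-hot threshold can be tuned through parameters; a hedged example (the values are illustrative, not from this run):

# Illustrative: target-encode any categorical column with more than 5 classes
atom.encode(strategy="Target", max_onehot=5)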
In [5]:
# For every step of the BO, both metrics are calculated,
# but only the first is used for optimization!
atom.run(
    models=["lsvm", "hGBM"],
    metric=("r2", "rmse"),
    n_trials=10,
    n_bootstrap=6,
)
Training ========================= >>
Models: lSVM, hGBM
Metric: r2, rmse


Running hyperparameter tuning for LinearSVM...
| trial | loss | C | dual | r2 | best_r2 | rmse | best_rmse | time_trial | time_ht | state |
| ----- | ----------------------- | ------- | ------- | ------- | ------- | ------- | --------- | ---------- | ------- | -------- |
| 0 | squared_epsilon_insen.. | 0.001 | True | 0.2887 | 0.2887 | -2.6528 | -2.6528 | 0.054s | 0.054s | COMPLETE |
| 1 | squared_epsilon_insen.. | 0.0534 | False | 0.4507 | 0.4507 | -2.3314 | -2.3314 | 0.047s | 0.101s | COMPLETE |
| 2 | squared_epsilon_insen.. | 0.0105 | True | 0.451 | 0.451 | -2.3307 | -2.3307 | 0.056s | 0.157s | COMPLETE |
| 3 | epsilon_insensitive | 0.6215 | True | 0.4266 | 0.451 | -2.3818 | -2.3307 | 0.059s | 0.216s | COMPLETE |
| 4 | squared_epsilon_insen.. | 0.0369 | False | 0.4509 | 0.451 | -2.3308 | -2.3307 | 0.049s | 0.265s | COMPLETE |
| 5 | epsilon_insensitive | 0.0016 | True | -1.5995 | 0.451 | -5.0716 | -2.3307 | 0.049s | 0.314s | COMPLETE |
| 6 | squared_epsilon_insen.. | 61.5811 | False | 0.4499 | 0.451 | -2.333 | -2.3307 | 0.054s | 0.369s | COMPLETE |
| 7 | squared_epsilon_insen.. | 14.898 | False | 0.4499 | 0.451 | -2.333 | -2.3307 | 0.049s | 0.418s | COMPLETE |
| 8 | epsilon_insensitive | 0.0252 | True | 0.3798 | 0.451 | -2.4772 | -2.3307 | 0.051s | 0.469s | COMPLETE |
| 9 | squared_epsilon_insen.. | 0.0294 | True | 0.4512 | 0.4512 | -2.3302 | -2.3302 | 0.051s | 0.520s | COMPLETE |
Hyperparameter tuning ---------------------------
Best trial --> 9
Best parameters:
 --> loss: squared_epsilon_insensitive
 --> C: 0.0294
 --> dual: True
Best evaluation --> r2: 0.4512   rmse: -2.3302
Time elapsed: 0.520s
Fit ---------------------------------------------
Train evaluation --> r2: 0.4587   rmse: -2.3806
Test evaluation --> r2: 0.4586   rmse: -2.3365
Time elapsed: 0.120s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.458 ± 0.0014   rmse: -2.3377 ± 0.0031
Time elapsed: 0.199s
-------------------------------------------------
Time: 0.838s


Running hyperparameter tuning for HistGradientBoosting...
| trial | loss | quantile | learning_rate | max_iter | max_leaf_nodes | max_depth | min_samples_leaf | l2_regularization | r2 | best_r2 | rmse | best_rmse | time_trial | time_ht | state |
| ----- | --------- | -------- | ------------- | -------- | -------------- | --------- | ---------------- | ----------------- | ------- | ------- | ------- | --------- | ---------- | ------- | -------- |
| 0 | absolut.. | 0.1 | 0.0236 | 180 | 26 | 12 | 11 | 0.0 | 0.5373 | 0.5373 | -2.1398 | -2.1398 | 0.928s | 0.928s | COMPLETE |
| 1 | gamma | 0.5 | 0.242 | 160 | 38 | 3 | 20 | 0.0 | 0.556 | 0.556 | -2.0959 | -2.0959 | 0.178s | 1.106s | COMPLETE |
| 2 | quantile | 0.4 | 0.2448 | 210 | 12 | 3 | 25 | 0.3 | 0.4906 | 0.556 | -2.245 | -2.0959 | 0.391s | 1.497s | COMPLETE |
| 3 | quantile | 0.6 | 0.017 | 480 | 28 | 16 | 13 | 0.1 | 0.5535 | 0.556 | -2.1018 | -2.0959 | 2.741s | 4.239s | COMPLETE |
| 4 | squared.. | 1.0 | 0.2649 | 70 | 10 | 10 | 28 | 0.8 | 0.5403 | 0.556 | -2.1327 | -2.0959 | 0.117s | 4.356s | COMPLETE |
| 5 | squared.. | 0.1 | 0.0283 | 360 | 32 | 9 | 11 | 0.5 | 0.5466 | 0.556 | -2.118 | -2.0959 | 0.863s | 5.219s | COMPLETE |
| 6 | quantile | 0.4 | 0.1264 | 380 | 37 | 12 | 29 | 1.0 | 0.4977 | 0.556 | -2.2292 | -2.0959 | 2.699s | 7.918s | COMPLETE |
| 7 | gamma | 0.6 | 0.678 | 330 | 25 | 6 | 12 | 0.8 | 0.3783 | 0.556 | -2.4802 | -2.0959 | 0.593s | 8.511s | COMPLETE |
| 8 | absolut.. | 0.9 | 0.0831 | 280 | 42 | 16 | 10 | 1.0 | 0.5285 | 0.556 | -2.16 | -2.0959 | 1.647s | 10.158s | COMPLETE |
| 9 | absolut.. | 0.6 | 0.0373 | 300 | 40 | 13 | 17 | 0.8 | 0.5381 | 0.556 | -2.1378 | -2.0959 | 1.939s | 12.098s | COMPLETE |
Hyperparameter tuning ---------------------------
Best trial --> 1
Best parameters:
 --> loss: gamma
 --> quantile: 0.5
 --> learning_rate: 0.242
 --> max_iter: 160
 --> max_leaf_nodes: 38
 --> max_depth: 3
 --> min_samples_leaf: 20
 --> l2_regularization: 0.0
Best evaluation --> r2: 0.556   rmse: -2.0959
Time elapsed: 12.098s
Fit ---------------------------------------------
Train evaluation --> r2: 0.7182   rmse: -1.7178
Test evaluation --> r2: 0.5427   rmse: -2.1473
Time elapsed: 0.232s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.5241 ± 0.022   rmse: -2.19 ± 0.0504
Time elapsed: 0.782s
-------------------------------------------------
Time: 13.111s


Final results ==================== >>
Total time: 15.597s
-------------------------------------
LinearSVM            --> r2: 0.458 ± 0.0014   rmse: -2.3377 ± 0.0031
HistGradientBoosting --> r2: 0.5241 ± 0.022   rmse: -2.19 ± 0.0504 ~ !
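The metric parameter isn't limited to predefined acronyms like "r2" and "rmse"; it also accepts scikit-learn scorers, which makes it possible to optimize on custom losses. A minimal sketch, not part of the original run (the OLS model acronym and the scorer mix are illustrative assumptions):

# Illustrative: combine a named metric with a custom scikit-learn scorer
from sklearn.metrics import make_scorer, mean_absolute_error

atom.run(
    models="OLS",
    metric=("r2", make_scorer(mean_absolute_error, greater_is_better=False)),
)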
In [6]:
# Check the robustness of the pipeline using cross-validation
atom.winner.cross_validate()
Applying cross-validation...
Out[6]:
 | train_r2 | test_r2 | train_rmse | test_rmse | time |
---|---|---|---|---|---|
0 | 0.719472 | 0.535855 | -1.704209 | -2.211607 | 0.135122 |
1 | 0.726760 | 0.539374 | -1.682744 | -2.198686 | 0.135123 |
2 | 0.723168 | 0.515653 | -1.716590 | -2.132145 | 0.143130 |
3 | 0.701827 | 0.596559 | -1.738110 | -2.148098 | 0.116105 |
4 | 0.718167 | 0.542704 | -1.717794 | -2.147347 | 0.148133 |
mean | 0.717879 | 0.546029 | -1.711889 | -2.167577 | 0.135523 |
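The keyword arguments of cross_validate are forwarded to scikit-learn's cross-validation routine, so the splitting scheme can be changed if needed; for example (illustrative, the run above used the defaults):

# Illustrative: cross-validate the winning model on 10 folds instead
atom.winner.cross_validate(cv=10)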
Analyze the results
In [7]:
# The results dataframe contains one column per metric and stage
atom.results.data[["r2_ht", "r2_train", "r2_test", "rmse_ht", "rmse_train", "rmse_test"]]
Out[7]:
 | r2_ht | r2_train | r2_test | rmse_ht | rmse_train | rmse_test |
---|---|---|---|---|---|---|
lSVM | 0.451203 | 0.4580 | 0.4565 | -2.330239 | -2.3823 | -2.3411 |
hGBM | 0.556021 | 0.7182 | 0.5427 | -2.095926 | -1.7178 | -2.1473 |
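Since atom.results.data is a plain pandas DataFrame, standard selection and sorting apply; for example (illustrative):

# Illustrative: rank the models by test RMSE (scores are negated, so higher is better)
atom.results.data.sort_values("rmse_test", ascending=False)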
In [8]:
# Some plots allow us to choose the metric we want to show
with atom.canvas():
    atom.plot_trials(metric="r2", title="Hyperparameter tuning performance for R2")
    atom.plot_trials(metric="rmse", title="Hyperparameter tuning performance for RMSE")
In [9]:
atom.plot_results(metric="r2")
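The same comparison can be drawn for the secondary metric; for example (illustrative):

atom.plot_results(metric="rmse")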