Example: Multi-metric runs

This example shows how to evaluate an atom's pipeline on multiple metrics.

Import the abalone dataset from the UCI Machine Learning Repository. This is a small and easy to train dataset whose goal is to predict the rings (age) of abalone shells from physical measurements.
Load the data
In [1]:
# Import packages
import pandas as pd
from atom import ATOMRegressor
In [2]:
# Load data
X = pd.read_csv("docs_source/examples/datasets/abalone.csv")
# Let's have a look
X.head()
Out[2]:
| | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
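ATOMRegressor treats the last column (Rings here) as the target when no y is given. An optional pandas check of the target's distribution, not part of the original notebook:

# Optional check (plain pandas): Rings is the target column predicted below
X["Rings"].describe()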
Run the pipeline
In [3]:
atom = ATOMRegressor(X, n_jobs=1, verbose=2, random_state=1)
<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Train set size: 3342
Test set size: 835
-------------------------------------
Memory: 300.88 kB
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 189 (0.6%)
In [4]:
atom.encode()
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
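As an optional sanity check (using atom's standard dataset attribute), the Sex column should now appear as one-hot encoded columns:

# Optional check: Sex should be replaced by its one-hot encoded counterparts
atom.dataset.head()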
In [5]:
# For every trial of the hyperparameter tuning, both metrics
# are calculated, but only the first is used for optimization!
atom.run(
models=["lsvm", "hGBM"],
metric=("r2", "rmse"),
n_trials=10,
n_bootstrap=6,
)
Training ========================= >>
Models: lSVM, hGBM
Metric: r2, rmse


Running hyperparameter tuning for LinearSVM...

| trial | loss | C | dual | r2 | best_r2 | rmse | best_rmse | time_trial | time_ht | state |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | squared_epsilon_insen.. | 0.001 | True | 0.2887 | 0.2887 | -2.6528 | -2.6528 | 0.081s | 0.081s | COMPLETE |
| 1 | squared_epsilon_insen.. | 0.0534 | False | 0.3862 | 0.3862 | -2.5926 | -2.5926 | 0.078s | 0.159s | COMPLETE |
| 2 | squared_epsilon_insen.. | 0.0105 | True | 0.433 | 0.433 | -2.4084 | -2.4084 | 0.070s | 0.229s | COMPLETE |
| 3 | epsilon_insensitive | 0.6215 | True | 0.4022 | 0.433 | -2.5251 | -2.4084 | 0.064s | 0.293s | COMPLETE |
| 4 | squared_epsilon_insen.. | 0.0369 | False | 0.4057 | 0.433 | -2.5477 | -2.4084 | 0.058s | 0.351s | COMPLETE |
| 5 | epsilon_insensitive | 0.0016 | True | -1.5344 | 0.433 | -5.0102 | -2.4084 | 0.052s | 0.403s | COMPLETE |
| 6 | squared_epsilon_insen.. | 61.5811 | False | 0.4354 | 0.4354 | -2.3845 | -2.3845 | 0.047s | 0.450s | COMPLETE |
| 7 | squared_epsilon_insen.. | 14.898 | False | 0.4925 | 0.4925 | -2.2628 | -2.2628 | 0.050s | 0.500s | COMPLETE |
| 8 | epsilon_insensitive | 0.0252 | True | 0.3695 | 0.4925 | -2.6178 | -2.2628 | 0.048s | 0.548s | COMPLETE |
| 9 | squared_epsilon_insen.. | 0.0294 | True | 0.4767 | 0.4925 | -2.3896 | -2.2628 | 0.055s | 0.604s | COMPLETE |

Hyperparameter tuning ---------------------------
Best trial --> 7
Best parameters:
 --> loss: squared_epsilon_insensitive
 --> C: 14.898
 --> dual: False
Best evaluation --> r2: 0.4925   rmse: -2.2628
Time elapsed: 0.604s
Fit ---------------------------------------------
Train evaluation --> r2: 0.4592   rmse: -2.3795
Test evaluation --> r2: 0.4584   rmse: -2.3369
Time elapsed: 0.112s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.4577 ± 0.002   rmse: -2.3384 ± 0.0043
Time elapsed: 0.131s
-------------------------------------------------
Time: 0.847s


Running hyperparameter tuning for HistGradientBoosting...

| trial | loss | quantile | learning_rate | max_iter | max_leaf_nodes | max_depth | min_samples_leaf | l2_regularization | r2 | best_r2 | rmse | best_rmse | time_trial | time_ht | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | absolut.. | 0.1 | 0.0236 | 180 | 26 | 12 | 11 | 0.0 | 0.5373 | 0.5373 | -2.1398 | -2.1398 | 0.933s | 0.933s | COMPLETE |
| 1 | gamma | 0.5 | 0.242 | 160 | 38 | 3 | 20 | 0.0 | 0.574 | 0.574 | -2.1598 | -2.1398 | 0.166s | 1.099s | COMPLETE |
| 2 | quantile | 0.4 | 0.2448 | 210 | 12 | 3 | 25 | 0.3 | 0.4714 | 0.574 | -2.3253 | -2.1398 | 0.474s | 1.573s | COMPLETE |
| 3 | quantile | 0.6 | 0.017 | 480 | 28 | 16 | 13 | 0.1 | 0.5712 | 0.574 | -2.1385 | -2.1385 | 3.667s | 5.241s | COMPLETE |
| 4 | squared.. | 1.0 | 0.2649 | 70 | 10 | 10 | 28 | 0.8 | 0.5561 | 0.574 | -2.2019 | -2.1385 | 0.167s | 5.408s | COMPLETE |
| 5 | squared.. | 0.1 | 0.0283 | 360 | 32 | 9 | 11 | 0.5 | 0.5464 | 0.574 | -2.1197 | -2.1197 | 1.312s | 6.720s | COMPLETE |
| 6 | quantile | 0.4 | 0.1264 | 380 | 37 | 12 | 29 | 1.0 | 0.4417 | 0.574 | -2.3712 | -2.1197 | 3.378s | 10.098s | COMPLETE |
| 7 | gamma | 0.6 | 0.678 | 330 | 25 | 6 | 12 | 0.8 | 0.4299 | 0.574 | -2.3984 | -2.1197 | 0.807s | 10.904s | COMPLETE |
| 8 | absolut.. | 0.9 | 0.0831 | 280 | 42 | 16 | 10 | 1.0 | 0.5242 | 0.574 | -2.2742 | -2.1197 | 2.341s | 13.246s | COMPLETE |
| 9 | absolut.. | 0.6 | 0.0373 | 300 | 40 | 13 | 17 | 0.8 | 0.5685 | 0.574 | -2.17 | -2.1197 | 2.032s | 15.277s | COMPLETE |

Hyperparameter tuning ---------------------------
Best trial --> 5
Best parameters:
 --> loss: squared_error
 --> quantile: 0.1
 --> learning_rate: 0.0283
 --> max_iter: 360
 --> max_leaf_nodes: 32
 --> max_depth: 9
 --> min_samples_leaf: 11
 --> l2_regularization: 0.5
Best evaluation --> r2: 0.5464   rmse: -2.1197
Time elapsed: 15.277s
Fit ---------------------------------------------
Train evaluation --> r2: 0.7959   rmse: -1.4619
Test evaluation --> r2: 0.5479   rmse: -2.135
Time elapsed: 1.311s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.5259 ± 0.0154   rmse: -2.1862 ± 0.0353
Time elapsed: 4.073s
-------------------------------------------------
Time: 20.661s


Final results ==================== >>
Total time: 22.907s
-------------------------------------
LinearSVM            --> r2: 0.4577 ± 0.002    rmse: -2.3384 ± 0.0043
HistGradientBoosting --> r2: 0.5259 ± 0.0154   rmse: -2.1862 ± 0.0353 ~ !
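The metric parameter isn't limited to predefined names. A sketch (not part of the original run) of how a custom sklearn scorer could be used instead; mape_scorer is a hypothetical name, and mixing a metric name with a scorer object is assumed to be supported:

# Sketch only: a custom scorer built with sklearn's make_scorer
from sklearn.metrics import make_scorer, mean_absolute_percentage_error

# Hypothetical scorer; greater_is_better=False negates the score, like rmse above
mape_scorer = make_scorer(mean_absolute_percentage_error, greater_is_better=False)
# atom.run(models="hGBM", metric=("r2", mape_scorer), n_trials=10)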
In [6]:
# Check the robustness of the pipeline using cross-validation
atom.winner.cross_validate()
Applying cross-validation...
Out[6]:
| | train_r2 | test_r2 | train_rmse | test_rmse | time (s) |
|---|---|---|---|---|---|
| 0 | 0.796038 | 0.542475 | -1.453147 | -2.195779 | 0.640582 |
| 1 | 0.794954 | 0.540335 | -1.457709 | -2.196391 | 0.644585 |
| 2 | 0.790722 | 0.505952 | -1.492522 | -2.153392 | 0.672611 |
| 3 | 0.785317 | 0.580703 | -1.474827 | -2.189902 | 0.684622 |
| 4 | 0.795872 | 0.547938 | -1.461929 | -2.135022 | 0.741673 |
| mean | 0.792581 | 0.543480 | -1.468027 | -2.174097 | 0.676815 |
| std | 0.004114 | 0.023769 | 0.014222 | 0.025189 | 0.036433 |
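Since cross_validate returns the pandas DataFrame shown above, the train-test gap can be quantified directly; a minimal sketch, assuming the mean/std index rows shown:

# Sketch: measure the train-test r2 gap across the cross-validation folds
cv = atom.winner.cross_validate()
print(cv.loc["mean", "train_r2"] - cv.loc["mean", "test_r2"])  # ~0.25 here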
Analyze the results
In [7]:
# The results dataframe contains a column for every metric
atom.results[["r2_ht", "r2_train", "r2_test", "rmse_ht", "rmse_train", "rmse_test"]]
Out[7]:
| | r2_ht | r2_train | r2_test | rmse_ht | rmse_train | rmse_test |
|---|---|---|---|---|---|---|
| lSVM | 0.492530 | 0.4583 | 0.4552 | -2.262754 | -2.3815 | -2.3439 |
| hGBM | 0.546368 | 0.7959 | 0.5479 | -2.119672 | -1.4619 | -2.1350 |
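Because atom.results is a regular pandas DataFrame, the models can also be ranked on any of these columns, e.g. by test RMSE (the scores are negated, so higher means better):

# Rank models on the second metric; ascending=False puts the best first
atom.results.sort_values("rmse_test", ascending=False)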
In [8]:
# Some plots allow us to choose the metric we want to show
with atom.canvas():
    atom.plot_trials(metric="r2", title="Hyperparameter tuning performance for R2")
    atom.plot_trials(metric="rmse", title="Hyperparameter tuning performance for RMSE")
In [9]:
atom.plot_results(metric="r2")
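The same comparison can be drawn for the second metric:

atom.plot_results(metric="rmse")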