Example: Multi-metric runs
This example shows how to evaluate an atom's pipeline on multiple metrics.
Import the abalone dataset from the data folder. This is a small and easy-to-train regression dataset whose goal is to predict the rings (age) of abalone shells from physical measurements.
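The CSV is read from the repository's data folder below. If you don't have a local copy, a similar route is to fetch the data from OpenML; a minimal, hypothetical sketch (it assumes the classic UCI dataset is published there under the name "abalone"):

# Hypothetical alternative: fetch the abalone data from OpenML
# (assumes the name "abalone" resolves to the classic UCI dataset)
from sklearn.datasets import fetch_openml

abalone = fetch_openml(name="abalone", version=1, as_frame=True)
X = abalone.frame  # features and target in a single dataframe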
Load the data
In [1]:
# Import packages
import pandas as pd
from atom import ATOMRegressor
In [2]:
# Load data
X = pd.read_csv("docs_source/examples/datasets/abalone.csv")

# Let's have a look
X.head()
Out[2]:
 | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
---|---|---|---|---|---|---|---|---|---|
0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
Run the pipeline
In [3]:
atom = ATOMRegressor(X, n_jobs=1, verbose=2, random_state=1)
<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Train set size: 3342
Test set size: 835
-------------------------------------
Memory: 300.88 kB
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 189 (0.6%)
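ATOM used the last column of X (Rings) as the target and, per the stats above, held out roughly 20% of the rows as test set. Both choices can be made explicit at initialization; the call below is an illustrative sketch, not part of the original run:

# Illustrative: name the target column and the split size explicitly
atom = ATOMRegressor(X, y="Rings", test_size=0.2, n_jobs=1, verbose=2, random_state=1)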
In [4]:
atom.encode()
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
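The default settings one-hot encode low-cardinality features such as Sex. The encoding strategy and the one-hot threshold can be tuned through parameters; a hedged example (the values are illustrative, not from this run):

# Illustrative: target-encode any categorical column with more than 5 classes
atom.encode(strategy="Target", max_onehot=5)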
In [5]:
# For every step of the BO, both metrics are calculated,
# but only the first is used for optimization!
atom.run(
    models=["lsvm", "hGBM"],
    metric=("r2", "rmse"),
    n_trials=10,
    n_bootstrap=6,
)
Training ========================= >>
Models: lSVM, hGBM
Metric: r2, rmse


Running hyperparameter tuning for LinearSVM...
| trial | loss | C | dual | r2 | best_r2 | rmse | best_rmse | time_trial | time_ht | state |
| ----- | ----------------------- | ------- | ------- | ------- | ------- | ------- | --------- | ---------- | ------- | -------- |
| 0 | squared_epsilon_insen.. | 0.001 | True | 0.2887 | 0.2887 | -2.6528 | -2.6528 | 0.054s | 0.054s | COMPLETE |
| 1 | squared_epsilon_insen.. | 0.0534 | False | 0.4507 | 0.4507 | -2.3314 | -2.3314 | 0.047s | 0.101s | COMPLETE |
| 2 | squared_epsilon_insen.. | 0.0105 | True | 0.451 | 0.451 | -2.3307 | -2.3307 | 0.056s | 0.157s | COMPLETE |
| 3 | epsilon_insensitive | 0.6215 | True | 0.4266 | 0.451 | -2.3818 | -2.3307 | 0.059s | 0.216s | COMPLETE |
| 4 | squared_epsilon_insen.. | 0.0369 | False | 0.4509 | 0.451 | -2.3308 | -2.3307 | 0.049s | 0.265s | COMPLETE |
| 5 | epsilon_insensitive | 0.0016 | True | -1.5995 | 0.451 | -5.0716 | -2.3307 | 0.049s | 0.314s | COMPLETE |
| 6 | squared_epsilon_insen.. | 61.5811 | False | 0.4499 | 0.451 | -2.333 | -2.3307 | 0.054s | 0.369s | COMPLETE |
| 7 | squared_epsilon_insen.. | 14.898 | False | 0.4499 | 0.451 | -2.333 | -2.3307 | 0.049s | 0.418s | COMPLETE |
| 8 | epsilon_insensitive | 0.0252 | True | 0.3798 | 0.451 | -2.4772 | -2.3307 | 0.051s | 0.469s | COMPLETE |
| 9 | squared_epsilon_insen.. | 0.0294 | True | 0.4512 | 0.4512 | -2.3302 | -2.3302 | 0.051s | 0.520s | COMPLETE |
Hyperparameter tuning ---------------------------
Best trial --> 9
Best parameters:
 --> loss: squared_epsilon_insensitive
 --> C: 0.0294
 --> dual: True
Best evaluation --> r2: 0.4512   rmse: -2.3302
Time elapsed: 0.520s
Fit ---------------------------------------------
Train evaluation --> r2: 0.4587   rmse: -2.3806
Test evaluation --> r2: 0.4586   rmse: -2.3365
Time elapsed: 0.120s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.458 ± 0.0014   rmse: -2.3377 ± 0.0031
Time elapsed: 0.199s
-------------------------------------------------
Time: 0.838s


Running hyperparameter tuning for HistGradientBoosting...
| trial | loss | quantile | learning_rate | max_iter | max_leaf_nodes | max_depth | min_samples_leaf | l2_regularization | r2 | best_r2 | rmse | best_rmse | time_trial | time_ht | state |
| ----- | --------- | -------- | ------------- | -------- | -------------- | --------- | ---------------- | ----------------- | ------- | ------- | ------- | --------- | ---------- | ------- | -------- |
| 0 | absolut.. | 0.1 | 0.0236 | 180 | 26 | 12 | 11 | 0.0 | 0.5373 | 0.5373 | -2.1398 | -2.1398 | 0.928s | 0.928s | COMPLETE |
| 1 | gamma | 0.5 | 0.242 | 160 | 38 | 3 | 20 | 0.0 | 0.556 | 0.556 | -2.0959 | -2.0959 | 0.178s | 1.106s | COMPLETE |
| 2 | quantile | 0.4 | 0.2448 | 210 | 12 | 3 | 25 | 0.3 | 0.4906 | 0.556 | -2.245 | -2.0959 | 0.391s | 1.497s | COMPLETE |
| 3 | quantile | 0.6 | 0.017 | 480 | 28 | 16 | 13 | 0.1 | 0.5535 | 0.556 | -2.1018 | -2.0959 | 2.741s | 4.239s | COMPLETE |
| 4 | squared.. | 1.0 | 0.2649 | 70 | 10 | 10 | 28 | 0.8 | 0.5403 | 0.556 | -2.1327 | -2.0959 | 0.117s | 4.356s | COMPLETE |
| 5 | squared.. | 0.1 | 0.0283 | 360 | 32 | 9 | 11 | 0.5 | 0.5466 | 0.556 | -2.118 | -2.0959 | 0.863s | 5.219s | COMPLETE |
| 6 | quantile | 0.4 | 0.1264 | 380 | 37 | 12 | 29 | 1.0 | 0.4977 | 0.556 | -2.2292 | -2.0959 | 2.699s | 7.918s | COMPLETE |
| 7 | gamma | 0.6 | 0.678 | 330 | 25 | 6 | 12 | 0.8 | 0.3783 | 0.556 | -2.4802 | -2.0959 | 0.593s | 8.511s | COMPLETE |
| 8 | absolut.. | 0.9 | 0.0831 | 280 | 42 | 16 | 10 | 1.0 | 0.5285 | 0.556 | -2.16 | -2.0959 | 1.647s | 10.158s | COMPLETE |
| 9 | absolut.. | 0.6 | 0.0373 | 300 | 40 | 13 | 17 | 0.8 | 0.5381 | 0.556 | -2.1378 | -2.0959 | 1.939s | 12.098s | COMPLETE |
Hyperparameter tuning ---------------------------
Best trial --> 1
Best parameters:
 --> loss: gamma
 --> quantile: 0.5
 --> learning_rate: 0.242
 --> max_iter: 160
 --> max_leaf_nodes: 38
 --> max_depth: 3
 --> min_samples_leaf: 20
 --> l2_regularization: 0.0
Best evaluation --> r2: 0.556   rmse: -2.0959
Time elapsed: 12.098s
Fit ---------------------------------------------
Train evaluation --> r2: 0.7182   rmse: -1.7178
Test evaluation --> r2: 0.5427   rmse: -2.1473
Time elapsed: 0.232s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.5241 ± 0.022   rmse: -2.19 ± 0.0504
Time elapsed: 0.782s
-------------------------------------------------
Time: 13.111s


Final results ==================== >>
Total time: 15.597s
-------------------------------------
LinearSVM            --> r2: 0.458 ± 0.0014   rmse: -2.3377 ± 0.0031
HistGradientBoosting --> r2: 0.5241 ± 0.022   rmse: -2.19 ± 0.0504 ~ !
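The metric parameter isn't limited to predefined acronyms like "r2" and "rmse"; it also accepts scikit-learn scorers, which makes it possible to optimize on custom losses. A minimal sketch, not part of the original run (the OLS model acronym and the scorer mix are illustrative assumptions):

# Illustrative: combine a named metric with a custom scikit-learn scorer
from sklearn.metrics import make_scorer, mean_absolute_error

atom.run(
    models="OLS",
    metric=("r2", make_scorer(mean_absolute_error, greater_is_better=False)),
)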
In [6]:
# Check the robustness of the pipeline using cross-validation
atom.winner.cross_validate()
Applying cross-validation...
Out[6]:
 | train_r2 | test_r2 | train_rmse | test_rmse | time |
---|---|---|---|---|---|
0 | 0.719472 | 0.535855 | -1.704209 | -2.211607 | 0.135122 |
1 | 0.726760 | 0.539374 | -1.682744 | -2.198686 | 0.135123 |
2 | 0.723168 | 0.515653 | -1.716590 | -2.132145 | 0.143130 |
3 | 0.701827 | 0.596559 | -1.738110 | -2.148098 | 0.116105 |
4 | 0.718167 | 0.542704 | -1.717794 | -2.147347 | 0.148133 |
mean | 0.717879 | 0.546029 | -1.711889 | -2.167577 | 0.135523 |
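The keyword arguments of cross_validate are forwarded to scikit-learn's cross-validation routine, so the splitting scheme can be changed if needed; for example (illustrative, the run above used the defaults):

# Illustrative: cross-validate the winning model on 10 folds instead
atom.winner.cross_validate(cv=10)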
Analyze the results
In [7]:
# The results dataframe contains one column per metric and stage
atom.results.data[["r2_ht", "r2_train", "r2_test", "rmse_ht", "rmse_train", "rmse_test"]]
Out[7]:
 | r2_ht | r2_train | r2_test | rmse_ht | rmse_train | rmse_test |
---|---|---|---|---|---|---|
lSVM | 0.451203 | 0.4580 | 0.4565 | -2.330239 | -2.3823 | -2.3411 |
hGBM | 0.556021 | 0.7182 | 0.5427 | -2.095926 | -1.7178 | -2.1473 |
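Since atom.results.data is a plain pandas DataFrame, standard selection and sorting apply; for example (illustrative):

# Illustrative: rank the models by test RMSE (scores are negated, so higher is better)
atom.results.data.sort_values("rmse_test", ascending=False)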
In [8]:
# Some plots allow us to choose the metric we want to show
with atom.canvas():
    atom.plot_trials(metric="r2", title="Hyperparameter tuning performance for R2")
    atom.plot_trials(metric="rmse", title="Hyperparameter tuning performance for RMSE")
In [9]:
atom.plot_results(metric="r2")
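The same comparison can be drawn for the secondary metric; for example (illustrative):

atom.plot_results(metric="rmse")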