Multi-metric runs¶
This example shows how to evaluate an atom's pipeline on more than one metric.
Import the abalone dataset from https://archive.ics.uci.edu/ml/datasets/Abalone. The goal of this dataset is to predict the age (expressed as the number of rings) of abalone shells from physical measurements.
Load the data¶
In [1]:
# Import packages
import pandas as pd
from atom import ATOMRegressor
In [2]:
# Load data
X = pd.read_csv("./datasets/abalone.csv")
# Let's have a look
X.head()
Out[2]:
|   | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
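Before modelling, it can help to glance at the target's distribution. This quick check uses plain pandas on the Rings column (the regression target) and doesn't involve atom:

# Summary statistics of the regression target
X["Rings"].describe()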
Run the pipeline¶
In [3]:
atom = ATOMRegressor(X, n_jobs=1, verbose=2, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Memory: 509.72 kB
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 187 (0.6%)
-------------------------------------
Train set size: 3342
Test set size: 835
In [4]:
atom.encode()
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
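To verify the transformation, the processed data can be inspected directly. A minimal sketch, assuming the dataset attribute holds the encoded data (Sex should now be replaced by three one-hot columns):

# Inspect the first rows of the encoded dataset
atom.dataset.head()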
In [5]:
# For every step of the BO, both metrics are calculated,
# but only the first is used for optimization!
atom.run(
    models=["lsvm", "hGBM"],
    metric=("r2", "rmse"),
    n_calls=10,
    n_initial_points=4,
    n_bootstrap=6,
)
Training ========================= >>
Models: lSVM, hGBM
Metric: r2, neg_root_mean_squared_error


Running BO for Linear SVM...

| call | loss | C | dual | r2 | best_r2 | neg_root_mean_squared_error | best_neg_root_mean_squared_error | time | total_time |
|---|---|---|---|---|---|---|---|---|---|
| Initial point 1 | squared_epsilon_insen.. | 46.0031 | True | 0.4704 | 0.4704 | -2.5505 | -2.5505 | 0.117s | 0.132s |
| Initial point 2 | squared_epsilon_insen.. | 0.0152 | True | 0.3273 | 0.4704 | -2.5345 | -2.5345 | 0.031s | 0.163s |
| Initial point 3 | epsilon_insensitive | 2.2322 | True | 0.4221 | 0.4704 | -2.687 | -2.5345 | 0.078s | 0.242s |
| Initial point 4 | squared_epsilon_insen.. | 0.0368 | False | 0.5469 | 0.5469 | -2.2654 | -2.2654 | 0.031s | 0.273s |
| Iteration 5 | epsilon_insensitive | 0.0414 | True | 0.3922 | 0.5469 | -2.4295 | -2.2654 | 0.031s | 0.476s |
| Iteration 6 | squared_epsilon_insen.. | 0.0036 | False | 0.3266 | 0.5469 | -2.6868 | -2.2654 | 0.047s | 0.726s |
| Iteration 7 | squared_epsilon_insen.. | 0.0368 | False | 0.5469 | 0.5469 | -2.2654 | -2.2654 | 0.000s | 0.898s |
| Iteration 8 | squared_epsilon_insen.. | 0.0408 | False | 0.4781 | 0.5469 | -2.2434 | -2.2434 | 0.047s | 1.210s |
| Iteration 9 | squared_epsilon_insen.. | 0.033 | False | 0.4635 | 0.5469 | -2.376 | -2.2434 | 0.047s | 1.585s |
| Iteration 10 | squared_epsilon_insen.. | 0.0369 | True | 0.4594 | 0.5469 | -2.3761 | -2.2434 | 0.047s | 1.960s |

Bayesian Optimization ---------------------------
Best call --> Initial point 4
Best parameters --> {'loss': 'squared_epsilon_insensitive', 'C': 0.0368, 'dual': False}
Best evaluation --> r2: 0.5469   neg_root_mean_squared_error: -2.2654
Time elapsed: 2.179s
Fit ---------------------------------------------
Train evaluation --> r2: 0.4648   neg_root_mean_squared_error: -2.3752
Test evaluation --> r2: 0.4328   neg_root_mean_squared_error: -2.3574
Time elapsed: 0.016s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.4318 ± 0.0046   neg_root_mean_squared_error: -2.3595 ± 0.0096
Time elapsed: 0.094s
-------------------------------------------------
Total time: 2.289s


Running BO for HistGBM...

| call | loss | learning_rate | max_iter | max_leaf_nodes | max_depth | min_samples_leaf | l2_regularization | r2 | best_r2 | neg_root_mean_squared_error | best_neg_root_mean_squared_error | time | total_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Initial point 1 | poisson | 0.733 | 73 | 50 | 4 | 18 | 0.4 | 0.5137 | 0.5137 | -2.4441 | -2.4441 | 0.141s | 0.156s |
| Initial point 2 | absolute_.. | 0.7432 | 425 | 23 | 8 | 19 | 0.2 | 0.4734 | 0.5137 | -2.2425 | -2.2425 | 1.656s | 1.828s |
| Initial point 3 | absolute_.. | 0.6729 | 234 | 27 | 15 | 26 | 0.7 | 0.4905 | 0.5137 | -2.523 | -2.2425 | 1.281s | 3.110s |
| Initial point 4 | poisson | 0.0153 | 264 | 45 | 13 | 27 | 0.3 | 0.6013 | 0.6013 | -2.1251 | -2.1251 | 1.781s | 4.891s |
| Iteration 5 | poisson | 0.01 | 28 | 50 | 12 | 15 | 0.6 | 0.2368 | 0.6013 | -2.7223 | -2.1251 | 0.250s | 5.407s |
| Iteration 6 | poisson | 0.0274 | 210 | 45 | 9 | 28 | 0.0 | 0.5329 | 0.6013 | -2.2378 | -2.1251 | 1.063s | 6.782s |
| Iteration 7 | squared_e.. | 0.01 | 453 | 17 | 11 | 25 | 0.4 | 0.5496 | 0.6013 | -2.2567 | -2.1251 | 1.156s | 8.235s |
| Iteration 8 | absolute_.. | 0.0102 | 333 | 34 | 14 | 30 | 0.7 | 0.5373 | 0.6013 | -2.1123 | -2.1123 | 2.078s | 10.626s |
| Iteration 9 | poisson | 0.0138 | 147 | 50 | 16 | 27 | 0.2 | 0.5534 | 0.6013 | -2.1679 | -2.1123 | 1.000s | 11.938s |
| Iteration 10 | poisson | 1.0 | 36 | 26 | 6 | 10 | 0.9 | 0.3135 | 0.6013 | -2.6775 | -2.1123 | 0.141s | 12.298s |

Bayesian Optimization ---------------------------
Best call --> Initial point 4
Best parameters --> {'loss': 'poisson', 'learning_rate': 0.0153, 'max_iter': 264, 'max_leaf_nodes': 45, 'max_depth': 13, 'min_samples_leaf': 27, 'l2_regularization': 0.3}
Best evaluation --> r2: 0.6013   neg_root_mean_squared_error: -2.1251
Time elapsed: 12.720s
Fit ---------------------------------------------
Train evaluation --> r2: 0.7301   neg_root_mean_squared_error: -1.6867
Test evaluation --> r2: 0.5224   neg_root_mean_squared_error: -2.1632
Time elapsed: 1.609s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.5064 ± 0.0163   neg_root_mean_squared_error: -2.1989 ± 0.0364
Time elapsed: 10.424s
-------------------------------------------------
Total time: 24.753s


Final results ==================== >>
Duration: 27.042s
-------------------------------------
Linear SVM --> r2: 0.4318 ± 0.0046   neg_root_mean_squared_error: -2.3595 ± 0.0096 ~
HistGBM --> r2: 0.5064 ± 0.0163   neg_root_mean_squared_error: -2.1989 ± 0.0364 !
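Note that the metric parameter isn't limited to the predefined acronyms used above. As a sketch, assuming atom accepts sklearn scorer objects (the neg_root_mean_squared_error naming in the output follows sklearn's scorer conventions), a custom scorer could be mixed in:

from sklearn.metrics import make_scorer, mean_absolute_error

# Hypothetical run mixing a named metric with a custom sklearn scorer;
# greater_is_better=False negates the error so that higher is still better
atom.run(
    models="hGBM",
    metric=("r2", make_scorer(mean_absolute_error, greater_is_better=False)),
)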
In [6]:
# Check the robustness of the pipeline using cross-validation
atom.winner.cross_validate()
Applying cross-validation...
Out[6]:
|   | train_r2 | test_r2 | train_neg_root_mean_squared_error | test_neg_root_mean_squared_error | time (s) |
|---|---|---|---|---|---|
| 0 | 0.727249 | 0.545513 | -1.675553 | -2.214354 | 1.739589 |
| 1 | 0.719495 | 0.569580 | -1.695233 | -2.173604 | 1.797012 |
| 2 | 0.732069 | 0.514090 | -1.663940 | -2.272483 | 1.640750 |
| 3 | 0.719680 | 0.581855 | -1.719713 | -2.020461 | 1.616995 |
| 4 | 0.730103 | 0.522438 | -1.686724 | -2.163208 | 1.656376 |
| mean | 0.725719 | 0.546695 | -1.688233 | -2.168822 | 1.690144 |
| std | 0.005236 | 0.026126 | 0.018938 | 0.083527 | 0.067522 |
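The per-fold scores can be summarized further with plain pandas. A sketch, assuming the dataframe shown above is the method's return value (cv_results is a hypothetical variable name):

cv_results = atom.winner.cross_validate()

# Mean and standard deviation of the test r2 over the five folds
# (iloc[:5] skips the aggregated mean/std rows at the bottom)
print(cv_results["test_r2"].iloc[:5].agg(["mean", "std"]))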
Analyze the results¶
In [7]:
# The columns in the results dataframe contain a list of
# scores, one for each metric (in the same order as called)
atom.results[["metric_bo", "metric_train", "metric_test"]]
Out[7]:
|   | metric_bo | metric_train | metric_test |
|---|---|---|---|
| lSVM | [0.5468896602938744, -2.265434858096626] | [0.4648239922283617, -2.37516348542533] | [0.43283219743814105, -2.3574323170845477] |
| hGBM | [0.6012801528863336, -2.125120297852416] | [0.7301031935292892, -1.686724150354464] | [0.5224380449509453, -2.1632081439643134] |
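Because every cell holds a list with one score per metric, a single metric can be extracted with plain pandas. For example, to pull the test r2 (the first metric, index 0) for all models:

# Extract the r2 score from every list in the metric_test column
atom.results["metric_test"].apply(lambda scores: scores[0])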
In [8]:
# Some plots allow us to choose the metric we want to show
with atom.canvas():
    atom.plot_bo(metric="r2", title="BO performance for r2")
    atom.plot_bo(metric="rmse", title="BO performance for Root Mean Squared Error")
In [9]:
atom.plot_results(metric="rmse")
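The same canvas trick from above also works here. A sketch drawing the model comparison once per metric:

# One figure with the final results for each of the two metrics
with atom.canvas():
    atom.plot_results(metric="r2")
    atom.plot_results(metric="rmse")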