Multi-metric runs
This example shows how to evaluate an atom's pipeline on multiple metrics.
Import the abalone dataset from https://archive.ics.uci.edu/ml/datasets/Abalone. This is a small and easy-to-train dataset whose goal is to predict the age of abalone shells (measured in number of rings) from physical measurements.
Load the data
In [1]:
# Import packages
import pandas as pd
from atom import ATOMRegressor
In [2]:
# Load data
X = pd.read_csv("./datasets/abalone.csv")
# Let's have a look
X.head()
Out[2]:
| | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
Run the pipeline
In [3]:
atom = ATOMRegressor(X, n_jobs=1, verbose=2, warnings=False, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Memory: 509.72 kB
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 187 (0.6%)
-------------------------------------
Train set size: 3342
Test set size: 835
In [4]:
atom.encode()
Fitting Encoder...
Encoding categorical columns...
--> OneHot-encoding feature Sex. Contains 3 classes.
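To verify the transformation, we can have a quick look at the encoded data. A minimal sketch, assuming atom's `dataset` attribute returns the data in its current (transformed) state:

# Inspect the transformed dataset; the Sex column
# should now be one-hot encoded into three columns.
atom.dataset.head()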
In [5]:
# For every step of the BO, both metrics are calculated,
# but only the first is used for optimization!
atom.run(
models=["lsvm", "hGBM"],
metric=("r2", "mse"),
n_calls=10,
n_initial_points=4,
n_bootstrap=6,
)
Training ========================= >>
Models: lSVM, hGBM
Metric: r2, neg_mean_squared_error

Running BO for Linear-SVM...

| call | loss | C | dual | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Initial point 1 | squared_epsilon_insen.. | 46.0031 | True | 0.4704 | 0.4704 | -6.505 | -6.505 | 0.119s | 0.127s |
| Initial point 2 | squared_epsilon_insen.. | 0.0152 | True | 0.3273 | 0.4704 | -6.4239 | -6.4239 | 0.047s | 0.395s |
| Initial point 3 | epsilon_insensitive | 2.2322 | True | 0.4221 | 0.4704 | -7.2202 | -6.4239 | 0.064s | 0.535s |
| Initial point 4 | squared_epsilon_insen.. | 0.0368 | False | 0.5469 | 0.5469 | -5.1322 | -5.1322 | 0.038s | 0.650s |
| Iteration 5 | epsilon_insensitive | 0.0414 | True | 0.3922 | 0.5469 | -5.9024 | -5.1322 | 0.035s | 0.917s |
| Iteration 6 | squared_epsilon_insen.. | 0.0036 | False | 0.3266 | 0.5469 | -7.2191 | -5.1322 | 0.038s | 1.230s |
| Iteration 7 | squared_epsilon_insen.. | 0.0368 | False | 0.5469 | 0.5469 | -5.1322 | -5.1322 | 0.000s | 1.459s |
| Iteration 8 | squared_epsilon_insen.. | 0.0408 | False | 0.4781 | 0.5469 | -5.0326 | -5.0326 | 0.036s | 1.809s |
| Iteration 9 | squared_epsilon_insen.. | 0.033 | False | 0.4635 | 0.5469 | -5.6456 | -5.0326 | 0.037s | 2.173s |
| Iteration 10 | squared_epsilon_insen.. | 0.0369 | True | 0.4594 | 0.5469 | -5.6458 | -5.0326 | 0.042s | 2.623s |

Bayesian Optimization ---------------------------
Best call --> Initial point 4
Best parameters --> {'loss': 'squared_epsilon_insensitive', 'C': 0.0368, 'dual': False}
Best evaluation --> r2: 0.5469   neg_mean_squared_error: -5.1322
Time elapsed: 2.924s
Fit ---------------------------------------------
Train evaluation --> r2: 0.4648   neg_mean_squared_error: -5.6414
Test evaluation --> r2: 0.4328   neg_mean_squared_error: -5.5575
Time elapsed: 0.018s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.4318 ± 0.0046   neg_mean_squared_error: -5.5675 ± 0.0452
Time elapsed: 0.090s
-------------------------------------------------
Total time: 3.032s

Running BO for HistGBM...

| call | loss | learning_rate | max_iter | max_leaf_nodes | max_depth | min_samples_leaf | l2_regularization | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Initial point 1 | poisson | 0.733 | 73 | 50 | 2 | 18 | 0.4 | 0.5773 | 0.5773 | -5.1926 | -5.1926 | 0.089s | 0.102s |
| Initial point 2 | absolute_.. | 0.7432 | 425 | 23 | 5 | 19 | 0.2 | 0.4857 | 0.5773 | -4.9114 | -4.9114 | 0.920s | 1.129s |
| Initial point 3 | absolute_.. | 0.6729 | 234 | 27 | 9 | 26 | 0.7 | 0.4833 | 0.5773 | -6.4547 | -4.9114 | 1.007s | 2.240s |
| Initial point 4 | poisson | 0.0153 | 264 | 45 | 8 | 27 | 0.3 | 0.6013 | 0.6013 | -4.5159 | -4.5159 | 1.594s | 3.934s |
| Iteration 5 | squared_e.. | 0.0719 | 410 | 34 | 6 | 29 | 0.1 | 0.4733 | 0.6013 | -5.1143 | -4.5159 | 0.948s | 5.175s |
| Iteration 6 | poisson | 0.01 | 500 | 10 | None | 10 | 1.0 | 0.5466 | 0.6013 | -4.8608 | -4.5159 | 1.012s | 6.688s |
| Iteration 7 | poisson | 0.01 | 201 | 50 | 8 | 24 | 1.0 | 0.5438 | 0.6013 | -5.1586 | -4.5159 | 1.639s | 8.761s |
| Iteration 8 | poisson | 0.01 | 500 | 10 | None | 10 | 0.0 | 0.5378 | 0.6013 | -4.4574 | -4.4574 | 0.969s | 10.346s |
| Iteration 9 | poisson | 0.0106 | 324 | 42 | 9 | 30 | 0.0 | 0.5672 | 0.6013 | -4.5537 | -4.4574 | 2.739s | 13.464s |
| Iteration 10 | poisson | 1.0 | 500 | 50 | None | 10 | 0.0 | 0.1422 | 0.6013 | -8.9582 | -4.4574 | 3.951s | 17.772s |

Bayesian Optimization ---------------------------
Best call --> Initial point 4
Best parameters --> {'loss': 'poisson', 'learning_rate': 0.0153, 'max_iter': 264, 'max_leaf_nodes': 45, 'max_depth': 8, 'min_samples_leaf': 27, 'l2_regularization': 0.3}
Best evaluation --> r2: 0.6013   neg_mean_squared_error: -4.5159
Time elapsed: 18.180s
Fit ---------------------------------------------
Train evaluation --> r2: 0.7132   neg_mean_squared_error: -3.023
Test evaluation --> r2: 0.5224   neg_mean_squared_error: -4.6794
Time elapsed: 1.915s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.509 ± 0.014   neg_mean_squared_error: -4.8115 ± 0.1372
Time elapsed: 9.698s
-------------------------------------------------
Total time: 29.795s

Final results ==================== >>
Duration: 32.827s
-------------------------------------
Linear-SVM --> r2: 0.4318 ± 0.0046   neg_mean_squared_error: -5.5675 ± 0.0452
HistGBM --> r2: 0.509 ± 0.014   neg_mean_squared_error: -4.8115 ± 0.1372 ~ !
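In the final results, the winning model (marked with !) is chosen on the first metric only; the ~ flags a possible difference between train and test scores. A minimal sketch for inspecting the winner, assuming this atom-ml version exposes the `winner` attribute and the per-model `metric_test` attribute shown in the results below:

# Hedged sketch: inspect the best-performing model
atom.winner              # model that scored best on r2 (the main metric)
atom.winner.metric_test  # [r2, neg_mean_squared_error] on the test set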
Analyze the results
In [6]:
# The columns in the results dataframe contain a list of
# scores, one for each metric (in the same order as called)
atom.results[["metric_bo", "metric_train", "metric_test"]]
Out[6]:
| | metric_bo | metric_train | metric_test |
| --- | --- | --- | --- |
| lSVM | [0.5468896602938744, -5.13219509627928] | [0.4648239922283617, -5.6414015824978] | [0.43283219743814105, -5.5574871296346195] |
| hGBM | [0.6012979195596735, -4.515935044517652] | [0.7132250948243765, -3.0229539074717753] | [0.522445526933692, -4.679396160673255] |
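Since every cell holds a list of scores, it can be convenient to expand them into one column per metric. A minimal sketch using plain pandas; the column names below are chosen by hand to match the metrics passed to run:

# Hedged sketch: expand the list-valued metric_test column
# into one column per metric, keeping the model index.
scores = pd.DataFrame(
    atom.results["metric_test"].tolist(),
    index=atom.results.index,
    columns=["r2", "neg_mean_squared_error"],
)
scores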
In [7]:
# Some plots allow us to choose the metric we want to show
with atom.canvas():
    atom.plot_bo(metric="r2", title="BO performance for r2")
    atom.plot_bo(metric="mse", title="BO performance for Mean Squared Error")
In [8]:
atom.plot_results(metric="mse")
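Metrics that were not part of the run can still be computed afterwards from a model's predictions. A minimal sketch with plain sklearn, assuming atom exposes the usual `X_test`/`y_test` data attributes:

# Hedged sketch: score the winner on a metric
# that was not included in the run.
from sklearn.metrics import mean_absolute_error

y_pred = atom.winner.predict(atom.X_test)  # test-set predictions of the best model
mean_absolute_error(atom.y_test, y_pred)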