Multi-metric runs¶
This example shows how to evaluate an atom's pipeline on multiple metrics.
Import the abalone dataset. This is a small and easy-to-train dataset whose goal is to predict the number of rings (a proxy for age) of abalone shells from physical measurements.
Load the data¶
In [1]:
# Import packages
import pandas as pd
from atom import ATOMRegressor
In [2]:
# Load data
X = pd.read_csv("./datasets/abalone.csv")
# Let's have a look
X.head()
Out[2]:
|   | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|-----|--------|----------|--------|--------------|----------------|----------------|--------------|-------|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
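Note that ATOM uses the last column of the dataset (Rings) as the target by default. A minimal sketch of making the target explicit, assuming your ATOM version accepts a column name for the y parameter:

# Hypothetical alternative: name the target column explicitly
atom = ATOMRegressor(X, y="Rings", n_jobs=1, verbose=2, warnings=False, random_state=1)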
Run the pipeline¶
In [3]:
atom = ATOMRegressor(X, n_jobs=1, verbose=2, warnings=False, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Memory: 509.72 kB
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 187 (0.6%)
-------------------------------------
Train set size: 3342
Test set size: 835
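The split itself can be inspected through atom's data attributes. A quick check, assuming the train/test properties shown in the ATOM documentation:

# Verify the 80/20 split reported above (expected: (3342, 9) and (835, 9))
print(atom.train.shape)
print(atom.test.shape)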
In [4]:
atom.encode()
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
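Sex is one-hot encoded because it has only three classes. For higher-cardinality features, encode exposes other strategies; a hedged sketch, assuming the strategy and max_onehot parameters of atom.encode:

# Hypothetical variant: target-based encoding for features with many classes
# (shown as an alternative call, not meant to re-encode the already-fitted atom)
atom.encode(strategy="Target", max_onehot=5)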
In [5]:
# For every step of the BO, both metrics are calculated,
# but only the first is used for optimization!
atom.run(
    models=["lsvm", "hGBM"],
    metric=("r2", "mse"),
    n_calls=10,
    n_initial_points=4,
    n_bootstrap=6,
)
Training ========================= >>
Models: lSVM, hGBM
Metric: r2, neg_mean_squared_error


Running BO for Linear SVM...
| call | loss | C | dual | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Initial point 1 | squared_epsilon_insen.. | 46.0031 | True | 0.4704 | 0.4704 | -6.505 | -6.505 | 0.118s | 0.126s |
| Initial point 2 | squared_epsilon_insen.. | 0.0152 | True | 0.3273 | 0.4704 | -6.4239 | -6.4239 | 0.038s | 0.386s |
| Initial point 3 | epsilon_insensitive | 2.2322 | True | 0.4221 | 0.4704 | -7.2202 | -6.4239 | 0.064s | 0.528s |
| Initial point 4 | squared_epsilon_insen.. | 0.0368 | False | 0.5469 | 0.5469 | -5.1322 | -5.1322 | 0.039s | 0.643s |
| Iteration 5 | epsilon_insensitive | 0.0414 | True | 0.3922 | 0.5469 | -5.9024 | -5.1322 | 0.035s | 0.923s |
| Iteration 6 | squared_epsilon_insen.. | 0.0036 | False | 0.3266 | 0.5469 | -7.2191 | -5.1322 | 0.037s | 1.235s |
| Iteration 7 | squared_epsilon_insen.. | 0.0368 | False | 0.5469 | 0.5469 | -5.1322 | -5.1322 | 0.001s | 1.475s |
| Iteration 8 | squared_epsilon_insen.. | 0.0408 | False | 0.4781 | 0.5469 | -5.0326 | -5.0326 | 0.037s | 1.831s |
| Iteration 9 | squared_epsilon_insen.. | 0.033 | False | 0.4635 | 0.5469 | -5.6456 | -5.0326 | 0.042s | 2.187s |
| Iteration 10 | squared_epsilon_insen.. | 0.0369 | True | 0.4594 | 0.5469 | -5.6458 | -5.0326 | 0.040s | 2.719s |

Bayesian Optimization ---------------------------
Best call --> Initial point 4
Best parameters --> {'loss': 'squared_epsilon_insensitive', 'C': 0.0368, 'dual': False}
Best evaluation --> r2: 0.5469   neg_mean_squared_error: -5.1322
Time elapsed: 3.015s
Fit ---------------------------------------------
Train evaluation --> r2: 0.4648   neg_mean_squared_error: -5.6414
Test evaluation --> r2: 0.4328   neg_mean_squared_error: -5.5575
Time elapsed: 0.018s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.4318 ± 0.0046   neg_mean_squared_error: -5.5675 ± 0.0452
Time elapsed: 0.094s
-------------------------------------------------
Total time: 3.127s


Running BO for HistGBM...
| call | loss | learning_rate | max_iter | max_leaf_nodes | max_depth | min_samples_leaf | l2_regularization | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Initial point 1 | poisson | 0.733 | 73 | 50 | 4 | 18 | 0.4 | 0.5137 | 0.5137 | -5.9736 | -5.9736 | 0.155s | 0.169s |
| Initial point 2 | absolute_.. | 0.7432 | 425 | 23 | 8 | 19 | 0.2 | 0.4734 | 0.5137 | -5.0286 | -5.0286 | 1.677s | 1.945s |
| Initial point 3 | absolute_.. | 0.6729 | 234 | 27 | 15 | 26 | 0.7 | 0.4905 | 0.5137 | -6.3654 | -5.0286 | 1.645s | 3.692s |
| Initial point 4 | poisson | 0.0153 | 264 | 45 | 13 | 27 | 0.3 | 0.6013 | 0.6013 | -4.5161 | -4.5161 | 1.748s | 5.547s |
| Iteration 5 | poisson | 0.01 | 28 | 50 | 12 | 15 | 0.6 | 0.2368 | 0.6013 | -7.4112 | -4.5161 | 0.248s | 6.132s |
| Iteration 6 | poisson | 0.0274 | 210 | 45 | 9 | 28 | 0.0 | 0.5329 | 0.6013 | -5.0078 | -4.5161 | 1.057s | 7.565s |
| Iteration 7 | squared_e.. | 0.01 | 453 | 17 | 11 | 25 | 0.4 | 0.5496 | 0.6013 | -5.0927 | -4.5161 | 2.402s | 10.570s |
| Iteration 8 | absolute_.. | 0.0102 | 333 | 34 | 14 | 30 | 0.7 | 0.5373 | 0.6013 | -4.4619 | -4.4619 | 2.312s | 13.269s |
| Iteration 9 | poisson | 0.0138 | 147 | 50 | 16 | 27 | 0.2 | 0.5534 | 0.6013 | -4.6996 | -4.4619 | 1.132s | 14.799s |
| Iteration 10 | poisson | 1.0 | 36 | 26 | 6 | 10 | 0.9 | 0.3135 | 0.6013 | -7.1688 | -4.4619 | 0.155s | 15.258s |

Bayesian Optimization ---------------------------
Best call --> Initial point 4
Best parameters --> {'loss': 'poisson', 'learning_rate': 0.0153, 'max_iter': 264, 'max_leaf_nodes': 45, 'max_depth': 13, 'min_samples_leaf': 27, 'l2_regularization': 0.3}
Best evaluation --> r2: 0.6013   neg_mean_squared_error: -4.5161
Time elapsed: 15.797s
Fit ---------------------------------------------
Train evaluation --> r2: 0.7301   neg_mean_squared_error: -2.845
Test evaluation --> r2: 0.5224   neg_mean_squared_error: -4.6795
Time elapsed: 2.484s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.5064 ± 0.0163   neg_mean_squared_error: -4.8365 ± 0.16
Time elapsed: 11.562s
-------------------------------------------------
Total time: 29.844s


Final results ==================== >>
Duration: 32.971s
-------------------------------------
Linear SVM --> r2: 0.4318 ± 0.0046   neg_mean_squared_error: -5.5675 ± 0.0452
HistGBM --> r2: 0.5064 ± 0.0163   neg_mean_squared_error: -4.8365 ± 0.16 ~ !
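Metric names are not the only option: the metric parameter should also accept callables and sklearn scorers. A minimal sketch, assuming scorer support in atom.run (the mean_absolute_error scorer is our illustration, not part of the run above):

from sklearn.metrics import make_scorer, mean_absolute_error

# Sketch: combine a named metric with a custom scorer;
# the first metric still drives the optimization
atom.run(
    models=["lsvm"],
    metric=("r2", make_scorer(mean_absolute_error, greater_is_better=False)),
    n_calls=10,
    n_initial_points=4,
)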
Analyze the results¶
In [6]:
# The columns in the results dataframe contain a list of
# scores, one for each metric (in the same order as called)
atom.results[["metric_bo", "metric_train", "metric_test"]]
Out[6]:
|      | metric_bo | metric_train | metric_test |
|------|-----------|--------------|-------------|
| lSVM | [0.5468896602938744, -5.13219509627928] | [0.4648239922283617, -5.6414015824978] | [0.43283219743814105, -5.5574871296346195] |
| hGBM | [0.6012801528863336, -4.516136280344341] | [0.7301031935292892, -2.845038359388989] | [0.5224380449509453, -4.67946947411353] |
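Because every cell holds one score per metric, plain pandas can expand the lists into separate columns. A small sketch (the column names are ours, following the order in which the metrics were passed):

# Expand the test scores into one column per metric
pd.DataFrame(
    atom.results["metric_test"].tolist(),
    index=atom.results.index,
    columns=["r2", "neg_mean_squared_error"],
)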
In [7]:
# Some plots allow us to choose the metric we want to show
with atom.canvas():
    atom.plot_bo(metric="r2", title="BO performance for r2")
    atom.plot_bo(metric="mse", title="BO performance for Mean Squared Error")
In [8]:
atom.plot_results(metric="mse")
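Finally, the best model can be retrieved directly. A sketch assuming the winner shortcut and per-model metric attributes (check these names against your ATOM version):

# The winner is selected on the first (main) metric, r2 in this example
print(atom.winner.name)         # hGBM, per the final results above
print(atom.winner.metric_test)  # one test score per metric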