Multi-metric runs¶
This example shows how to evaluate a pipeline on multiple metrics with ATOM.
The data used is the abalone dataset, loaded from a local CSV file. The goal is to predict the number of rings of an abalone shell (a proxy for its age) from physical measurements.
Load the data¶
In [9]:
# Import packages
import pandas as pd
from atom import ATOMRegressor
In [10]:
# Load data
X = pd.read_csv("./datasets/abalone.csv")
# Let's have a look
X.head()
Out[10]:
| | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
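Note that only the feature matrix `X` is passed to ATOM below; by default the last column (Rings here) is treated as the target. A minimal pandas sketch of that split, using a few hard-coded rows instead of the CSV so it runs standalone:

```python
import pandas as pd

# A few rows of the abalone data, hard-coded so the sketch
# runs without the CSV file (columns abbreviated).
X = pd.DataFrame(
    {
        "Sex": ["M", "M", "F"],
        "Length": [0.455, 0.350, 0.530],
        "Rings": [15, 7, 9],
    }
)

# The last column acts as the target; everything else is a feature.
features, target = X.iloc[:, :-1], X.iloc[:, -1]
print(target.tolist())  # the Rings values
```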
Run the pipeline¶
In [11]:
atom = ATOMRegressor(X, n_jobs=1, verbose=2, warnings=False, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 182 (0.6%)
-------------------------------------
Train set size: 3342
Test set size: 835
In [12]:
atom.encode()
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
In [13]:
# For every step of the BO, both metrics are calculated,
# but only the first is used for optimization!
atom.run(
models=["lsvm", "hGBM"],
metric=("r2", "mse"),
n_calls=10,
n_initial_points=4,
n_bootstrap=6,
)
Training ========================= >>
Models: lSVM, hGBM
Metric: r2, neg_mean_squared_error


Running BO for Linear-SVM...

| call | loss | C | dual | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
|---|---|---|---|---|---|---|---|---|---|
| Initial point 1 | squared_epsilon_insen.. | 46.0031 | True | 0.4003 | 0.4003 | -7.2241 | -7.2241 | 0.109s | 0.125s |
| Initial point 2 | squared_epsilon_insen.. | 0.0152 | True | 0.398 | 0.4003 | -6.4967 | -6.4967 | 0.031s | 0.313s |
| Initial point 3 | epsilon_insensitive | 2.2322 | True | 0.4425 | 0.4425 | -6.0495 | -6.0495 | 0.094s | 0.484s |
| Initial point 4 | squared_epsilon_insen.. | 0.0368 | False | 0.445 | 0.445 | -5.9247 | -5.9247 | 0.031s | 0.594s |
| Iteration 5 | squared_epsilon_insen.. | 99.8148 | False | 0.4315 | 0.445 | -5.7147 | -5.7147 | 0.031s | 0.875s |
| Iteration 6 | epsilon_insensitive | 0.001 | True | -4.8849 | 0.445 | -64.4077 | -5.7147 | 0.031s | 1.250s |
| Iteration 7 | epsilon_insensitive | 3.3191 | True | 0.4124 | 0.445 | -6.6678 | -5.7147 | 0.063s | 1.875s |
| Iteration 8 | squared_epsilon_insen.. | 0.0718 | False | 0.3431 | 0.445 | -6.5193 | -5.7147 | 0.031s | 2.235s |
| Iteration 9 | squared_epsilon_insen.. | 0.0278 | False | 0.4614 | 0.4614 | -5.1863 | -5.1863 | 0.047s | 2.594s |
| Iteration 10 | epsilon_insensitive | 100.0 | True | 0.3695 | 0.4614 | -7.5606 | -5.1863 | 0.156s | 2.985s |

Bayesian Optimization ---------------------------
Best call --> Iteration 9
Best parameters --> {'loss': 'squared_epsilon_insensitive', 'C': 0.0278, 'dual': False}
Best evaluation --> r2: 0.4614   neg_mean_squared_error: -5.1863
Time elapsed: 3.349s
Fit ---------------------------------------------
Train evaluation --> r2: 0.4595   neg_mean_squared_error: -5.7024
Test evaluation --> r2: 0.453   neg_mean_squared_error: -5.3403
Time elapsed: 0.016s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.4498 ± 0.0036   neg_mean_squared_error: -5.371 ± 0.0352
Time elapsed: 0.094s
-------------------------------------------------
Total time: 3.459s


Running BO for HistGBM...

| call | loss | learning_rate | max_iter | max_leaf_nodes | max_depth | min_samples_leaf | l2_regularization | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Initial point 1 | poisson | 0.733 | 73 | 50 | 2 | 18 | 0.4 | 0.5283 | 0.5283 | -5.6814 | -5.6814 | 0.078s | 0.094s |
| Initial point 2 | absolute_.. | 0.7432 | 425 | 23 | 5 | 19 | 0.2 | 0.4851 | 0.5283 | -5.557 | -5.557 | 1.605s | 1.792s |
| Initial point 3 | absolute_.. | 0.6729 | 234 | 27 | 9 | 26 | 0.7 | 0.4877 | 0.5283 | -5.5583 | -5.557 | 1.625s | 3.527s |
| Initial point 4 | poisson | 0.0153 | 264 | 45 | 8 | 27 | 0.3 | 0.5336 | 0.5336 | -4.9791 | -4.9791 | 2.411s | 6.047s |
| Iteration 5 | squared_e.. | 0.0719 | 410 | 34 | 6 | 29 | 0.1 | 0.5085 | 0.5336 | -4.9404 | -4.9404 | 1.359s | 7.704s |
| Iteration 6 | poisson | 0.0102 | 467 | 11 | 1 | 21 | 0.9 | 0.4046 | 0.5336 | -6.5166 | -4.9404 | 0.203s | 8.219s |
| Iteration 7 | poisson | 1.0 | 10 | 50 | 10 | 10 | 0.0 | 0.3163 | 0.5336 | -7.7589 | -4.9404 | 0.094s | 8.711s |
| Iteration 8 | squared_e.. | 0.01 | 61 | 13 | 10 | 16 | 0.2 | 0.3426 | 0.5336 | -6.525 | -4.9404 | 0.156s | 9.273s |
| Iteration 9 | poisson | 0.01 | 289 | 50 | 7 | 28 | 0.0 | 0.5531 | 0.5531 | -4.3032 | -4.3032 | 2.141s | 11.836s |
| Iteration 10 | squared_e.. | 0.01 | 10 | 10 | None | 30 | 0.0 | 0.0745 | 0.5531 | -11.0979 | -4.3032 | 0.047s | 12.305s |

Bayesian Optimization ---------------------------
Best call --> Iteration 9
Best parameters --> {'loss': 'poisson', 'learning_rate': 0.01, 'max_iter': 289, 'max_leaf_nodes': 50, 'max_depth': 7, 'min_samples_leaf': 28, 'l2_regularization': 0.0}
Best evaluation --> r2: 0.5531   neg_mean_squared_error: -4.3032
Time elapsed: 12.852s
Fit ---------------------------------------------
Train evaluation --> r2: 0.6662   neg_mean_squared_error: -3.5218
Test evaluation --> r2: 0.5726   neg_mean_squared_error: -4.1724
Time elapsed: 2.089s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.5538 ± 0.0083   neg_mean_squared_error: -4.3561 ± 0.0806
Time elapsed: 10.323s
-------------------------------------------------
Total time: 25.264s


Final results ==================== >>
Duration: 28.722s
-------------------------------------
Linear-SVM --> r2: 0.4498 ± 0.0036   neg_mean_squared_error: -5.371 ± 0.0352
HistGBM --> r2: 0.5538 ± 0.0083   neg_mean_squared_error: -4.3561 ± 0.0806 !
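A note on the sign convention: scorers follow sklearn's greater-is-better rule, which is why the mean squared error is reported as `neg_mean_squared_error` with negative values. A minimal sketch of both metrics in plain Python (independent of ATOM, with made-up targets and predictions):

```python
# Compute r2 and (negated) mean squared error by hand to show
# why the MSE appears as "neg_mean_squared_error" in the log.
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [15, 7, 9, 10, 7]
y_pred = [12, 8, 9, 11, 6]

print(r2(y_true, y_pred))    # higher is better (r2 ≈ 0.722 here)
print(-mse(y_true, y_pred))  # negated so that higher is better too
```

Because both scorers now increase with model quality, the BO can maximize either one; only the first metric in the tuple drives the optimization, the second is merely tracked.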
Analyze the results¶
In [14]:
# The columns in the results dataframe contain a list of
# scores, one for each metric (in the same order as called)
atom.results[["metric_bo", "metric_train", "metric_test"]]
Out[14]:
| | metric_bo | metric_train | metric_test |
|---|---|---|---|
| lSVM | [0.4613660476456929, -5.186254956043416] | [0.4594925871132435, -5.702426270752693] | [0.4529905982859944, -5.340344775710873] |
| hGBM | [0.553083077887542, -4.3031544746009] | [0.6661886024220871, -3.521755368457413] | [0.5726269450224242, -4.172358746079403] |
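Since each cell holds a list with one score per metric, it can be handy to split such a column into one column per metric. A minimal pandas sketch, using a hand-built stand-in frame with the test scores above (rounded) rather than rerunning the pipeline:

```python
import pandas as pd

# Hypothetical stand-in for atom.results: each cell is a list
# with one score per metric, in the order passed to run().
results = pd.DataFrame(
    {"metric_test": [[0.4530, -5.3403], [0.5726, -4.1724]]},
    index=["lSVM", "hGBM"],
)

# Expand the list column into one column per metric.
scores = pd.DataFrame(
    results["metric_test"].tolist(),
    index=results.index,
    columns=["r2", "neg_mean_squared_error"],
)
print(scores)
```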
In [15]:
# Some plots allow us to choose the metric we want to show
with atom.canvas():
atom.plot_bo(metric="r2", title="BO performance for r2")
atom.plot_bo(metric="mse", title="BO performance for Mean Squared Error")
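The canvas context manager draws the plots that follow on one shared figure. The same layout can be sketched with plain matplotlib subplots (hypothetical best-score data, just to show the two-panel arrangement):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Two side-by-side panels on one shared canvas, one per metric.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
calls = range(1, 11)
best_r2 = [0.40, 0.40, 0.44, 0.45, 0.45, 0.45, 0.45, 0.45, 0.46, 0.46]
best_mse = [-7.2, -6.5, -6.0, -5.9, -5.7, -5.7, -5.7, -5.7, -5.2, -5.2]
ax1.plot(calls, best_r2)
ax1.set_title("BO performance for r2")
ax2.plot(calls, best_mse)
ax2.set_title("BO performance for Mean Squared Error")
fig.savefig("bo_performance.png")
```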
In [16]:
atom.plot_results(metric="mse")