Multi-metric runs¶
This example shows how to evaluate an atom's pipeline on multiple metrics.

Import the abalone dataset from the UCI Machine Learning Repository. This is a small regression dataset whose goal is to predict the number of rings (a proxy for the shell's age) of abalone from their physical measurements.
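In short, passing a sequence of metrics to atom's run method scores every model on all of them, while only the first metric drives the hyperparameter optimization. A minimal sketch of the pattern used below (assuming an already-initialized atom instance):

# Sketch of a multi-metric run: "r2" drives the Bayesian
# optimization, while "mse" is only tracked and reported
atom.run(models=["lsvm"], metric=("r2", "mse"))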
Load the data¶
In [1]:
# Import packages
import pandas as pd
from atom import ATOMRegressor
In [2]:
# Load data
X = pd.read_csv("./datasets/abalone.csv")
# Let's have a look
X.head()
Out[2]:
|   | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|-----|--------|----------|--------|--------------|----------------|----------------|--------------|-------|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
Run the pipeline¶
In [3]:
atom = ATOMRegressor(X, n_jobs=1, verbose=2, warnings=False, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 182 (0.6%)
-------------------------------------
Train set size: 3342
Test set size: 835
-------------------------------------
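The 3342/835 sizes reflect the default holdout fraction. A minimal variation, assuming ATOMRegressor's test_size parameter keeps its default of 0.2 (which matches the split reported above):

# Hypothetical variation: reserve 25% of the rows for the test set
atom = ATOMRegressor(X, test_size=0.25, n_jobs=1, verbose=2, warnings=False, random_state=1)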
In [4]:
atom.encode()
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
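One-hot encoding replaces the categorical Sex column with one binary column per class. As a rough illustration of the transformation itself (not of what atom's Encoder does internally, e.g. it fits on the training set only), the pandas equivalent would be:

# Rough pandas equivalent of one-hot encoding the Sex column
dummies = pd.get_dummies(X["Sex"], prefix="Sex")  # Sex_F, Sex_I, Sex_M
X_encoded = pd.concat([X.drop(columns="Sex"), dummies], axis=1)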
In [5]:
# For every step of the BO, both metrics are calculated,
# but only the first is used for optimization!
atom.run(
models=["lsvm", "hGBM"],
metric=("r2", "mse"),
n_calls=10,
n_initial_points=4,
n_bootstrap=6,
)
Training ========================= >>
Models: lSVM, hGBM
Metric: r2, neg_mean_squared_error


Running BO for Linear-SVM...

| call | loss | C | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
| ---------------- | ------- | ------- | ------- | ------- | ---------------------- | --------------------------- | ------- | ---------- |
| Initial point 1 | squar.. | 46.003 | 0.4452 | 0.4452 | -6.6832 | -6.6832 | 0.050s | 0.059s |
| Initial point 2 | squar.. | 0.015 | 0.3981 | 0.4452 | -6.4963 | -6.4963 | 0.047s | 0.292s |
| Initial point 3 | epsil.. | 2.232 | 0.4422 | 0.4452 | -6.0518 | -6.0518 | 0.071s | 0.425s |
| Initial point 4 | squar.. | 0.037 | 0.445 | 0.4452 | -5.9249 | -5.9249 | 0.055s | 0.545s |
| Iteration 5 | epsil.. | 0.001 | -5.0381 | 0.4452 | -60.6911 | -5.9249 | 0.045s | 0.945s |
| Iteration 6 | epsil.. | 100.0 | 0.3566 | 0.4452 | -7.0422 | -5.9249 | 0.135s | 1.367s |
| Iteration 7 | epsil.. | 3.377 | 0.4115 | 0.4452 | -6.6786 | -5.9249 | 0.082s | 1.784s |
| Iteration 8 | squar.. | 0.096 | 0.3427 | 0.4452 | -6.5239 | -5.9249 | 0.062s | 2.125s |
| Iteration 9 | squar.. | 83.195 | 0.2792 | 0.4452 | -6.9407 | -5.9249 | 0.140s | 2.580s |
| Iteration 10 | squar.. | 22.103 | 0.4682 | 0.4682 | -6.3761 | -5.9249 | 0.047s | 2.909s |

Results for Linear-SVM:
Bayesian Optimization ---------------------------
Best call --> Iteration 10
Best parameters --> {'loss': 'squared_epsilon_insensitive', 'C': 22.103, 'dual': False}
Best evaluation --> r2: 0.4682   neg_mean_squared_error: -6.3761
Time elapsed: 3.186s
Fit ---------------------------------------------
Train evaluation --> r2: 0.46   neg_mean_squared_error: -5.6966
Test evaluation --> r2: 0.4534   neg_mean_squared_error: -5.3365
Time elapsed: 0.016s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.4502 ± 0.0037   neg_mean_squared_error: -5.3678 ± 0.0357
Time elapsed: 0.091s
-------------------------------------------------
Total time: 3.295s


Running BO for HistGBM...

| call | loss | learning_rate | max_iter | max_leaf_nodes | max_depth | min_samples_leaf | l2_regularization | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
| ---------------- | ------- | ------------- | -------- | -------------- | --------- | ---------------- | ----------------- | ------- | ------- | ---------------------- | --------------------------- | ------- | ---------- |
| Initial point 1 | squar.. | 0.73 | 73 | 50 | 2.0 | 18 | 0.4 | 0.4968 | 0.4968 | -6.0615 | -6.0615 | 0.083s | 0.099s |
| Initial point 2 | poisson | 0.74 | 425 | 23 | 5.0 | 19 | 0.2 | 0.3944 | 0.4968 | -6.5361 | -6.0615 | 0.852s | 1.036s |
| Initial point 3 | poisson | 0.67 | 234 | 27 | 9.0 | 26 | 0.7 | 0.3889 | 0.4968 | -6.6311 | -6.0615 | 0.750s | 1.871s |
| Initial point 4 | squar.. | 0.02 | 264 | 45 | 8.0 | 27 | 0.3 | 0.5295 | 0.5295 | -5.0227 | -5.0227 | 1.273s | 3.228s |
| Iteration 5 | squar.. | 0.01 | 500 | 10 | None | 10 | 0.0 | 0.5302 | 0.5302 | -4.7219 | -4.7219 | 0.889s | 4.479s |
| Iteration 6 | absol.. | 0.28 | 70 | 38 | 4.0 | 23 | 0.7 | 0.5188 | 0.5302 | -5.2664 | -4.7219 | 0.176s | 4.957s |
| Iteration 7 | absol.. | 0.01 | 482 | 24 | 5.0 | 17 | 0.0 | 0.5238 | 0.5302 | -5.404 | -4.7219 | 1.639s | 6.974s |
| Iteration 8 | squar.. | 0.02 | 464 | 24 | 1.0 | 26 | 0.9 | 0.4291 | 0.5302 | -5.666 | -4.7219 | 0.224s | 7.563s |
| Iteration 9 | absol.. | 0.01 | 145 | 16 | 7.0 | 10 | 0.0 | 0.427 | 0.5302 | -5.5174 | -4.7219 | 0.526s | 8.476s |
| Iteration 10 | absol.. | 0.66 | 214 | 30 | 2.0 | 28 | 0.9 | 0.4904 | 0.5302 | -6.11 | -4.7219 | 0.213s | 9.139s |

Results for HistGBM:
Bayesian Optimization ---------------------------
Best call --> Iteration 5
Best parameters --> {'loss': 'squared_error', 'learning_rate': 0.01, 'max_iter': 500, 'max_leaf_nodes': 10, 'max_depth': None, 'min_samples_leaf': 10, 'l2_regularization': 0.0}
Best evaluation --> r2: 0.5302   neg_mean_squared_error: -4.7219
Time elapsed: 9.504s
Fit ---------------------------------------------
Train evaluation --> r2: 0.629   neg_mean_squared_error: -3.9137
Test evaluation --> r2: 0.5701   neg_mean_squared_error: -4.1974
Time elapsed: 0.878s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.5472 ± 0.0101   neg_mean_squared_error: -4.4208 ± 0.0982
Time elapsed: 5.174s
-------------------------------------------------
Total time: 15.559s


Final results ==================== >>
Duration: 18.854s
-------------------------------------
Linear-SVM --> r2: 0.4502 ± 0.0037   neg_mean_squared_error: -5.3678 ± 0.0357
HistGBM --> r2: 0.5472 ± 0.0101   neg_mean_squared_error: -4.4208 ± 0.0982 !
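After training, the best model can be retrieved for further use. A minimal sketch, assuming the winner attribute and predict method behave as in atom's other examples:

# Retrieve the model that scored best on the first metric (r2)
best = atom.winner
print(best.name)  # hGBM

# Use it like any fitted estimator, e.g. on the test set
predictions = best.predict(atom.X_test)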
Analyze the results¶
In [6]:
# Every metric column in the results dataframe contains a list of
# scores, one per metric (in the order they were passed to run)
atom.results[["metric_bo", "metric_train", "metric_test"]]
Out[6]:
|      | metric_bo | metric_train | metric_test |
|------|-----------|--------------|-------------|
| lSVM | [0.46824925981896437, -6.376068729706791] | [0.4600429540886304, -5.696619824764766] | [0.4533871475615696, -5.336473343436593] |
| hGBM | [0.5302271837566794, -4.721850726425385] | [0.6290423551503116, -3.913653298535993] | [0.5700631312607444, -4.197388753580518] |
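Since every cell holds a list, a single metric's scores can be pulled out by position, e.g.:

# Index 0 is r2 and index 1 is neg_mean_squared_error, matching
# the order in which the metrics were passed to atom.run
r2_test = atom.results["metric_test"].apply(lambda s: s[0])
mse_test = atom.results["metric_test"].apply(lambda s: s[1])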
In [7]:
# Some plots allow us to choose the metric we want to show
with atom.canvas():
atom.plot_bo(metric="r2", title="BO performance for r2")
atom.plot_bo(metric="mse", title="BO performance for Mean Squared Error")
In [8]:
atom.plot_results(metric="mse")
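Based on atom's other examples, the metric parameter of these plots likely also accepts the metric's position instead of its name; if so, the call above could equally be written as:

# Assumption: metric also accepts an int index, where 1 refers
# to the second metric passed to run (mse)
atom.plot_results(metric=1)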