Multi-metric runs¶
This example shows how to evaluate an atom's pipeline on multiple metrics.

The data used is a variation on the Australian weather dataset's sibling: the abalone dataset from https://archive.ics.uci.edu/ml/datasets/Abalone. The goal of this dataset is to predict the number of rings (a proxy for age) of abalone shells from physical measurements.
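Metric names like "r2" and "mse" are passed as strings and resolve to scorers, much like scikit-learn's scorer registry. As an illustration only (plain scikit-learn, not ATOM code), the two metrics used in this example can be resolved and applied like this:

```python
from sklearn.dummy import DummyRegressor
from sklearn.metrics import get_scorer

# Resolve metric names to scorer callables, as scikit-learn does
r2 = get_scorer("r2")
mse = get_scorer("neg_mean_squared_error")

# Score a trivial mean-predicting model on toy data with both scorers
X, y = [[0.0], [1.0], [2.0], [3.0]], [0.0, 1.0, 2.0, 3.0]
model = DummyRegressor(strategy="mean").fit(X, y)
print(r2(model, X, y))   # → 0.0 (predicting the mean gives r2 of exactly 0)
print(mse(model, X, y))  # → -1.25 (negated so that higher is always better)
```

Note that error metrics are negated so that every scorer follows the "greater is better" convention.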
Load the data¶
In [1]:
# Import packages
import pandas as pd
from atom import ATOMRegressor
In [2]:
# Load data
X = pd.read_csv("./datasets/abalone.csv")

# Let's have a look
X.head()
Out[2]:
|   | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|-----|--------|----------|--------|--------------|----------------|----------------|--------------|-------|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
Run the pipeline¶
In [3]:
atom = ATOMRegressor(X, n_jobs=1, verbose=2, warnings=False, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 182 (0.6%)
-------------------------------------
Train set size: 3342
Test set size: 835
-------------------------------------
In [4]:
atom.encode()
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
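The encoder one-hot encodes the three-class Sex feature (M, F, I for infant). As an illustration only, not ATOM's internal implementation, the same transformation can be sketched with plain pandas:

```python
import pandas as pd

# Toy frame mimicking the abalone Sex column
df = pd.DataFrame({"Sex": ["M", "F", "I", "M"], "Rings": [15, 9, 7, 10]})

# One-hot encode the categorical column; each class becomes its own 0/1 column
encoded = pd.get_dummies(df, columns=["Sex"])
print(encoded.columns.tolist())  # → ['Rings', 'Sex_F', 'Sex_I', 'Sex_M']
```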
In [5]:
# For every step of the BO, both metrics are calculated,
# but only the first is used for optimization!
atom.run(
    models=["lsvm", "hGBM"],
    metric=("r2", "mse"),
    n_calls=10,
    n_initial_points=4,
    n_bootstrap=6,
)
Training ========================= >>
Models: lSVM, hGBM
Metric: r2, neg_mean_squared_error

Running BO for Linear-SVM...

| call | loss | C | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
|------|------|---|----|---------|------------------------|-----------------------------|------|------------|
| Initial point 1 | squar.. | 46.003 | 0.4451 | 0.4451 | -6.6842 | -6.6842 | 0.019s | 0.030s |
| Initial point 2 | squar.. | 0.015 | 0.4007 | 0.4451 | -6.4677 | -6.4677 | 0.016s | 0.219s |
| Initial point 3 | epsil.. | 2.232 | 0.4421 | 0.4451 | -6.0536 | -6.0536 | 0.043s | 0.318s |
| Initial point 4 | squar.. | 0.037 | 0.445 | 0.4451 | -5.9243 | -5.9243 | 0.025s | 0.400s |
| Iteration 5 | epsil.. | 0.001 | -4.9948 | 0.4451 | -60.2557 | -5.9243 | 0.015s | 0.694s |
| Iteration 6 | epsil.. | 100.0 | 0.3857 | 0.4451 | -6.7234 | -5.9243 | 0.124s | 1.158s |
| Iteration 7 | epsil.. | 3.415 | 0.4129 | 0.4451 | -6.6619 | -5.9243 | 0.051s | 1.587s |
| Iteration 8 | squar.. | 0.101 | 0.3432 | 0.4451 | -6.5185 | -5.9243 | 0.037s | 1.912s |
| Iteration 9 | squar.. | 84.528 | 0.2782 | 0.4451 | -6.9502 | -5.9243 | 0.114s | 2.364s |
| Iteration 10 | squar.. | 21.824 | 0.4682 | 0.4682 | -6.3762 | -5.9243 | 0.014s | 2.678s |

Results for Linear-SVM:
Bayesian Optimization ---------------------------
Best call --> Iteration 10
Best parameters --> {'loss': 'squared_epsilon_insensitive', 'C': 21.824, 'dual': False}
Best evaluation --> r2: 0.4682   neg_mean_squared_error: -6.3762
Time elapsed: 2.978s
Fit ---------------------------------------------
Train evaluation --> r2: 0.46   neg_mean_squared_error: -5.6966
Test evaluation --> r2: 0.4534   neg_mean_squared_error: -5.3365
Time elapsed: 0.027s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.4502 ± 0.0037   neg_mean_squared_error: -5.3678 ± 0.0357
Time elapsed: 0.105s
-------------------------------------------------
Total time: 3.111s

Running BO for HistGBM...

| call | loss | learning_rate | max_iter | max_leaf_nodes | max_depth | min_samples_leaf | l2_regularization | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
|------|------|---------------|----------|----------------|-----------|------------------|-------------------|----|---------|------------------------|-----------------------------|------|------------|
| Initial point 1 | squar.. | 0.73 | 73 | 50 | 2.0 | 18 | 0.4 | 0.4968 | 0.4968 | -6.0615 | -6.0615 | 0.079s | 0.094s |
| Initial point 2 | poisson | 0.74 | 425 | 23 | 5.0 | 19 | 0.2 | 0.3946 | 0.4968 | -6.5334 | -6.0615 | 0.977s | 1.145s |
| Initial point 3 | poisson | 0.67 | 234 | 27 | 9.0 | 26 | 0.7 | 0.3889 | 0.4968 | -6.6311 | -6.0615 | 1.605s | 2.820s |
| Initial point 4 | squar.. | 0.02 | 264 | 45 | 8.0 | 27 | 0.3 | 0.5295 | 0.5295 | -5.0229 | -5.0229 | 1.532s | 4.421s |
| Iteration 5 | squar.. | 0.01 | 500 | 10 | None | 10 | 1.0 | 0.5292 | 0.5295 | -4.7317 | -4.7317 | 1.027s | 5.824s |
| Iteration 6 | absol.. | 0.13 | 261 | 38 | 8.0 | 17 | 0.3 | 0.5145 | 0.5295 | -5.3136 | -4.7317 | 1.582s | 7.864s |
| Iteration 7 | absol.. | 0.01 | 61 | 37 | 3.0 | 27 | 0.6 | 0.2051 | 0.5295 | -9.0205 | -4.7317 | 0.121s | 8.418s |
| Iteration 8 | squar.. | 0.08 | 410 | 50 | 8.0 | 30 | 0.3 | 0.5073 | 0.5295 | -4.8901 | -4.7317 | 1.344s | 10.245s |
| Iteration 9 | poisson | 0.01 | 10 | 35 | 8.0 | 10 | 0.3 | 0.0998 | 0.5295 | -8.6673 | -4.7317 | 0.080s | 10.786s |
| Iteration 10 | squar.. | 0.01 | 327 | 50 | 8.0 | 30 | 0.6 | 0.5562 | 0.5562 | -5.3221 | -4.7317 | 2.319s | 13.692s |

Results for HistGBM:
Bayesian Optimization ---------------------------
Best call --> Iteration 10
Best parameters --> {'loss': 'squared_error', 'learning_rate': 0.01, 'max_iter': 327, 'max_leaf_nodes': 50, 'max_depth': 8.0, 'min_samples_leaf': 30, 'l2_regularization': 0.6}
Best evaluation --> r2: 0.5562   neg_mean_squared_error: -5.3221
Time elapsed: 14.203s
Fit ---------------------------------------------
Train evaluation --> r2: 0.6807   neg_mean_squared_error: -3.3683
Test evaluation --> r2: 0.5734   neg_mean_squared_error: -4.1648
Time elapsed: 2.376s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.5531 ± 0.0082   neg_mean_squared_error: -4.3628 ± 0.0799
Time elapsed: 14.526s
-------------------------------------------------
Total time: 31.107s

Final results ==================== >>
Duration: 34.218s
-------------------------------------
Linear-SVM --> r2: 0.4502 ± 0.0037   neg_mean_squared_error: -5.3678 ± 0.0357
HistGBM --> r2: 0.5531 ± 0.0082   neg_mean_squared_error: -4.3628 ± 0.0799 !
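The behaviour above, where every fit is scored on both metrics but only the first drives the optimization, mirrors scikit-learn's multi-metric scoring. As a hedged illustration (plain scikit-learn, not ATOM's API), the same models can be cross-validated on both metrics in a single pass:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate

# Synthetic regression data stands in for the abalone set
X, y = make_regression(n_samples=200, n_features=8, noise=5, random_state=1)

# Each fold is fitted once and scored on both metrics
scores = cross_validate(
    Ridge(),
    X,
    y,
    cv=5,
    scoring=("r2", "neg_mean_squared_error"),
)
print(sorted(k for k in scores if k.startswith("test_")))
# → ['test_neg_mean_squared_error', 'test_r2']
```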
Analyze the results¶
In [6]:
# The columns in the results dataframe contain a list of
# scores, one for each metric (in the same order as called)
atom.results[["metric_bo", "metric_train", "metric_test"]]
Out[6]:
|      | metric_bo | metric_train | metric_test |
|------|-----------|--------------|-------------|
| lSVM | [0.46823718247604573, -6.376213545619027] | [0.4600429544670509, -5.696619820772378] | [0.45338714637643485, -5.33647335500683] |
| hGBM | [0.5561521256959545, -5.322050987899658] | [0.680731527851431, -3.3683255392919382] | [0.5734034657163947, -4.164777727877006] |
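Since each cell holds a list with one score per metric, it can be handy to split them into separate columns. A minimal sketch with plain pandas (the frame below is a hand-built toy, not ATOM's results object):

```python
import pandas as pd

# Toy results frame with list-valued cells, one score per metric
results = pd.DataFrame(
    {"metric_test": [[0.4534, -5.3365], [0.5734, -4.1648]]},
    index=["lSVM", "hGBM"],
)

# Expand the lists into one column per metric (same order as called)
expanded = pd.DataFrame(
    results["metric_test"].tolist(),
    index=results.index,
    columns=["r2", "neg_mean_squared_error"],
)
print(expanded.loc["hGBM", "r2"])  # → 0.5734
```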
In [7]:
# Some plots allow us to choose the metric we want to show
with atom.canvas():
    atom.plot_bo(metric="r2", title="BO performance for r2")
    atom.plot_bo(metric="mse", title="BO performance for Mean Squared Error")
In [8]:
atom.plot_results(metric="mse")