Multi-metric runs¶
This example shows how to evaluate an atom's pipeline on multiple metrics.

Import the abalone dataset from the UCI Machine Learning Repository. This is a small regression dataset whose goal is to predict the number of rings (a proxy for the shell's age) of abalone from their physical measurements.
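In short, passing a sequence of metrics to atom's run method scores every model on all of them, while only the first metric drives the hyperparameter optimization. A minimal sketch of the pattern used below (assuming an already-initialized atom instance):

# Sketch of a multi-metric run: "r2" drives the Bayesian
# optimization, while "mse" is only tracked and reported
atom.run(models=["lsvm"], metric=("r2", "mse"))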
Load the data¶
In [1]:
# Import packages
import pandas as pd
from atom import ATOMRegressor
In [2]:
# Load data
X = pd.read_csv("./datasets/abalone.csv")
# Let's have a look
X.head()
Out[2]:
|   | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|-----|--------|----------|--------|--------------|----------------|----------------|--------------|-------|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
Run the pipeline¶
In [3]:
atom = ATOMRegressor(X, n_jobs=1, verbose=2, warnings=False, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 182 (0.6%)
-------------------------------------
Train set size: 3342
Test set size: 835
-------------------------------------
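The 3342/835 sizes reflect the default holdout fraction. A minimal variation, assuming ATOMRegressor's test_size parameter keeps its default of 0.2 (which matches the split reported above):

# Hypothetical variation: reserve 25% of the rows for the test set
atom = ATOMRegressor(X, test_size=0.25, n_jobs=1, verbose=2, warnings=False, random_state=1)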
In [4]:
atom.encode()
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
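One-hot encoding replaces the categorical Sex column with one binary column per class. As a rough illustration of the transformation itself (not of what atom's Encoder does internally, e.g. it fits on the training set only), the pandas equivalent would be:

# Rough pandas equivalent of one-hot encoding the Sex column
dummies = pd.get_dummies(X["Sex"], prefix="Sex")  # Sex_F, Sex_I, Sex_M
X_encoded = pd.concat([X.drop(columns="Sex"), dummies], axis=1)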
In [5]:
# For every step of the BO, both metrics are calculated,
# but only the first is used for optimization!
atom.run(
models=["lsvm", "hGBM"],
metric=("r2", "mse"),
n_calls=10,
n_initial_points=4,
n_bootstrap=6,
)
Training ========================= >>
Models: lSVM, hGBM
Metric: r2, neg_mean_squared_error


Running BO for Linear-SVM...

| call | loss | C | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
| ---------------- | ------- | ------- | ------- | ------- | ---------------------- | --------------------------- | ------- | ---------- |
| Initial point 1 | squar.. | 46.003 | 0.4452 | 0.4452 | -6.6832 | -6.6832 | 0.050s | 0.059s |
| Initial point 2 | squar.. | 0.015 | 0.3981 | 0.4452 | -6.4963 | -6.4963 | 0.047s | 0.292s |
| Initial point 3 | epsil.. | 2.232 | 0.4422 | 0.4452 | -6.0518 | -6.0518 | 0.071s | 0.425s |
| Initial point 4 | squar.. | 0.037 | 0.445 | 0.4452 | -5.9249 | -5.9249 | 0.055s | 0.545s |
| Iteration 5 | epsil.. | 0.001 | -5.0381 | 0.4452 | -60.6911 | -5.9249 | 0.045s | 0.945s |
| Iteration 6 | epsil.. | 100.0 | 0.3566 | 0.4452 | -7.0422 | -5.9249 | 0.135s | 1.367s |
| Iteration 7 | epsil.. | 3.377 | 0.4115 | 0.4452 | -6.6786 | -5.9249 | 0.082s | 1.784s |
| Iteration 8 | squar.. | 0.096 | 0.3427 | 0.4452 | -6.5239 | -5.9249 | 0.062s | 2.125s |
| Iteration 9 | squar.. | 83.195 | 0.2792 | 0.4452 | -6.9407 | -5.9249 | 0.140s | 2.580s |
| Iteration 10 | squar.. | 22.103 | 0.4682 | 0.4682 | -6.3761 | -5.9249 | 0.047s | 2.909s |

Results for Linear-SVM:
Bayesian Optimization ---------------------------
Best call --> Iteration 10
Best parameters --> {'loss': 'squared_epsilon_insensitive', 'C': 22.103, 'dual': False}
Best evaluation --> r2: 0.4682   neg_mean_squared_error: -6.3761
Time elapsed: 3.186s
Fit ---------------------------------------------
Train evaluation --> r2: 0.46   neg_mean_squared_error: -5.6966
Test evaluation --> r2: 0.4534   neg_mean_squared_error: -5.3365
Time elapsed: 0.016s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.4502 ± 0.0037   neg_mean_squared_error: -5.3678 ± 0.0357
Time elapsed: 0.091s
-------------------------------------------------
Total time: 3.295s


Running BO for HistGBM...

| call | loss | learning_rate | max_iter | max_leaf_nodes | max_depth | min_samples_leaf | l2_regularization | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
| ---------------- | ------- | ------------- | -------- | -------------- | --------- | ---------------- | ----------------- | ------- | ------- | ---------------------- | --------------------------- | ------- | ---------- |
| Initial point 1 | squar.. | 0.73 | 73 | 50 | 2.0 | 18 | 0.4 | 0.4968 | 0.4968 | -6.0615 | -6.0615 | 0.083s | 0.099s |
| Initial point 2 | poisson | 0.74 | 425 | 23 | 5.0 | 19 | 0.2 | 0.3944 | 0.4968 | -6.5361 | -6.0615 | 0.852s | 1.036s |
| Initial point 3 | poisson | 0.67 | 234 | 27 | 9.0 | 26 | 0.7 | 0.3889 | 0.4968 | -6.6311 | -6.0615 | 0.750s | 1.871s |
| Initial point 4 | squar.. | 0.02 | 264 | 45 | 8.0 | 27 | 0.3 | 0.5295 | 0.5295 | -5.0227 | -5.0227 | 1.273s | 3.228s |
| Iteration 5 | squar.. | 0.01 | 500 | 10 | None | 10 | 0.0 | 0.5302 | 0.5302 | -4.7219 | -4.7219 | 0.889s | 4.479s |
| Iteration 6 | absol.. | 0.28 | 70 | 38 | 4.0 | 23 | 0.7 | 0.5188 | 0.5302 | -5.2664 | -4.7219 | 0.176s | 4.957s |
| Iteration 7 | absol.. | 0.01 | 482 | 24 | 5.0 | 17 | 0.0 | 0.5238 | 0.5302 | -5.404 | -4.7219 | 1.639s | 6.974s |
| Iteration 8 | squar.. | 0.02 | 464 | 24 | 1.0 | 26 | 0.9 | 0.4291 | 0.5302 | -5.666 | -4.7219 | 0.224s | 7.563s |
| Iteration 9 | absol.. | 0.01 | 145 | 16 | 7.0 | 10 | 0.0 | 0.427 | 0.5302 | -5.5174 | -4.7219 | 0.526s | 8.476s |
| Iteration 10 | absol.. | 0.66 | 214 | 30 | 2.0 | 28 | 0.9 | 0.4904 | 0.5302 | -6.11 | -4.7219 | 0.213s | 9.139s |

Results for HistGBM:
Bayesian Optimization ---------------------------
Best call --> Iteration 5
Best parameters --> {'loss': 'squared_error', 'learning_rate': 0.01, 'max_iter': 500, 'max_leaf_nodes': 10, 'max_depth': None, 'min_samples_leaf': 10, 'l2_regularization': 0.0}
Best evaluation --> r2: 0.5302   neg_mean_squared_error: -4.7219
Time elapsed: 9.504s
Fit ---------------------------------------------
Train evaluation --> r2: 0.629   neg_mean_squared_error: -3.9137
Test evaluation --> r2: 0.5701   neg_mean_squared_error: -4.1974
Time elapsed: 0.878s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.5472 ± 0.0101   neg_mean_squared_error: -4.4208 ± 0.0982
Time elapsed: 5.174s
-------------------------------------------------
Total time: 15.559s


Final results ==================== >>
Duration: 18.854s
-------------------------------------
Linear-SVM --> r2: 0.4502 ± 0.0037   neg_mean_squared_error: -5.3678 ± 0.0357
HistGBM --> r2: 0.5472 ± 0.0101   neg_mean_squared_error: -4.4208 ± 0.0982 !
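After training, the best model can be retrieved for further use. A minimal sketch, assuming the winner attribute and predict method behave as in atom's other examples:

# Retrieve the model that scored best on the first metric (r2)
best = atom.winner
print(best.name)  # hGBM

# Use it like any fitted estimator, e.g. on the test set
predictions = best.predict(atom.X_test)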
Analyze the results¶
In [6]:
# Every metric column in the results dataframe contains a list of
# scores, one per metric (in the order they were passed to run)
atom.results[["metric_bo", "metric_train", "metric_test"]]
Out[6]:
|      | metric_bo | metric_train | metric_test |
|------|-----------|--------------|-------------|
| lSVM | [0.46824925981896437, -6.376068729706791] | [0.4600429540886304, -5.696619824764766] | [0.4533871475615696, -5.336473343436593] |
| hGBM | [0.5302271837566794, -4.721850726425385] | [0.6290423551503116, -3.913653298535993] | [0.5700631312607444, -4.197388753580518] |
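Since every cell holds a list, a single metric's scores can be pulled out by position, e.g.:

# Index 0 is r2 and index 1 is neg_mean_squared_error, matching
# the order in which the metrics were passed to atom.run
r2_test = atom.results["metric_test"].apply(lambda s: s[0])
mse_test = atom.results["metric_test"].apply(lambda s: s[1])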
In [7]:
# Some plots allow us to choose the metric we want to show
with atom.canvas():
atom.plot_bo(metric="r2", title="BO performance for r2")
atom.plot_bo(metric="mse", title="BO performance for Mean Squared Error")
In [8]:
atom.plot_results(metric="mse")
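Based on atom's other examples, the metric parameter of these plots likely also accepts the metric's position instead of its name; if so, the call above could equally be written as:

# Assumption: metric also accepts an int index, where 1 refers
# to the second metric passed to run (mse)
atom.plot_results(metric=1)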