Multi-metric runs¶
This example shows how to evaluate an atom's pipeline on multiple metrics.

The data used is a variation on the Australian weather dataset's sibling: the abalone dataset from https://archive.ics.uci.edu/ml/datasets/Abalone. The goal of this dataset is to predict the number of rings (a proxy for age) of abalone shells from physical measurements.
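Metric names like "r2" and "mse" are passed as strings and resolve to scorers, much like scikit-learn's scorer registry. As an illustration only (plain scikit-learn, not ATOM code), the two metrics used in this example can be resolved and applied like this:

```python
from sklearn.dummy import DummyRegressor
from sklearn.metrics import get_scorer

# Resolve metric names to scorer callables, as scikit-learn does
r2 = get_scorer("r2")
mse = get_scorer("neg_mean_squared_error")

# Score a trivial mean-predicting model on toy data with both scorers
X, y = [[0.0], [1.0], [2.0], [3.0]], [0.0, 1.0, 2.0, 3.0]
model = DummyRegressor(strategy="mean").fit(X, y)
print(r2(model, X, y))   # → 0.0 (predicting the mean gives r2 of exactly 0)
print(mse(model, X, y))  # → -1.25 (negated so that higher is always better)
```

Note that error metrics are negated so that every scorer follows the "greater is better" convention.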
Load the data¶
In [1]:
# Import packages
import pandas as pd
from atom import ATOMRegressor
In [2]:
# Load data
X = pd.read_csv("./datasets/abalone.csv")

# Let's have a look
X.head()
Out[2]:
|   | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|-----|--------|----------|--------|--------------|----------------|----------------|--------------|-------|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
Run the pipeline¶
In [3]:
atom = ATOMRegressor(X, n_jobs=1, verbose=2, warnings=False, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 182 (0.6%)
-------------------------------------
Train set size: 3342
Test set size: 835
-------------------------------------
In [4]:
atom.encode()
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
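The encoder one-hot encodes the three-class Sex feature (M, F, I for infant). As an illustration only, not ATOM's internal implementation, the same transformation can be sketched with plain pandas:

```python
import pandas as pd

# Toy frame mimicking the abalone Sex column
df = pd.DataFrame({"Sex": ["M", "F", "I", "M"], "Rings": [15, 9, 7, 10]})

# One-hot encode the categorical column; each class becomes its own 0/1 column
encoded = pd.get_dummies(df, columns=["Sex"])
print(encoded.columns.tolist())  # → ['Rings', 'Sex_F', 'Sex_I', 'Sex_M']
```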
In [5]:
# For every step of the BO, both metrics are calculated,
# but only the first is used for optimization!
atom.run(
    models=["lsvm", "hGBM"],
    metric=("r2", "mse"),
    n_calls=10,
    n_initial_points=4,
    n_bootstrap=6,
)
Training ========================= >>
Models: lSVM, hGBM
Metric: r2, neg_mean_squared_error

Running BO for Linear-SVM...

| call | loss | C | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
|------|------|---|----|---------|------------------------|-----------------------------|------|------------|
| Initial point 1 | squar.. | 46.003 | 0.4451 | 0.4451 | -6.6842 | -6.6842 | 0.019s | 0.030s |
| Initial point 2 | squar.. | 0.015 | 0.4007 | 0.4451 | -6.4677 | -6.4677 | 0.016s | 0.219s |
| Initial point 3 | epsil.. | 2.232 | 0.4421 | 0.4451 | -6.0536 | -6.0536 | 0.043s | 0.318s |
| Initial point 4 | squar.. | 0.037 | 0.445 | 0.4451 | -5.9243 | -5.9243 | 0.025s | 0.400s |
| Iteration 5 | epsil.. | 0.001 | -4.9948 | 0.4451 | -60.2557 | -5.9243 | 0.015s | 0.694s |
| Iteration 6 | epsil.. | 100.0 | 0.3857 | 0.4451 | -6.7234 | -5.9243 | 0.124s | 1.158s |
| Iteration 7 | epsil.. | 3.415 | 0.4129 | 0.4451 | -6.6619 | -5.9243 | 0.051s | 1.587s |
| Iteration 8 | squar.. | 0.101 | 0.3432 | 0.4451 | -6.5185 | -5.9243 | 0.037s | 1.912s |
| Iteration 9 | squar.. | 84.528 | 0.2782 | 0.4451 | -6.9502 | -5.9243 | 0.114s | 2.364s |
| Iteration 10 | squar.. | 21.824 | 0.4682 | 0.4682 | -6.3762 | -5.9243 | 0.014s | 2.678s |

Results for Linear-SVM:
Bayesian Optimization ---------------------------
Best call --> Iteration 10
Best parameters --> {'loss': 'squared_epsilon_insensitive', 'C': 21.824, 'dual': False}
Best evaluation --> r2: 0.4682   neg_mean_squared_error: -6.3762
Time elapsed: 2.978s
Fit ---------------------------------------------
Train evaluation --> r2: 0.46   neg_mean_squared_error: -5.6966
Test evaluation --> r2: 0.4534   neg_mean_squared_error: -5.3365
Time elapsed: 0.027s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.4502 ± 0.0037   neg_mean_squared_error: -5.3678 ± 0.0357
Time elapsed: 0.105s
-------------------------------------------------
Total time: 3.111s

Running BO for HistGBM...

| call | loss | learning_rate | max_iter | max_leaf_nodes | max_depth | min_samples_leaf | l2_regularization | r2 | best_r2 | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
|------|------|---------------|----------|----------------|-----------|------------------|-------------------|----|---------|------------------------|-----------------------------|------|------------|
| Initial point 1 | squar.. | 0.73 | 73 | 50 | 2.0 | 18 | 0.4 | 0.4968 | 0.4968 | -6.0615 | -6.0615 | 0.079s | 0.094s |
| Initial point 2 | poisson | 0.74 | 425 | 23 | 5.0 | 19 | 0.2 | 0.3946 | 0.4968 | -6.5334 | -6.0615 | 0.977s | 1.145s |
| Initial point 3 | poisson | 0.67 | 234 | 27 | 9.0 | 26 | 0.7 | 0.3889 | 0.4968 | -6.6311 | -6.0615 | 1.605s | 2.820s |
| Initial point 4 | squar.. | 0.02 | 264 | 45 | 8.0 | 27 | 0.3 | 0.5295 | 0.5295 | -5.0229 | -5.0229 | 1.532s | 4.421s |
| Iteration 5 | squar.. | 0.01 | 500 | 10 | None | 10 | 1.0 | 0.5292 | 0.5295 | -4.7317 | -4.7317 | 1.027s | 5.824s |
| Iteration 6 | absol.. | 0.13 | 261 | 38 | 8.0 | 17 | 0.3 | 0.5145 | 0.5295 | -5.3136 | -4.7317 | 1.582s | 7.864s |
| Iteration 7 | absol.. | 0.01 | 61 | 37 | 3.0 | 27 | 0.6 | 0.2051 | 0.5295 | -9.0205 | -4.7317 | 0.121s | 8.418s |
| Iteration 8 | squar.. | 0.08 | 410 | 50 | 8.0 | 30 | 0.3 | 0.5073 | 0.5295 | -4.8901 | -4.7317 | 1.344s | 10.245s |
| Iteration 9 | poisson | 0.01 | 10 | 35 | 8.0 | 10 | 0.3 | 0.0998 | 0.5295 | -8.6673 | -4.7317 | 0.080s | 10.786s |
| Iteration 10 | squar.. | 0.01 | 327 | 50 | 8.0 | 30 | 0.6 | 0.5562 | 0.5562 | -5.3221 | -4.7317 | 2.319s | 13.692s |

Results for HistGBM:
Bayesian Optimization ---------------------------
Best call --> Iteration 10
Best parameters --> {'loss': 'squared_error', 'learning_rate': 0.01, 'max_iter': 327, 'max_leaf_nodes': 50, 'max_depth': 8.0, 'min_samples_leaf': 30, 'l2_regularization': 0.6}
Best evaluation --> r2: 0.5562   neg_mean_squared_error: -5.3221
Time elapsed: 14.203s
Fit ---------------------------------------------
Train evaluation --> r2: 0.6807   neg_mean_squared_error: -3.3683
Test evaluation --> r2: 0.5734   neg_mean_squared_error: -4.1648
Time elapsed: 2.376s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.5531 ± 0.0082   neg_mean_squared_error: -4.3628 ± 0.0799
Time elapsed: 14.526s
-------------------------------------------------
Total time: 31.107s

Final results ==================== >>
Duration: 34.218s
-------------------------------------
Linear-SVM --> r2: 0.4502 ± 0.0037   neg_mean_squared_error: -5.3678 ± 0.0357
HistGBM --> r2: 0.5531 ± 0.0082   neg_mean_squared_error: -4.3628 ± 0.0799 !
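The behaviour above, where every fit is scored on both metrics but only the first drives the optimization, mirrors scikit-learn's multi-metric scoring. As a hedged illustration (plain scikit-learn, not ATOM's API), the same models can be cross-validated on both metrics in a single pass:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate

# Synthetic regression data stands in for the abalone set
X, y = make_regression(n_samples=200, n_features=8, noise=5, random_state=1)

# Each fold is fitted once and scored on both metrics
scores = cross_validate(
    Ridge(),
    X,
    y,
    cv=5,
    scoring=("r2", "neg_mean_squared_error"),
)
print(sorted(k for k in scores if k.startswith("test_")))
# → ['test_neg_mean_squared_error', 'test_r2']
```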
Analyze the results¶
In [6]:
# The columns in the results dataframe contain a list of
# scores, one for each metric (in the same order as called)
atom.results[["metric_bo", "metric_train", "metric_test"]]
Out[6]:
|      | metric_bo | metric_train | metric_test |
|------|-----------|--------------|-------------|
| lSVM | [0.46823718247604573, -6.376213545619027] | [0.4600429544670509, -5.696619820772378] | [0.45338714637643485, -5.33647335500683] |
| hGBM | [0.5561521256959545, -5.322050987899658] | [0.680731527851431, -3.3683255392919382] | [0.5734034657163947, -4.164777727877006] |
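Since each cell holds a list with one score per metric, it can be handy to split them into separate columns. A minimal sketch with plain pandas (the frame below is a hand-built toy, not ATOM's results object):

```python
import pandas as pd

# Toy results frame with list-valued cells, one score per metric
results = pd.DataFrame(
    {"metric_test": [[0.4534, -5.3365], [0.5734, -4.1648]]},
    index=["lSVM", "hGBM"],
)

# Expand the lists into one column per metric (same order as called)
expanded = pd.DataFrame(
    results["metric_test"].tolist(),
    index=results.index,
    columns=["r2", "neg_mean_squared_error"],
)
print(expanded.loc["hGBM", "r2"])  # → 0.5734
```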
In [7]:
# Some plots allow us to choose the metric we want to show
with atom.canvas():
    atom.plot_bo(metric="r2", title="BO performance for r2")
    atom.plot_bo(metric="mse", title="BO performance for Mean Squared Error")
In [8]:
atom.plot_results(metric="mse")