Example: Multi-metric runs¶
This example shows how to evaluate an atom's pipeline on multiple metrics.
The data used is the Abalone dataset, loaded from a local csv file. This is a small and easy to train dataset whose goal is to predict the number of rings (a proxy for age) of abalone shells from physical measurements.
Load the data¶
In [1]:
# Import packages
import pandas as pd
from atom import ATOMRegressor
UserWarning: The pandas version installed (1.5.3) does not match the supported pandas version in Modin (1.5.2). This may cause undesired side effects!
In [2]:
# Load data
X = pd.read_csv("./datasets/abalone.csv")
# Let's have a look
X.head()
Out[2]:
|   | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|-----|--------|----------|--------|--------------|----------------|----------------|--------------|-------|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
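A quick, optional sanity check (not part of the original notebook): ATOM uses the last column as the target by default, so Rings is what the models will predict. A plain pandas call summarizes it.

# Rings (the last column) is the target ATOM will use for regression
X["Rings"].describe()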
Run the pipeline¶
In [3]:
atom = ATOMRegressor(X, n_jobs=1, verbose=2, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Train set size: 3342
Test set size: 835
-------------------------------------
Memory: 509.72 kB
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 189 (0.6%)
In [4]:
atom.encode()
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
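If you want to verify the result of the encoding step, the transformed data is available on the atom instance. A minimal sketch (assuming the standard dataset attribute):

# The Sex column should now be replaced by its one-hot encoded columns
atom.dataset.head()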
In [5]:
# For every step of the BO, both metrics are calculated,
# but only the first is used for optimization!
atom.run(
models=["lsvm", "hGBM"],
metric=("r2", "rmse"),
n_trials=10,
n_bootstrap=6,
)
Training ========================= >>
Models: lSVM, hGBM
Metric: r2, neg_root_mean_squared_error


Running hyperparameter tuning for LinearSVM...

| trial | loss | C | dual | r2 | best_r2 | neg_root_mean_squared_error | best_neg_root_mean_squared_error | time_trial | time_ht | state |
| ----- | ----------------------- | ------- | ----- | ------- | ------- | ------- | ------- | ------ | ------ | -------- |
| 0 | squared_epsilon_insen.. | 0.001 | True | 0.2887 | 0.2887 | -2.6528 | -2.6528 | 0.041s | 0.041s | COMPLETE |
| 1 | squared_epsilon_insen.. | 0.0534 | False | 0.3862 | 0.3862 | -2.5926 | -2.5926 | 0.038s | 0.079s | COMPLETE |
| 2 | squared_epsilon_insen.. | 0.0105 | True | 0.433 | 0.433 | -2.4084 | -2.4084 | 0.041s | 0.120s | COMPLETE |
| 3 | epsilon_insensitive | 0.6215 | True | 0.4022 | 0.433 | -2.5251 | -2.4084 | 0.048s | 0.168s | COMPLETE |
| 4 | squared_epsilon_insen.. | 0.0369 | False | 0.4057 | 0.433 | -2.5477 | -2.4084 | 0.039s | 0.207s | COMPLETE |
| 5 | epsilon_insensitive | 0.0016 | True | -1.5344 | 0.433 | -5.0102 | -2.4084 | 0.039s | 0.246s | COMPLETE |
| 6 | squared_epsilon_insen.. | 61.5811 | False | 0.4354 | 0.4354 | -2.3845 | -2.3845 | 0.038s | 0.284s | COMPLETE |
| 7 | squared_epsilon_insen.. | 14.898 | False | 0.4925 | 0.4925 | -2.2628 | -2.2628 | 0.039s | 0.323s | COMPLETE |
| 8 | epsilon_insensitive | 0.0252 | True | 0.3695 | 0.4925 | -2.6178 | -2.2628 | 0.040s | 0.363s | COMPLETE |
| 9 | squared_epsilon_insen.. | 0.0294 | True | 0.4767 | 0.4925 | -2.3896 | -2.2628 | 0.044s | 0.407s | COMPLETE |

Hyperparameter tuning ---------------------------
Best trial --> 7
Best parameters:
 --> loss: squared_epsilon_insensitive
 --> C: 14.898
 --> dual: False
Best evaluation --> r2: 0.4925   neg_root_mean_squared_error: -2.2628
Time elapsed: 0.407s
Fit ---------------------------------------------
Train evaluation --> r2: 0.4592   neg_root_mean_squared_error: -2.3795
Test evaluation --> r2: 0.4584   neg_root_mean_squared_error: -2.3369
Time elapsed: 0.027s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.4577 ± 0.002   neg_root_mean_squared_error: -2.3384 ± 0.0043
Time elapsed: 0.097s
-------------------------------------------------
Total time: 0.531s


Running hyperparameter tuning for HistGradientBoosting...

| trial | loss | learning_rate | max_iter | max_leaf_nodes | max_depth | min_samples_leaf | l2_regularization | r2 | best_r2 | neg_root_mean_squared_error | best_neg_root_mean_squared_error | time_trial | time_ht | state |
| ----- | ----------- | ------ | --- | --- | ---- | --- | --- | ------ | ------ | ------- | ------- | ------ | ------ | -------- |
| 0 | absolute_.. | 0.0402 | 80 | 13 | 15 | 24 | 0.9 | 0.5248 | 0.5248 | -2.1683 | -2.1683 | 0.275s | 0.275s | COMPLETE |
| 1 | squared_e.. | 0.0219 | 440 | 14 | 9 | 16 | 0.1 | 0.5673 | 0.5673 | -2.1767 | -2.1683 | 1.059s | 1.334s | COMPLETE |
| 2 | absolute_.. | 0.034 | 250 | 12 | 12 | 26 | 0.4 | 0.5174 | 0.5673 | -2.2218 | -2.1683 | 0.703s | 2.037s | COMPLETE |
| 3 | absolute_.. | 0.3174 | 370 | 46 | 6 | 10 | 0.6 | 0.5566 | 0.5673 | -2.1746 | -2.1683 | 1.291s | 3.328s | COMPLETE |
| 4 | poisson | 0.0518 | 460 | 35 | 3 | 15 | 0.9 | 0.5691 | 0.5691 | -2.1695 | -2.1683 | 0.586s | 3.914s | COMPLETE |
| 5 | poisson | 0.0177 | 140 | 34 | None | 21 | 0.0 | 0.5546 | 0.5691 | -2.1003 | -2.1003 | 0.839s | 4.752s | COMPLETE |
| 6 | absolute_.. | 0.0255 | 130 | 40 | 15 | 17 | 0.6 | 0.483 | 0.5691 | -2.2817 | -2.1003 | 1.117s | 5.869s | COMPLETE |
| 7 | squared_e.. | 0.0136 | 190 | 35 | 14 | 22 | 1.0 | 0.5699 | 0.5699 | -2.083 | -2.083 | 1.174s | 7.043s | COMPLETE |
| 8 | squared_e.. | 0.1919 | 200 | 29 | 5 | 26 | 0.3 | 0.5105 | 0.5699 | -2.3066 | -2.083 | 0.440s | 7.484s | COMPLETE |
| 9 | squared_e.. | 0.4892 | 460 | 28 | 10 | 24 | 1.0 | 0.4298 | 0.5699 | -2.4942 | -2.083 | 1.545s | 9.029s | COMPLETE |

Hyperparameter tuning ---------------------------
Best trial --> 7
Best parameters:
 --> loss: squared_error
 --> learning_rate: 0.0136
 --> max_iter: 190
 --> max_leaf_nodes: 35
 --> max_depth: 14
 --> min_samples_leaf: 22
 --> l2_regularization: 1.0
Best evaluation --> r2: 0.5699   neg_root_mean_squared_error: -2.083
Time elapsed: 9.029s
Fit ---------------------------------------------
Train evaluation --> r2: 0.6668   neg_root_mean_squared_error: -1.8677
Test evaluation --> r2: 0.5583   neg_root_mean_squared_error: -2.1105
Time elapsed: 1.076s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.5388 ± 0.0083   neg_root_mean_squared_error: -2.1565 ± 0.0192
Time elapsed: 6.803s
-------------------------------------------------
Total time: 16.908s


Final results ==================== >>
Total time: 17.610s
-------------------------------------
LinearSVM            --> r2: 0.4577 ± 0.002   neg_root_mean_squared_error: -2.3384 ± 0.0043
HistGradientBoosting --> r2: 0.5388 ± 0.0083   neg_root_mean_squared_error: -2.1565 ± 0.0192 !
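Besides the metric acronyms used above, the metric parameter generally also accepts sklearn scorers, so the same multi-metric run could be written with explicit scorer objects. A minimal sketch, not executed here; check the API reference for the exact accepted types:

from sklearn.metrics import make_scorer, mean_absolute_error

# Negated MAE as a custom second metric (scorer support is assumed here)
mae = make_scorer(mean_absolute_error, greater_is_better=False)
atom.run(models=["lsvm", "hGBM"], metric=("r2", mae), n_trials=10, n_bootstrap=6)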
In [6]:
# Check the robustness of the pipeline using cross-validation
atom.winner.cross_validate()
Applying cross-validation...
Out[6]:
|      | train_r2 | test_r2 | train_neg_root_mean_squared_error | test_neg_root_mean_squared_error | time (s) |
|------|----------|---------|-----------------------------------|----------------------------------|----------|
| 0 | 0.672482 | 0.534480 | -1.841417 | -2.214880 | 1.079982 |
| 1 | 0.669693 | 0.541603 | -1.850140 | -2.193359 | 1.113014 |
| 2 | 0.674650 | 0.525120 | -1.860947 | -2.111204 | 1.109009 |
| 3 | 0.661519 | 0.579041 | -1.851866 | -2.194239 | 1.065969 |
| 4 | 0.666829 | 0.558253 | -1.867706 | -2.110524 | 1.136033 |
| mean | 0.669035 | 0.547699 | -1.854415 | -2.164841 | 1.100801 |
| std | 0.004587 | 0.019055 | 0.009089 | 0.044741 | 0.024918 |
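Cross-validation isn't limited to the winning model; every trained model exposes the same method. A usage sketch, assuming attribute access by the model's acronym:

# Same robustness check for the linear SVM instead of the winner
atom.lsvm.cross_validate()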
Analyze the results¶
In [7]:
# The columns in the results dataframe contain a list of
# scores, one for each metric (in the same order as called)
atom.results[["score_ht", "score_train", "score_test"]]
Out[7]:
|      | score_ht | score_train | score_test |
|------|----------|-------------|------------|
| lSVM | [0.4925303455788521, -2.262753922393612] | [0.4592, -2.3795] | [0.4584, -2.3369] |
| hGBM | [0.5699377046439738, -2.08304173753828] | [0.6668, -1.8677] | [0.5583, -2.1105] |
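Since every cell holds one score per metric (in the order they were passed to run), the lists can be expanded into one column per metric with plain pandas. A small sketch, not part of the original notebook:

# Split the list-valued test scores into separate r2 and rmse columns
scores = atom.results["score_test"].apply(pd.Series)
scores.columns = ["r2", "rmse"]
scores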
In [8]:
# Some plots allow us to choose the metric we want to show
with atom.canvas():
atom.plot_trials(metric="r2", title="Hyperparameter tuning performance for R2")
atom.plot_trials(metric="rmse", title="Hyperparameter tuning performance for RMSE")
In [9]:
atom.plot_results(metric="r2")
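The same comparison can be drawn for the second metric by passing its name instead, mirroring the call above:

# Plot the bootstrap results for the RMSE metric
atom.plot_results(metric="rmse")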