Example: Hyperparameter tuning¶
This advanced example shows how to optimize a model's hyperparameters for multi-metric runs.
Import the breast cancer dataset from sklearn.datasets. This is a small, easy-to-train dataset where the goal is to predict whether a patient has breast cancer.
Load the data¶
In [1]:
# Import packages
from sklearn.datasets import load_breast_cancer
from optuna.distributions import IntDistribution
from atom import ATOMClassifier
In [2]:
# Load the data
X, y = load_breast_cancer(return_X_y=True)
Run the pipeline¶
In [3]:
# Initialize atom
atom = ATOMClassifier(X, y, n_jobs=4, verbose=2, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: binary classification.
Parallel processing with 4 cores.

Dataset stats ==================== >>
Shape: (569, 31)
Memory: 138.96 kB
Scaled: False
Outlier values: 167 (1.2%)
-------------------------------------
Train set size: 456
Test set size: 113
-------------------------------------
|   |   dataset |     train |     test |
| - | --------- | --------- | -------- |
| 0 | 212 (1.0) | 170 (1.0) | 42 (1.0) |
| 1 | 357 (1.7) | 286 (1.7) | 71 (1.7) |
In [4]:
# Train a MultiLayerPerceptron model on two metrics
# using a custom number of hidden layers
atom.run(
    models="MLP",
    metric=["f1", "ap"],
    n_trials=10,
    est_params={"activation": "relu"},
    ht_params={
        "distributions": {
            "hidden_layer_1": IntDistribution(2, 4),
            "hidden_layer_2": IntDistribution(10, 20),
            "hidden_layer_3": IntDistribution(10, 20),
            "hidden_layer_4": IntDistribution(2, 4),
        }
    },
)
Training ========================= >>
Models: MLP
Metric: f1, average_precision

Running hyperparameter tuning for MultiLayerPerceptron...

| trial | hidden_layer_1 | hidden_layer_2 | hidden_layer_3 | hidden_layer_4 | f1 | best_f1 | average_precision | best_average_precision | time_trial | time_ht | state |
| ----- | -------------- | -------------- | -------------- | -------------- | ------ | ------ | ------ | ------ | ------ | ------- | -------- |
| 0 | 3 | 17 | 10 | 2 | 0.9455 | 0.9455 | 0.9837 | 0.9837 | 0.943s | 0.943s | COMPLETE |
| 1 | 2 | 11 | 12 | 3 | 0.9739 | 0.9739 | 0.9988 | 0.9988 | 0.929s | 1.872s | COMPLETE |
| 2 | 3 | 15 | 14 | 4 | 0.9913 | 0.9913 | 1.0 | 1.0 | 0.935s | 2.807s | COMPLETE |
| 3 | 2 | 19 | 10 | 4 | 0.9649 | 0.9913 | 0.9867 | 1.0 | 0.926s | 3.732s | COMPLETE |
| 4 | 3 | 16 | 11 | 2 | 0.9655 | 0.9913 | 0.998 | 1.0 | 0.925s | 4.657s | COMPLETE |
| 5 | 4 | 20 | 13 | 4 | 0.9821 | 0.9913 | 0.9994 | 1.0 | 0.940s | 5.597s | COMPLETE |
| 6 | 4 | 19 | 10 | 2 | 0.9825 | 0.9913 | 0.9901 | 1.0 | 0.943s | 6.540s | COMPLETE |
| 7 | 2 | 19 | 11 | 3 | 0.7703 | 0.9913 | 0.9991 | 1.0 | 0.943s | 7.483s | COMPLETE |
| 8 | 4 | 15 | 17 | 2 | 0.9913 | 0.9913 | 0.9997 | 1.0 | 0.930s | 8.413s | COMPLETE |
| 9 | 4 | 19 | 10 | 4 | 0.9739 | 0.9913 | 0.9813 | 1.0 | 0.915s | 9.327s | COMPLETE |

Hyperparameter tuning ---------------------------
Best trial --> 2
Best parameters:
 --> hidden_layer_sizes: (3, 15, 14, 4)
Best evaluation --> f1: 0.9913   average_precision: 1.0
Time elapsed: 9.327s
Fit ---------------------------------------------
Train evaluation --> f1: 0.993   average_precision: 0.998
Test evaluation --> f1: 0.9861   average_precision: 0.995
Time elapsed: 1.305s
-------------------------------------------------
Total time: 10.633s

Final results ==================== >>
Total time: 10.675s
-------------------------------------
MultiLayerPerceptron --> f1: 0.9861   average_precision: 0.995
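Note that est_params passes fixed parameters straight to the estimator for every trial, while ht_params customizes the tuning itself. The same distributions mechanism accepts other Optuna distribution types as well. Below is a hypothetical sketch, not run in this notebook: the names alpha and learning_rate_init come from sklearn's MLPClassifier, and it's an assumption here that atom exposes them for tuning.

# Hypothetical sketch: tune the regularization strength and initial
# learning rate instead of the layer sizes (assumes atom accepts these
# sklearn parameter names in the distributions dict)
from optuna.distributions import FloatDistribution

atom.run(
    models="MLP",
    metric=["f1", "ap"],
    n_trials=10,
    ht_params={
        "distributions": {
            "alpha": FloatDistribution(1e-5, 1e-2, log=True),
            "learning_rate_init": FloatDistribution(1e-4, 1e-1, log=True),
        }
    },
)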
In [5]:
# For multi-metric runs, the selected best trial is the first in the Pareto front
atom.mlp.best_trial
Out[5]:
FrozenTrial(number=2, state=TrialState.COMPLETE, values=[0.9913043478260869, 1.0000000000000002], datetime_start=datetime.datetime(2022, 11, 23, 17, 35, 3, 837565), datetime_complete=datetime.datetime(2022, 11, 23, 17, 35, 4, 771413), params={'hidden_layer_1': 3, 'hidden_layer_2': 15, 'hidden_layer_3': 14, 'hidden_layer_4': 4}, user_attrs={'params': {'hidden_layer_1': 3, 'hidden_layer_2': 15, 'hidden_layer_3': 14, 'hidden_layer_4': 4}, 'estimator': MLPClassifier(hidden_layer_sizes=(3, 15, 14, 4), random_state=1)}, system_attrs={'nsga2:generation': 0}, intermediate_values={}, distributions={'hidden_layer_1': IntDistribution(high=4, log=False, low=2, step=1), 'hidden_layer_2': IntDistribution(high=20, log=False, low=10, step=1), 'hidden_layer_3': IntDistribution(high=20, log=False, low=10, step=1), 'hidden_layer_4': IntDistribution(high=4, log=False, low=2, step=1)}, trial_id=2, value=None)
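Optuna's multi-objective studies keep every Pareto-optimal trial, not just the selected one. A minimal sketch, assuming the model makes its Optuna study available through the study attribute (Optuna's Study.best_trials returns the full Pareto front):

# List every trial on the Pareto front with its [f1, average_precision] values
for trial in atom.mlp.study.best_trials:
    print(trial.number, trial.values)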
In [6]:
atom.plot_pareto_front()
In [7]:
# If you are unhappy with the results, it's possible to continue the study
atom.mlp.hyperparameter_tuning(n_trials=5)
Running hyperparameter tuning for MultiLayerPerceptron...

| trial | hidden_layer_1 | hidden_layer_2 | hidden_layer_3 | hidden_layer_4 | f1 | best_f1 | average_precision | best_average_precision | time_trial | time_ht | state |
| ----- | -------------- | -------------- | -------------- | -------------- | ------ | ------ | ------ | ------ | ------ | ------- | -------- |
| 10 | 4 | 18 | 13 | 4 | 1.0 | 1.0 | 1.0 | 1.0 | 0.977s | 10.304s | COMPLETE |
| 11 | 2 | 14 | 19 | 2 | 0.9492 | 1.0 | 0.9899 | 1.0 | 0.924s | 11.228s | COMPLETE |
| 12 | 2 | 11 | 10 | 4 | 0.7703 | 1.0 | 0.99 | 1.0 | 0.919s | 12.147s | COMPLETE |
| 13 | 2 | 12 | 15 | 2 | 0.9643 | 1.0 | 0.9813 | 1.0 | 0.914s | 13.061s | COMPLETE |
| 14 | 3 | 11 | 16 | 4 | 0.7703 | 1.0 | 0.9724 | 1.0 | 0.926s | 13.987s | COMPLETE |

Hyperparameter tuning ---------------------------
Best trial --> 10
Best parameters:
 --> hidden_layer_sizes: (4, 18, 13, 4)
Best evaluation --> f1: 1.0   average_precision: 1.0
Time elapsed: 13.987s
In [8]:
# The trials attribute gives an overview of the trial results
atom.mlp.trials
Out[8]:
| trial | params | estimator | score | time_trial | time_ht | state |
| ----- | ------ | --------- | ----- | ---------- | ------- | ----- |
| 0 | {'hidden_layer_sizes': (3, 17, 10, 2)} | MLPClassifier(hidden_layer_sizes=(3, 17, 10, 2... | [0.9454545454545454, 0.9837236558914353] | 0.942854 | 0.942854 | COMPLETE |
| 1 | {'hidden_layer_sizes': (2, 11, 12, 3)} | MLPClassifier(hidden_layer_sizes=(2, 11, 12, 3... | [0.9739130434782608, 0.9988003322156944] | 0.928844 | 1.871698 | COMPLETE |
| 2 | {'hidden_layer_sizes': (3, 15, 14, 4)} | MLPClassifier(hidden_layer_sizes=(3, 15, 14, 4... | [0.9913043478260869, 1.0000000000000002] | 0.934848 | 2.806546 | COMPLETE |
| 3 | {'hidden_layer_sizes': (2, 19, 10, 4)} | MLPClassifier(hidden_layer_sizes=(2, 19, 10, 4... | [0.9649122807017544, 0.9867431480369178] | 0.925841 | 3.732387 | COMPLETE |
| 4 | {'hidden_layer_sizes': (3, 16, 11, 2)} | MLPClassifier(hidden_layer_sizes=(3, 16, 11, 2... | [0.9655172413793103, 0.9980213692125051] | 0.92484 | 4.657227 | COMPLETE |
| 5 | {'hidden_layer_sizes': (4, 20, 13, 4)} | MLPClassifier(hidden_layer_sizes=(4, 20, 13, 4... | [0.9821428571428572, 0.999389732649834] | 0.939853 | 5.59708 | COMPLETE |
| 6 | {'hidden_layer_sizes': (4, 19, 10, 2)} | MLPClassifier(hidden_layer_sizes=(4, 19, 10, 2... | [0.9824561403508771, 0.990093407959159] | 0.942856 | 6.539936 | COMPLETE |
| 7 | {'hidden_layer_sizes': (2, 19, 11, 3)} | MLPClassifier(hidden_layer_sizes=(2, 19, 11, 3... | [0.7702702702702703, 0.9990764494418141] | 0.942856 | 7.482792 | COMPLETE |
| 8 | {'hidden_layer_sizes': (4, 15, 17, 2)} | MLPClassifier(hidden_layer_sizes=(4, 15, 17, 2... | [0.9913043478260869, 0.9996975196612221] | 0.929844 | 8.412636 | COMPLETE |
| 9 | {'hidden_layer_sizes': (4, 19, 10, 4)} | MLPClassifier(hidden_layer_sizes=(4, 19, 10, 4... | [0.9739130434782608, 0.9813127743443262] | 0.914831 | 9.327467 | COMPLETE |
| 10 | {'hidden_layer_sizes': (4, 18, 13, 4)} | MLPClassifier(hidden_layer_sizes=(4, 18, 13, 4... | [1.0, 1.0000000000000002] | 0.976887 | 10.304354 | COMPLETE |
| 11 | {'hidden_layer_sizes': (2, 14, 19, 2)} | MLPClassifier(hidden_layer_sizes=(2, 14, 19, 2... | [0.9491525423728813, 0.9899476963066745] | 0.923839 | 11.228193 | COMPLETE |
| 12 | {'hidden_layer_sizes': (2, 11, 10, 4)} | MLPClassifier(hidden_layer_sizes=(2, 11, 10, 4... | [0.7702702702702703, 0.9900232191286547] | 0.918835 | 12.147028 | COMPLETE |
| 13 | {'hidden_layer_sizes': (2, 12, 15, 2)} | MLPClassifier(hidden_layer_sizes=(2, 12, 15, 2... | [0.9642857142857142, 0.9812621686248989] | 0.91383 | 13.060858 | COMPLETE |
| 14 | {'hidden_layer_sizes': (3, 11, 16, 4)} | MLPClassifier(hidden_layer_sizes=(3, 11, 16, 4... | [0.7702702702702703, 0.9723670235061694] | 0.92584 | 13.986698 | COMPLETE |
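Since trials is a pandas DataFrame, standard pandas selection applies. A small sketch, assuming (as the table above suggests) that the score column holds [f1, average_precision] per trial:

# Find the number of the trial with the highest f1
best_f1_trial = atom.mlp.trials["score"].apply(lambda s: s[0]).idxmax()
print(best_f1_trial)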
In [9]:
# Select a custom best trial...
atom.mlp.best_trial = 2
# ...and check that the best parameters are now those in the selected trial
atom.mlp.best_params
Out[9]:
{'hidden_layer_sizes': (3, 15, 14, 4)}
In [10]:
# Lastly, fit the model on the complete training set
# using the new combination of hyperparameters
atom.mlp.fit()
Fit ---------------------------------------------
Train evaluation --> f1: 0.9948   average_precision: 0.9994
Test evaluation --> f1: 0.9861   average_precision: 0.997
Time elapsed: 2.648s
Analyze the results¶
In [11]:
atom.plot_trials()
In [12]:
atom.plot_parallel_coordinate()