Example: Multiclass classification¶
This example shows how to compare the performance of three models on a multiclass classification task.
Import the wine dataset from sklearn.datasets. This small, easy-to-fit dataset contains the results of a chemical analysis of wines grown by three different cultivators; the goal is to predict, from those chemical features, which cultivator a wine comes from.
Load the data¶
In [1]:
# Import packages
from sklearn.datasets import load_wine
from atom import ATOMClassifier
In [2]:
# Load data
X, y = load_wine(return_X_y=True, as_frame=True)
# Let's have a look
X.head()
Out[2]:
| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 |
| 1 | 13.20 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050.0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 |
| 3 | 14.37 | 1.95 | 2.50 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480.0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 |
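Before modeling, it can also be useful to check the class balance of the target. A minimal sketch using the same `load_wine` call as above:

```python
from sklearn.datasets import load_wine

# Load the data and inspect the target distribution
X, y = load_wine(return_X_y=True, as_frame=True)

print(X.shape)                         # 178 samples, 13 features
print(y.value_counts().sort_index())   # samples per cultivator class
```

The three classes are moderately imbalanced, which is one reason a class-averaged metric like one-vs-rest ROC-AUC is a reasonable choice below.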
Run the pipeline¶
In [3]:
atom = ATOMClassifier(X, y, n_jobs=1, verbose=2, random_state=1)
# Fit the pipeline with the selected models
atom.run(
models=["LR", "LDA", "RF"],
metric="roc_auc_ovr",
n_trials=14,
n_bootstrap=5,
errors="raise",
)
<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Multiclass classification.

Dataset stats ==================== >>
Shape: (178, 14)
Train set size: 143
Test set size: 35
-------------------------------------
Memory: 19.36 kB
Scaled: False
Outlier values: 12 (0.6%)

Training ========================= >>
Models: LR, LDA, RF
Metric: roc_auc_ovr

Running hyperparameter tuning for LogisticRegression...

| trial | penalty | C | solver | max_iter | l1_ratio | roc_auc_ovr | best_roc_auc_ovr | time_trial | time_ht | state |
| ----- | ------- | ------- | ------- | -------- | -------- | ----------- | ---------------- | ---------- | ------- | -------- |
| 0 | l1 | 0.0054 | saga | 480 | 0.7 | 0.5 | 0.5 | 0.033s | 0.033s | COMPLETE |
| 1 | l1 | 0.122 | saga | 380 | 0.7 | 1.0 | 1.0 | 0.031s | 0.064s | COMPLETE |
| 2 | l2 | 0.0071 | sag | 720 | 0.3 | 1.0 | 1.0 | 0.030s | 0.094s | COMPLETE |
| 3 | l1 | 87.9641 | libli.. | 920 | 0.3 | 1.0 | 1.0 | 0.030s | 0.124s | COMPLETE |
| 4 | l2 | 0.0114 | sag | 630 | 0.7 | 1.0 | 1.0 | 0.029s | 0.153s | COMPLETE |
| 5 | l2 | 0.0018 | sag | 920 | 0.1 | 1.0 | 1.0 | 0.030s | 0.183s | COMPLETE |
| 6 | l2 | 43.4053 | sag | 780 | 0.3 | 1.0 | 1.0 | 0.049s | 0.232s | COMPLETE |
| 7 | l2 | 2.0759 | libli.. | 470 | 0.2 | 1.0 | 1.0 | 0.028s | 0.260s | COMPLETE |
| 8 | None | 0.043 | sag | 110 | 1.0 | 1.0 | 1.0 | 0.030s | 0.290s | COMPLETE |
| 9 | l1 | 46.0233 | saga | 740 | 0.1 | 1.0 | 1.0 | 0.057s | 0.347s | COMPLETE |
| 10 | l2 | 0.4741 | lbfgs | 280 | 1.0 | 1.0 | 1.0 | 0.044s | 0.391s | COMPLETE |
| 11 | l2 | 0.0765 | newto.. | 370 | 0.5 | 1.0 | 1.0 | 0.044s | 0.435s | COMPLETE |
| 12 | elast.. | 0.5609 | saga | 640 | 0.6 | 1.0 | 1.0 | 0.044s | 0.479s | COMPLETE |
| 13 | None | 0.0481 | newto.. | 240 | 0.5 | 1.0 | 1.0 | 0.043s | 0.522s | COMPLETE |

Hyperparameter tuning ---------------------------
Best trial --> 1
Best parameters:
 --> penalty: l1
 --> C: 0.122
 --> solver: saga
 --> max_iter: 380
 --> l1_ratio: 0.7
Best evaluation --> roc_auc_ovr: 1.0
Time elapsed: 0.522s
Fit ---------------------------------------------
Train evaluation --> roc_auc_ovr: 0.9995
Test evaluation --> roc_auc_ovr: 0.9989
Time elapsed: 0.152s
Bootstrap ---------------------------------------
Evaluation --> roc_auc_ovr: 0.9991 ± 0.0009
Time elapsed: 0.142s
-------------------------------------------------
Time: 0.816s

Running hyperparameter tuning for LinearDiscriminantAnalysis...

| trial | solver | shrinkage | roc_auc_ovr | best_roc_auc_ovr | time_trial | time_ht | state |
| ----- | ------- | --------- | ----------- | ---------------- | ---------- | ------- | -------- |
| 0 | lsqr | 0.9 | 0.9221 | 0.9221 | 0.018s | 0.018s | COMPLETE |
| 1 | eigen | 1.0 | 0.9221 | 0.9221 | 0.012s | 0.030s | COMPLETE |
| 2 | eigen | 1.0 | 0.9221 | 0.9221 | 0.000s | 0.030s | COMPLETE |
| 3 | lsqr | 0.7 | 0.9241 | 0.9241 | 0.009s | 0.039s | COMPLETE |
| 4 | eigen | 0.7 | 0.9241 | 0.9241 | 0.010s | 0.049s | COMPLETE |
| 5 | lsqr | auto | 1.0 | 1.0 | 0.012s | 0.061s | COMPLETE |
| 6 | eigen | 1.0 | 0.9221 | 1.0 | 0.001s | 0.062s | COMPLETE |
| 7 | lsqr | 1.0 | 0.9221 | 1.0 | 0.009s | 0.071s | COMPLETE |
| 8 | svd | None | 1.0 | 1.0 | 0.008s | 0.079s | COMPLETE |
| 9 | svd | None | 1.0 | 1.0 | 0.000s | 0.079s | COMPLETE |
| 10 | lsqr | auto | 1.0 | 1.0 | 0.001s | 0.080s | COMPLETE |
| 11 | svd | None | 1.0 | 1.0 | 0.002s | 0.082s | COMPLETE |
| 12 | svd | None | 1.0 | 1.0 | 0.001s | 0.083s | COMPLETE |
| 13 | svd | None | 1.0 | 1.0 | 0.001s | 0.084s | COMPLETE |

Hyperparameter tuning ---------------------------
Best trial --> 5
Best parameters:
 --> solver: lsqr
 --> shrinkage: auto
Best evaluation --> roc_auc_ovr: 1.0
Time elapsed: 0.084s
Fit ---------------------------------------------
Train evaluation --> roc_auc_ovr: 1.0
Test evaluation --> roc_auc_ovr: 1.0
Time elapsed: 0.039s
Bootstrap ---------------------------------------
Evaluation --> roc_auc_ovr: 0.9998 ± 0.0005
Time elapsed: 0.065s
-------------------------------------------------
Time: 0.188s

Running hyperparameter tuning for RandomForest...

| trial | n_estimators | criterion | max_depth | min_samples_split | min_samples_leaf | max_features | bootstrap | max_samples | ccp_alpha | roc_auc_ovr | best_roc_auc_ovr | time_trial | time_ht | state |
| ----- | ------------ | --------- | --------- | ----------------- | ---------------- | ------------ | --------- | ----------- | --------- | ----------- | ---------------- | ---------- | ------- | -------- |
| 0 | 210 | gini | 10 | 17 | 20 | 0.5 | False | None | 0.0 | 0.9803 | 0.9803 | 0.167s | 0.167s | COMPLETE |
| 1 | 380 | gini | 4 | 15 | 3 | 0.9 | False | None | 0.01 | 0.9757 | 0.9803 | 0.393s | 0.560s | COMPLETE |
| 2 | 380 | entropy | 6 | 2 | 13 | 0.9 | False | None | 0.03 | 0.9655 | 0.9803 | 0.390s | 0.950s | COMPLETE |
| 3 | 470 | gini | 11 | 9 | 18 | nan | True | 0.6 | 0.025 | 0.9944 | 0.9944 | 0.481s | 1.432s | COMPLETE |
| 4 | 100 | entropy | 12 | 14 | 6 | 0.9 | False | nan | 0.035 | 0.9916 | 0.9944 | 0.111s | 1.543s | COMPLETE |
| 5 | 470 | entropy | 13 | 11 | 1 | nan | True | 0.6 | 0.01 | 0.9949 | 0.9949 | 0.550s | 2.092s | COMPLETE |
| 6 | 250 | gini | 14 | 13 | 17 | 0.7 | True | nan | 0.02 | 0.9949 | 0.9949 | 0.268s | 2.361s | COMPLETE |
| 7 | 220 | gini | 5 | 10 | 7 | 0.5 | True | 0.9 | 0.035 | 0.9949 | 0.9949 | 0.251s | 2.612s | COMPLETE |
| 8 | 130 | entropy | 4 | 6 | 11 | 0.9 | False | nan | 0.03 | 0.9693 | 0.9949 | 0.139s | 2.751s | COMPLETE |
| 9 | 370 | gini | 12 | 2 | 4 | 0.5 | False | nan | 0.02 | 0.9949 | 0.9949 | 0.317s | 3.068s | COMPLETE |
| 10 | 500 | entropy | 13 | 20 | 1 | 0.8 | True | 0.6 | 0.01 | 0.9932 | 0.9949 | 0.608s | 3.676s | COMPLETE |
| 11 | 20 | entropy | 14 | 12 | 16 | 0.7 | True | 0.5 | 0.01 | 0.9981 | 0.9981 | 0.048s | 3.724s | COMPLETE |
| 12 | 10 | entropy | 9 | 7 | 15 | 0.6 | True | 0.5 | 0.01 | 0.9847 | 0.9981 | 0.036s | 3.760s | COMPLETE |
| 13 | 30 | entropy | 16 | 11 | 9 | log2 | True | 0.7 | 0.0 | 0.9981 | 0.9981 | 0.065s | 3.825s | COMPLETE |

Hyperparameter tuning ---------------------------
Best trial --> 11
Best parameters:
 --> n_estimators: 20
 --> criterion: entropy
 --> max_depth: 14
 --> min_samples_split: 12
 --> min_samples_leaf: 16
 --> max_features: 0.7
 --> bootstrap: True
 --> max_samples: 0.5
 --> ccp_alpha: 0.01
Best evaluation --> roc_auc_ovr: 0.9981
Time elapsed: 3.825s
Fit ---------------------------------------------
Train evaluation --> roc_auc_ovr: 0.999
Test evaluation --> roc_auc_ovr: 0.9986
Time elapsed: 0.055s
Bootstrap ---------------------------------------
Evaluation --> roc_auc_ovr: 0.9885 ± 0.0065
Time elapsed: 0.139s
-------------------------------------------------
Time: 4.019s

Final results ==================== >>
Total time: 8.389s
-------------------------------------
LogisticRegression         --> roc_auc_ovr: 0.9991 ± 0.0009
LinearDiscriminantAnalysis --> roc_auc_ovr: 0.9998 ± 0.0005 !
RandomForest               --> roc_auc_ovr: 0.9885 ± 0.0065
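The `roc_auc_ovr` metric used above is scikit-learn's one-vs-rest ROC-AUC: the AUC of each class against the rest, averaged over classes. A minimal sketch of computing it outside ATOM (the split and estimator settings here are illustrative, not the ones ATOM uses internally):

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# Scale, then fit a logistic regression; the metric needs class probabilities
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# One-vs-rest ROC-AUC on the held-out set
score = roc_auc_score(y_test, model.predict_proba(X_test), multi_class="ovr")
print(f"roc_auc_ovr: {score:.4f}")
```

Passing `metric="roc_auc_ovr"` to `atom.run` wires up this same scorer for tuning, fitting, and bootstrapping.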
Analyze the results¶
In [4]:
atom.results
Out[4]:
| | roc_auc_ovr_ht | time_ht | roc_auc_ovr_train | roc_auc_ovr_test | time_fit | roc_auc_ovr_bootstrap | time_bootstrap | time |
|---|---|---|---|---|---|---|---|---|
| LR | 1.000000 | 0.522480 | 0.998700 | 0.998900 | 0.152041 | 0.999093 | 0.141717 | 0.816238 |
| LDA | 1.000000 | 0.084077 | 1.000000 | 0.998900 | 0.039036 | 0.999773 | 0.065059 | 0.188172 |
| RF | 0.998148 | 3.824844 | 0.999000 | 0.998600 | 0.055050 | 0.988525 | 0.139127 | 4.019021 |
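Since `atom.results` is a pandas DataFrame, ordinary pandas operations pick out the winner. A small standalone sketch using the bootstrap scores hard-coded from the table above (with ATOM itself you could operate on `atom.results` directly):

```python
import pandas as pd

# Bootstrap scores copied from the results table, for illustration
bootstrap = pd.Series(
    {"LR": 0.999093, "LDA": 0.999773, "RF": 0.988525},
    name="roc_auc_ovr_bootstrap",
)

# LDA has the highest bootstrapped score, matching the "!" winner
# marker in the training log
print(bootstrap.idxmax())
```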
In [5]:
# Show the score for some different metrics
atom.evaluate(["precision_macro", "recall_macro", "jaccard_weighted"])
Out[5]:
| | precision_macro | recall_macro | jaccard_weighted |
|---|---|---|---|
| LR | 0.939400 | 0.952400 | 0.896100 |
| LDA | 0.966700 | 0.976200 | 0.945700 |
| RF | 0.911700 | 0.915300 | 0.842200 |
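These metric names map onto scikit-learn scorers: `_macro` averages the per-class scores equally, while `_weighted` weights them by class support. A toy sketch (the labels here are made up to show the averaging, not taken from the wine data):

```python
from sklearn.metrics import jaccard_score, precision_score, recall_score

# Toy 3-class labels and predictions to illustrate the averaging modes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Macro: unweighted mean of each class's score
print(precision_score(y_true, y_pred, average="macro"))
print(recall_score(y_true, y_pred, average="macro"))

# Weighted: mean of per-class scores weighted by class support
print(jaccard_score(y_true, y_pred, average="weighted"))
```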
In [6]:
# Some plots allow you to choose the target class to look at
atom.rf.plot_probabilities(rows="train", target=0)
In [7]:
atom.lda.plot_shap_heatmap(target=2, show=7)