Example: Multilabel classification
This example shows how to use ATOM to solve a multilabel classification problem.
The data used is a synthetic dataset created using sklearn's make_multilabel_classification function.
Load the data
In [1]:
# Import packages
import pandas as pd
from atom import ATOMClassifier
from sklearn.datasets import make_multilabel_classification
In [2]:
# Create data
X, y = make_multilabel_classification(n_samples=300, n_classes=3, random_state=1)
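For reference, make_multilabel_classification returns the target as a binary indicator matrix with one column per class, where a sample can belong to several classes at once. A quick check of its shape:

# The target is a (300, 3) binary indicator matrix: one column per class
print(y.shape)
print(y[:3])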
Run the pipeline
In [3]:
# Note that for multioutput tasks, you must specify the `y` keyword
atom = ATOMClassifier(X, y=y, verbose=2, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: multilabel classification.

Dataset stats ==================== >>
Shape: (300, 23)
Train set size: 240
Test set size: 60
-------------------------------------
Memory: 51.73 kB
Scaled: False
Outlier values: 35 (0.6%)
In [4]:
# Show which models natively support multioutput tasks
atom.available_models()[["acronym", "model", "native_multioutput"]]
Out[4]:
 | acronym | model | native_multioutput |
---|---|---|---|
0 | AdaB | AdaBoost | False |
1 | Bag | Bagging | False |
2 | BNB | BernoulliNB | False |
3 | CatB | CatBoost | False |
4 | CatNB | CategoricalNB | False |
5 | CNB | ComplementNB | False |
6 | Tree | DecisionTree | True |
7 | Dummy | Dummy | False |
8 | ETree | ExtraTree | True |
9 | ET | ExtraTrees | True |
10 | GNB | GaussianNB | False |
11 | GP | GaussianProcess | False |
12 | GBM | GradientBoosting | False |
13 | hGBM | HistGradientBoosting | False |
14 | KNN | KNearestNeighbors | True |
15 | LGB | LightGBM | False |
16 | LDA | LinearDiscriminantAnalysis | False |
17 | lSVM | LinearSVM | False |
18 | LR | LogisticRegression | False |
19 | MLP | MultiLayerPerceptron | False |
20 | MNB | MultinomialNB | False |
21 | PA | PassiveAggressive | False |
22 | Perc | Perceptron | False |
23 | QDA | QuadraticDiscriminantAnalysis | False |
24 | RNN | RadiusNearestNeighbors | True |
25 | RF | RandomForest | True |
26 | Ridge | Ridge | False |
27 | SGD | StochasticGradientDescent | False |
28 | SVM | SupportVectorMachine | False |
29 | XGB | XGBoost | False |
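Since available_models() returns a pandas DataFrame, the overview can be filtered down to the natively supporting models with standard boolean indexing (a minimal sketch):

# Keep only the models whose native_multioutput flag is True
models = atom.available_models()
print(models[models["native_multioutput"]]["acronym"].tolist())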
In [5]:
atom.run(models=["LDA", "RF"], metric="recall_weighted")
Training ========================= >>
Models: LDA, RF
Metric: recall_weighted


Results for LinearDiscriminantAnalysis:
Fit ---------------------------------------------
Train evaluation --> recall_weighted: 0.8912
Test evaluation --> recall_weighted: 0.899
Time elapsed: 0.078s
-------------------------------------------------
Total time: 0.078s


Results for RandomForest:
Fit ---------------------------------------------
Train evaluation --> recall_weighted: 1.0
Test evaluation --> recall_weighted: 0.9091
Time elapsed: 0.619s
-------------------------------------------------
Total time: 0.619s


Final results ==================== >>
Total time: 0.701s
-------------------------------------
LinearDiscriminantAnalysis --> recall_weighted: 0.899
RandomForest --> recall_weighted: 0.9091 !
In [6]:
# Note that non-native multioutput models use a meta-estimator wrapper
print(f"Estimator for LDA is: {atom.lda.estimator}")
print(f"Estimator for RF is: {atom.rf.estimator}")
Estimator for LDA is: ClassifierChain(base_estimator=LinearDiscriminantAnalysis(), random_state=1)
Estimator for RF is: RandomForestClassifier(n_jobs=1, random_state=1)
Some models, such as MLP, have native support for multilabel tasks, but not for multiclass-multioutput tasks. Their native_multioutput tag is therefore False, yet they don't need a multioutput meta-estimator for this problem. In such cases, use atom's multioutput attribute to tell atom not to use any multioutput wrapper. See here for an overview of sklearn classifiers and which tasks they support.
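Besides None, the attribute should also be able to hold the wrapper to use. A hedged sketch, assuming multioutput can be set to a meta-estimator class such as sklearn's ClassifierChain (check ATOM's documentation for the exact accepted types):

from sklearn.multioutput import ClassifierChain

# Assumption: assigning a meta-estimator class makes atom wrap
# non-native models with it instead of the default
atom.multioutput = ClassifierChain

Here, we disable the wrapper entirely for MLP: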
In [7]:
atom.multioutput = None
In [8]:
atom.run("MLP")
atom.run("MLP")
Training ========================= >>
Models: MLP
Metric: recall_weighted


Results for MultiLayerPerceptron:
Fit ---------------------------------------------
Train evaluation --> recall_weighted: 0.9689
Test evaluation --> recall_weighted: 0.9192
Time elapsed: 6.430s
-------------------------------------------------
Total time: 6.430s


Final results ==================== >>
Total time: 6.438s
-------------------------------------
MultiLayerPerceptron --> recall_weighted: 0.9192
In [9]:
print(f"Estimator for MLP is: {atom.mlp.estimator}")
print(f"Estimator for MLP is: {atom.mlp.estimator}")
Estimator for MLP is: MLPClassifier(random_state=1)
Analyze the results
In [10]:
thresholds = atom.rf.get_best_threshold()
print(f"Best threshold per target column: {thresholds}")
Best threshold per target column: [0.72, 0.75, 0.56]
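The thresholds are applied per target column: a label is predicted positive when its probability meets that column's threshold. A small illustration on made-up probabilities (the numbers are invented, only the mechanics matter):

import numpy as np

# Invented probabilities: five samples, one column per target
rng = np.random.default_rng(1)
proba = rng.random((5, 3))

# A label is positive when its probability meets the column's threshold
preds = (proba >= np.asarray(thresholds)).astype(int)
print(preds)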
In [11]:
atom.rf.evaluate(threshold=thresholds)
Out[11]:
accuracy              0.5167
average_precision     0.6607
f1_weighted           0.6928
jaccard_weighted      0.5632
precision_weighted    0.9315
recall_weighted       0.5960
roc_auc               0.6873
Name: RF, dtype: float64
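For comparison, calling evaluate without the threshold argument scores the model at the default threshold of 0.5:

# Default threshold of 0.5 for every target column
atom.rf.evaluate()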
In [12]:
# Use the target parameter in plots to specify which target column to use
atom.plot_roc(target=2)
In [13]:
# When the target parameter also specifies the class, use format (column, class)
atom.plot_probabilities(models="MLP", target=(2, 1))
In [14]:
with atom.canvas(figsize=(900, 600)):
atom.plot_calibration(target=0)
atom.plot_calibration(target=1)