Example: Multilabel classification
This example shows how to use ATOM to solve a multilabel classification problem.
The data used is a synthetic dataset created using sklearn's make_multilabel_classification function.
Load the data
In [1]:
# Import packages
import pandas as pd
from atom import ATOMClassifier
from sklearn.datasets import make_multilabel_classification
In [2]:
# Create data
X, y = make_multilabel_classification(n_samples=300, n_classes=3, random_state=1)
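The targets come back as a binary indicator matrix with one column per class, where a 1 means the sample carries that label. A quick sanity check (a sketch reusing the variables created above):

# y is a binary indicator matrix: one column per class
print(y.shape)  # (300, 3)
print(y[:5])    # the first five label vectors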
Run the pipeline
In [3]:
# Note that for multioutput tasks, you must specify the `y` keyword
atom = ATOMClassifier(X, y=y, verbose=2, random_state=1)
<< ================== ATOM ================== >>

Algorithm task: multilabel classification.

Dataset stats ==================== >>
Shape: (300, 23)
Train set size: 240
Test set size: 60
-------------------------------------
Memory: 51.73 kB
Scaled: False
Outlier values: 35 (0.6%)
In [4]:
# Show the models that natively support multilabel tasks
atom.available_models()[["acronym", "model", "native_multioutput"]]
Out[4]:
|   | acronym | model | native_multioutput |
|---|---|---|---|
| 0 | AdaB | AdaBoost | False |
| 1 | Bag | Bagging | False |
| 2 | BNB | BernoulliNB | False |
| 3 | CatB | CatBoost | False |
| 4 | CatNB | CategoricalNB | False |
| 5 | CNB | ComplementNB | False |
| 6 | Tree | DecisionTree | True |
| 7 | Dummy | Dummy | False |
| 8 | ETree | ExtraTree | True |
| 9 | ET | ExtraTrees | True |
| 10 | GNB | GaussianNB | False |
| 11 | GP | GaussianProcess | False |
| 12 | GBM | GradientBoosting | False |
| 13 | hGBM | HistGradientBoosting | False |
| 14 | KNN | KNearestNeighbors | True |
| 15 | LGB | LightGBM | False |
| 16 | LDA | LinearDiscriminantAnalysis | False |
| 17 | lSVM | LinearSVM | False |
| 18 | LR | LogisticRegression | False |
| 19 | MLP | MultiLayerPerceptron | False |
| 20 | MNB | MultinomialNB | False |
| 21 | PA | PassiveAggressive | False |
| 22 | Perc | Perceptron | False |
| 23 | QDA | QuadraticDiscriminantAnalysis | False |
| 24 | RNN | RadiusNearestNeighbors | True |
| 25 | RF | RandomForest | True |
| 26 | Ridge | Ridge | False |
| 27 | SGD | StochasticGradientDescent | False |
| 28 | SVM | SupportVectorMachine | False |
| 29 | XGB | XGBoost | False |
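Since available_models returns a regular pandas DataFrame (as the table above suggests), a boolean filter is enough to keep only the natively multioutput models; a minimal sketch:

# Keep only the models whose estimator handles multilabel targets natively
models = atom.available_models()
print(models[models["native_multioutput"]]["acronym"].tolist())
# ['Tree', 'ETree', 'ET', 'KNN', 'RNN', 'RF']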
In [5]:
atom.run(models=["LDA", "RF"], metric="recall_weighted")
Training ========================= >>
Models: LDA, RF
Metric: recall_weighted

Results for LinearDiscriminantAnalysis:
Fit ---------------------------------------------
Train evaluation --> recall_weighted: 0.8912
Test evaluation --> recall_weighted: 0.899
Time elapsed: 0.078s
-------------------------------------------------
Total time: 0.078s

Results for RandomForest:
Fit ---------------------------------------------
Train evaluation --> recall_weighted: 1.0
Test evaluation --> recall_weighted: 0.9091
Time elapsed: 0.619s
-------------------------------------------------
Total time: 0.619s

Final results ==================== >>
Total time: 0.701s
-------------------------------------
LinearDiscriminantAnalysis --> recall_weighted: 0.899
RandomForest --> recall_weighted: 0.9091 !
In [6]:
# Note that non-native multioutput models use a meta-estimator wrapper
print(f"Estimator for LDA is: {atom.lda.estimator}")
print(f"Estimator for RF is: {atom.rf.estimator}")
Estimator for LDA is: ClassifierChain(base_estimator=LinearDiscriminantAnalysis(), random_state=1)
Estimator for RF is: RandomForestClassifier(n_jobs=1, random_state=1)
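The ClassifierChain that atom wrapped around LDA is plain sklearn; a minimal sketch of what it does under the hood, outside of atom (reusing the X and y created earlier):

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.multioutput import ClassifierChain

# Fit one LDA per target column; each link in the chain also receives
# the previous labels as extra features
chain = ClassifierChain(LinearDiscriminantAnalysis(), random_state=1)
chain.fit(X, y)
print(chain.predict(X[:3]))  # one binary label vector per sample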
Some models, such as MLP, natively support multilabel tasks but not multiclass-multioutput tasks. Their native_multioutput tag is therefore False, even though they don't need a multioutput meta-estimator for this task. In such cases, use atom's multioutput attribute to tell atom not to wrap the model in a multioutput meta-estimator. See here for an overview of sklearn classifiers and the tasks they support.
In [7]:
atom.multioutput = None
In [8]:
atom.run("MLP")
atom.run("MLP")
Training ========================= >>
Models: MLP
Metric: recall_weighted

Results for MultiLayerPerceptron:
Fit ---------------------------------------------
Train evaluation --> recall_weighted: 0.9689
Test evaluation --> recall_weighted: 0.9192
Time elapsed: 6.430s
-------------------------------------------------
Total time: 6.430s

Final results ==================== >>
Total time: 6.438s
-------------------------------------
MultiLayerPerceptron --> recall_weighted: 0.9192
In [9]:
print(f"Estimator for MLP is: {atom.mlp.estimator}")
print(f"Estimator for MLP is: {atom.mlp.estimator}")
Estimator for MLP is: MLPClassifier(random_state=1)
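This works because sklearn's MLPClassifier handles multilabel targets natively outside of atom too; a minimal sketch (reusing X and y from above):

from sklearn.neural_network import MLPClassifier

# MLPClassifier accepts a 2D binary indicator matrix as y directly
mlp = MLPClassifier(max_iter=1000, random_state=1)  # extra iterations so the small demo converges
mlp.fit(X, y)
print(mlp.predict(X[:3]))  # one binary label vector per sample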
Analyze the results
In [10]:
thresholds = atom.rf.get_best_threshold()
print(f"Best threshold per target column: {thresholds}")
Best threshold per target column: [0.72, 0.75, 0.56]
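Applying such per-column thresholds by hand is a one-liner with numpy; a minimal sketch with made-up probabilities shaped like the real ones:

import numpy as np

# Hypothetical (n_samples, 3) positive-class probabilities
proba = np.array([[0.80, 0.40, 0.60],
                  [0.50, 0.90, 0.30]])
thresholds = np.array([0.72, 0.75, 0.56])  # values found above

# A label is predicted 1 when its probability clears its column's threshold
print((proba >= thresholds).astype(int))  # [[1 0 1], [0 1 0]]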
In [11]:
atom.rf.evaluate(threshold=thresholds)
Out[11]:
accuracy              0.5167
average_precision     0.6607
f1_weighted           0.6928
jaccard_weighted      0.5632
precision_weighted    0.9315
recall_weighted       0.5960
roc_auc               0.6873
Name: RF, dtype: float64
In [12]:
# Use the target parameter in plots to specify which target column to use
atom.plot_roc(target=2)
In [13]:
# When the target parameter also specifies the class, use format (column, class)
atom.plot_probabilities(models="MLP", target=(2, 1))
In [14]:
with atom.canvas(figsize=(900, 600)):
atom.plot_calibration(target=0)
atom.plot_calibration(target=1)