Example: Multilabel classification¶
This example shows how to use ATOM to solve a multilabel classification problem.
The data is a synthetic dataset created with sklearn's make_multilabel_classification function.
Load the data¶
In [1]:
# Import packages
import pandas as pd
from atom import ATOMClassifier
from sklearn.datasets import make_multilabel_classification
In [2]:
# Create data
X, y = make_multilabel_classification(n_samples=300, n_classes=3, random_state=1)
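The labels are returned as a binary indicator matrix with one column per class, which is what makes this a multilabel task. A quick sanity check (not part of the original notebook) makes that explicit:

# y has shape (n_samples, n_classes); a row like [0, 1, 1]
# means the second and third labels are active for that sample
print(y.shape)  # (300, 3)
print(y[:5])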
Run the pipeline¶
In [3]:
# Note that for multioutput tasks, you must specify the `y` keyword
atom = ATOMClassifier(X, y=y, verbose=2, random_state=1)
<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Multilabel classification.

Dataset stats ==================== >>
Shape: (300, 23)
Train set size: 240
Test set size: 60
-------------------------------------
Memory: 51.73 kB
Scaled: False
Outlier values: 35 (0.6%)
In [4]:
# Show the models that natively support multilabel tasks
atom.available_models(native_multilabel=True)
Out[4]:
| | acronym | fullname | estimator | module | handles_missing | needs_scaling | accepts_sparse | native_multilabel | native_multioutput | validation | supports_engines |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Tree | DecisionTree | DecisionTreeClassifier | sklearn.tree._classes | True | False | True | True | True | None | sklearn |
| 1 | ETree | ExtraTree | ExtraTreeClassifier | sklearn.tree._classes | False | False | True | True | True | None | sklearn |
| 2 | ET | ExtraTrees | ExtraTreesClassifier | sklearn.ensemble._forest | False | False | True | True | True | None | sklearn |
| 3 | KNN | KNearestNeighbors | KNeighborsClassifier | sklearn.neighbors._classification | False | True | True | True | True | None | sklearn, sklearnex, cuml |
| 4 | MLP | MultiLayerPerceptron | MLPClassifier | sklearn.neural_network._multilayer_perceptron | False | True | True | True | False | max_iter | sklearn |
| 5 | RNN | RadiusNearestNeighbors | RadiusNeighborsClassifier | sklearn.neighbors._classification | False | True | True | True | True | None | sklearn |
| 6 | RF | RandomForest | RandomForestClassifier | sklearn.ensemble._forest | False | False | True | True | True | None | sklearn, sklearnex, cuml |
| 7 | Ridge | Ridge | RidgeClassifier | sklearn.linear_model._ridge | False | True | True | True | False | None | sklearn, sklearnex, cuml |
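The keyword filter isn't limited to a single capability: any of the boolean columns above can be combined. For instance, a sketch using the same available_models call:

# E.g. restrict the table to native-multilabel models that also handle
# missing values (only DecisionTree qualifies in the table above)
atom.available_models(native_multilabel=True, handles_missing=True)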
In [5]:
atom.run(models=["LDA", "RF"], metric="recall_weighted")
Training ========================= >>
Models: LDA, RF
Metric: recall_weighted

Results for LinearDiscriminantAnalysis:
Fit ---------------------------------------------
Train evaluation --> recall_weighted: 0.8912
Test evaluation --> recall_weighted: 0.899
Time elapsed: 0.040s
-------------------------------------------------
Time: 0.040s

Results for RandomForest:
Fit ---------------------------------------------
Train evaluation --> recall_weighted: 1.0
Test evaluation --> recall_weighted: 0.9091
Time elapsed: 0.156s
-------------------------------------------------
Time: 0.156s

Final results ==================== >>
Total time: 0.216s
-------------------------------------
LinearDiscriminantAnalysis --> recall_weighted: 0.899
RandomForest --> recall_weighted: 0.9091 !
In [6]:
# Note that non-native multioutput models use a meta-estimator wrapper
print(f"Estimator for LDA is: {atom.lda.estimator}")
print(f"Estimator for RF is: {atom.rf.estimator}")
Estimator for LDA is: ClassifierChain(base_estimator=LinearDiscriminantAnalysis(), random_state=1)
Estimator for RF is: RandomForestClassifier(n_jobs=1, random_state=1)
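Conceptually, the wrapper ATOM applies to LDA is the same ClassifierChain you could build by hand with sklearn. A minimal illustrative sketch (not ATOM's internal code):

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.multioutput import ClassifierChain

# Fits one LDA per label, each one also seeing the previous labels'
# predictions as extra features
wrapped = ClassifierChain(LinearDiscriminantAnalysis(), random_state=1)
wrapped.fit(X, y)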
Add custom multilabel models¶
To use your own meta-estimator with custom parameters, add it as a custom model. It's also possible to tune the hyperparameters of this custom meta-estimator.
In [7]:
from atom import ATOMModel
from sklearn.multioutput import ClassifierChain
from sklearn.linear_model import LogisticRegression
from optuna.distributions import CategoricalDistribution, IntDistribution
custom_model = ATOMModel(
    estimator=ClassifierChain(LogisticRegression(), cv=3),
    name="chain",
    needs_scaling=True,
    native_multilabel=True,
)

atom.run(
    models=custom_model,
    n_trials=5,
    ht_params={
        "distributions": {
            "order": CategoricalDistribution([[0, 1, 2], [2, 1, 0], [1, 2, 0]]),
            "base_estimator__max_iter": IntDistribution(100, 200, step=10),
            "base_estimator__solver": CategoricalDistribution(["lbfgs", "newton-cg"]),
        }
    }
)
Training ========================= >>
Models: chain
Metric: recall_weighted

Running hyperparameter tuning for ClassifierChain...

| trial | order | base_estimator__max_iter | base_estimator__solver | recall_weighted | best_recall_weighted | time_trial | time_ht | state |
| ----- | --------- | --- | --------- | ------ | ------ | ------ | ------ | -------- |
| 0 | [2, 1, 0] | 130 | lbfgs | 0.8205 | 0.8205 | 0.055s | 0.055s | COMPLETE |
| 1 | [1, 2, 0] | 150 | newton-cg | 0.8205 | 0.8205 | 0.069s | 0.124s | COMPLETE |
| 2 | [2, 1, 0] | 170 | newton-cg | 0.8205 | 0.8205 | 0.056s | 0.180s | COMPLETE |
| 3 | [1, 2, 0] | 200 | newton-cg | 0.8205 | 0.8205 | 0.060s | 0.240s | COMPLETE |
| 4 | [2, 1, 0] | 100 | newton-cg | 0.8205 | 0.8205 | 0.047s | 0.287s | COMPLETE |

Hyperparameter tuning ---------------------------
Best trial --> 0
Best parameters:
 --> order: [2, 1, 0]
 --> base_estimator__max_iter: 130
 --> base_estimator__solver: lbfgs
Best evaluation --> recall_weighted: 0.8205
Time elapsed: 0.287s
Fit ---------------------------------------------
Train evaluation --> recall_weighted: 0.8964
Test evaluation --> recall_weighted: 0.9192
Time elapsed: 0.195s
-------------------------------------------------
Time: 0.482s

Final results ==================== >>
Total time: 0.507s
-------------------------------------
ClassifierChain --> recall_weighted: 0.9192
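The tuned meta-estimator is accessible under its custom name, like any other model. A short usage sketch building on the estimator attribute shown earlier:

# The fitted ClassifierChain, refit on the train set with the best
# trial's parameters (order=[2, 1, 0], solver="lbfgs", max_iter=130)
print(atom.chain.estimator)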
Analyze the results¶
In [9]:
atom.rf.evaluate()
Out[9]:
accuracy              0.6333
ap                    0.9120
f1_weighted           0.8608
jaccard_weighted      0.7802
precision_weighted    0.8711
recall_weighted       0.9091
auc                   0.9167
Name: RF, dtype: float64
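To put the models side by side, a hedged sketch assuming the trainer-level evaluate(), which aggregates the per-model scores into one table:

# One row per trained model (LDA, RF, chain), one column per metric
atom.evaluate()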
In [10]:
# Use the target parameter in plots to specify which target column to use
atom.plot_roc(target=2)
In [11]:
# When the target parameter also specifies the class, use format (column, class)
atom.plot_probabilities(models="chain", target=(2, 1))
In [12]:
with atom.canvas(figsize=(900, 600)):
    atom.plot_calibration(target=0)
    atom.plot_calibration(target=1)