Example: Multilabel classification
This example shows how to use ATOM to solve a multilabel classification problem.
The data used is a synthetic dataset created using sklearn's make_multilabel_classification function.
Load the data
In [1]:
# Import packages
import pandas as pd
from atom import ATOMClassifier
from sklearn.datasets import make_multilabel_classification
In [2]:
# Create data
X, y = make_multilabel_classification(n_samples=300, n_classes=3, random_state=1)
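The targets come back as a binary indicator matrix with one column per class, where a 1 means the sample carries that label. A quick sanity check (a sketch reusing the variables created above):

# y is a binary indicator matrix: one column per class
print(y.shape)  # (300, 3)
print(y[:5])    # the first five label vectors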
Run the pipeline
In [3]:
# Note that for multioutput tasks, you must specify the `y` keyword
atom = ATOMClassifier(X, y=y, verbose=2, random_state=1)
<< ================== ATOM ================== >>

Algorithm task: multilabel classification.

Dataset stats ==================== >>
Shape: (300, 23)
Train set size: 240
Test set size: 60
-------------------------------------
Memory: 51.73 kB
Scaled: False
Outlier values: 35 (0.6%)
In [4]:
# Show the models that natively support multilabel tasks
atom.available_models()[["acronym", "model", "native_multioutput"]]
Out[4]:
|   | acronym | model | native_multioutput |
|---|---|---|---|
| 0 | AdaB | AdaBoost | False |
| 1 | Bag | Bagging | False |
| 2 | BNB | BernoulliNB | False |
| 3 | CatB | CatBoost | False |
| 4 | CatNB | CategoricalNB | False |
| 5 | CNB | ComplementNB | False |
| 6 | Tree | DecisionTree | True |
| 7 | Dummy | Dummy | False |
| 8 | ETree | ExtraTree | True |
| 9 | ET | ExtraTrees | True |
| 10 | GNB | GaussianNB | False |
| 11 | GP | GaussianProcess | False |
| 12 | GBM | GradientBoosting | False |
| 13 | hGBM | HistGradientBoosting | False |
| 14 | KNN | KNearestNeighbors | True |
| 15 | LGB | LightGBM | False |
| 16 | LDA | LinearDiscriminantAnalysis | False |
| 17 | lSVM | LinearSVM | False |
| 18 | LR | LogisticRegression | False |
| 19 | MLP | MultiLayerPerceptron | False |
| 20 | MNB | MultinomialNB | False |
| 21 | PA | PassiveAggressive | False |
| 22 | Perc | Perceptron | False |
| 23 | QDA | QuadraticDiscriminantAnalysis | False |
| 24 | RNN | RadiusNearestNeighbors | True |
| 25 | RF | RandomForest | True |
| 26 | Ridge | Ridge | False |
| 27 | SGD | StochasticGradientDescent | False |
| 28 | SVM | SupportVectorMachine | False |
| 29 | XGB | XGBoost | False |
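Since available_models returns a regular pandas DataFrame (as the table above suggests), a boolean filter is enough to keep only the natively multioutput models; a minimal sketch:

# Keep only the models whose estimator handles multilabel targets natively
models = atom.available_models()
print(models[models["native_multioutput"]]["acronym"].tolist())
# ['Tree', 'ETree', 'ET', 'KNN', 'RNN', 'RF']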
In [5]:
atom.run(models=["LDA", "RF"], metric="recall_weighted")
Training ========================= >>
Models: LDA, RF
Metric: recall_weighted

Results for LinearDiscriminantAnalysis:
Fit ---------------------------------------------
Train evaluation --> recall_weighted: 0.8912
Test evaluation --> recall_weighted: 0.899
Time elapsed: 0.078s
-------------------------------------------------
Total time: 0.078s

Results for RandomForest:
Fit ---------------------------------------------
Train evaluation --> recall_weighted: 1.0
Test evaluation --> recall_weighted: 0.9091
Time elapsed: 0.619s
-------------------------------------------------
Total time: 0.619s

Final results ==================== >>
Total time: 0.701s
-------------------------------------
LinearDiscriminantAnalysis --> recall_weighted: 0.899
RandomForest --> recall_weighted: 0.9091 !
In [6]:
# Note that non-native multioutput models use a meta-estimator wrapper
print(f"Estimator for LDA is: {atom.lda.estimator}")
print(f"Estimator for RF is: {atom.rf.estimator}")
Estimator for LDA is: ClassifierChain(base_estimator=LinearDiscriminantAnalysis(), random_state=1)
Estimator for RF is: RandomForestClassifier(n_jobs=1, random_state=1)
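The ClassifierChain that atom wrapped around LDA is plain sklearn; a minimal sketch of what it does under the hood, outside of atom (reusing the X and y created earlier):

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.multioutput import ClassifierChain

# Fit one LDA per target column; each link in the chain also receives
# the previous labels as extra features
chain = ClassifierChain(LinearDiscriminantAnalysis(), random_state=1)
chain.fit(X, y)
print(chain.predict(X[:3]))  # one binary label vector per sample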
Some models, such as MLP, natively support multilabel tasks but not multiclass-multioutput tasks. Their native_multioutput tag is therefore False, even though they don't need a multioutput meta-estimator for this task. In such cases, use atom's multioutput attribute to tell atom not to wrap the model in a multioutput meta-estimator. See here for an overview of sklearn classifiers and the tasks they support.
In [7]:
atom.multioutput = None
In [8]:
atom.run("MLP")
atom.run("MLP")
Training ========================= >>
Models: MLP
Metric: recall_weighted

Results for MultiLayerPerceptron:
Fit ---------------------------------------------
Train evaluation --> recall_weighted: 0.9689
Test evaluation --> recall_weighted: 0.9192
Time elapsed: 6.430s
-------------------------------------------------
Total time: 6.430s

Final results ==================== >>
Total time: 6.438s
-------------------------------------
MultiLayerPerceptron --> recall_weighted: 0.9192
In [9]:
print(f"Estimator for MLP is: {atom.mlp.estimator}")
print(f"Estimator for MLP is: {atom.mlp.estimator}")
Estimator for MLP is: MLPClassifier(random_state=1)
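This works because sklearn's MLPClassifier handles multilabel targets natively outside of atom too; a minimal sketch (reusing X and y from above):

from sklearn.neural_network import MLPClassifier

# MLPClassifier accepts a 2D binary indicator matrix as y directly
mlp = MLPClassifier(max_iter=1000, random_state=1)  # extra iterations so the small demo converges
mlp.fit(X, y)
print(mlp.predict(X[:3]))  # one binary label vector per sample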
Analyze the results
In [10]:
thresholds = atom.rf.get_best_threshold()
print(f"Best threshold per target column: {thresholds}")
Best threshold per target column: [0.72, 0.75, 0.56]
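Applying such per-column thresholds by hand is a one-liner with numpy; a minimal sketch with made-up probabilities shaped like the real ones:

import numpy as np

# Hypothetical (n_samples, 3) positive-class probabilities
proba = np.array([[0.80, 0.40, 0.60],
                  [0.50, 0.90, 0.30]])
thresholds = np.array([0.72, 0.75, 0.56])  # values found above

# A label is predicted 1 when its probability clears its column's threshold
print((proba >= thresholds).astype(int))  # [[1 0 1], [0 1 0]]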
In [11]:
atom.rf.evaluate(threshold=thresholds)
Out[11]:
accuracy              0.5167
average_precision     0.6607
f1_weighted           0.6928
jaccard_weighted      0.5632
precision_weighted    0.9315
recall_weighted       0.5960
roc_auc               0.6873
Name: RF, dtype: float64
In [12]:
# Use the target parameter in plots to specify which target column to use
atom.plot_roc(target=2)
In [13]:
# When the target parameter also specifies the class, use format (column, class)
atom.plot_probabilities(models="MLP", target=(2, 1))
In [14]:
with atom.canvas(figsize=(900, 600)):
atom.plot_calibration(target=0)
atom.plot_calibration(target=1)