Example: Metadata
This example shows how to add metadata, such as groups and sample_weight, to atom.
Import the wine dataset from sklearn.datasets. This is a small, easy-to-train dataset whose goal is to classify wines into three groups (which cultivator each wine comes from) using features based on the results of a chemical analysis.
Load the data
In [1]:
# Import packages
import numpy as np
from sklearn.datasets import load_wine
from atom import ATOMClassifier
In [2]:
# Load data
X, y = load_wine(return_X_y=True, as_frame=True)
# Let's have a look
X.head()
Out[2]:
|   | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 |
| 1 | 13.20 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050.0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 |
| 3 | 14.37 | 1.95 | 2.50 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480.0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 |
In [3]:
# Create (dummy) groups and sample_weights for the rows
groups = np.random.randint(5, size=X.shape[0])
sample_weight = np.random.randint(5, size=X.shape[0])
print(groups)
[3 2 4 4 4 2 3 1 3 4 4 3 4 4 2 0 4 0 4 4 0 0 4 1 4 0 1 4 4 3 1 1 0 0 2 0 3 0 2 4 1 0 2 4 4 1 1 1 0 3 1 1 0 2 3 4 4 0 1 1 3 3 0 2 0 4 4 4 2 0 0 4 1 0 2 4 3 0 4 3 1 2 2 0 2 4 2 0 3 0 0 3 4 2 3 1 3 0 2 0 1 2 4 3 3 3 0 2 4 0 4 3 4 1 4 3 3 0 0 4 2 2 3 2 3 1 0 2 4 0 4 4 0 2 0 2 3 4 4 4 2 4 2 2 4 3 2 2 4 3 4 2 1 1 0 0 1 4 3 2 3 0 0 4 2 0 1 3 1 4 4 1 1 3 0 1 1 2]
Run the pipeline
Add the metadata to the constructor. We keep index=True so the original row indices are preserved, which lets us verify that the group split works. When groups are specified, test_size specifies the number of groups in the test set (here, one group).
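The effect of test_size when groups are given can be sketched in plain Python. This is an illustrative sketch, not ATOM's implementation: the helper group_holdout_split and its parameters are hypothetical names introduced here, and picking the test groups at random is an assumption.

```python
import random


def group_holdout_split(groups, n_test_groups, seed=0):
    """Split row positions so that whole groups end up in the test set.

    Hypothetical helper for illustration only; not part of ATOM.
    """
    rng = random.Random(seed)
    unique_groups = sorted(set(groups))
    # Pick complete groups for the test set (assumption: chosen at random)
    test_groups = set(rng.sample(unique_groups, n_test_groups))
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx, test_groups


# Dummy group labels, one per row (5 possible groups, 178 rows)
label_rng = random.Random(42)
groups = [label_rng.randrange(5) for _ in range(178)]

train_idx, test_idx, test_groups = group_holdout_split(groups, n_test_groups=1)

print(f"test groups: {sorted(test_groups)}")
print(f"train rows: {len(train_idx)}, test rows: {len(test_idx)}")
```

Because rows are assigned per group, the size of the test set depends on how many rows the selected group contains, not on a fixed fraction of the data.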
In [4]:
atom = ATOMClassifier(
    X,
    y=y,
    index=True,
    metadata={"groups": groups, "sample_weight": sample_weight},
    test_size=1,
    verbose=2,
    random_state=1,
)
<< ================== ATOM ================== >>
Configuration ==================== >>
Algorithm task: Multiclass classification.
Dataset stats ==================== >>
Shape: (178, 14)
Train set size: 145
Test set size: 33
-------------------------------------
Memory: 24.82 kB
Scaled: False
Outlier values: 9 (0.4%)
In [5]:
# Show all rows in the test set belong to the same group
atom.metadata["groups"].loc[atom.test.index]
Out[5]:
34     2
120    2
53     2
68     2
151    2
81     2
164    2
5      2
135    2
140    2
143    2
74     2
146    2
133    2
38     2
142    2
63     2
127    2
123    2
86     2
84     2
159    2
1      2
101    2
177    2
42     2
121    2
107    2
147    2
93     2
98     2
82     2
14     2
Name: groups, dtype: int64
In [14]:
# Visualize the groups
atom.plot_data_splits()
In [6]:
atom.scale()
Fitting Scaler...
Scaling features...
In [7]:
# Note the sample weights are passed to the scaler
atom.pipeline[0].get_metadata_routing()
Out[7]:
{'fit': {'sample_weight': True}}
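To see what sample weights change when fitting a scaler, here is a minimal sketch of weighted standardization in plain Python. It assumes a standard (mean and variance) scaling strategy; the helper weighted_mean_std is a hypothetical name, and whether ATOM's scaler computes exactly these statistics is an assumption.

```python
import math


def weighted_mean_std(values, weights):
    """Return the weighted mean and (population) standard deviation.

    Hypothetical helper for illustration only; not part of ATOM.
    """
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(var)


values = [1.0, 2.0, 3.0, 10.0]
weights = [1, 1, 1, 0]  # a zero weight removes the row from the statistics

mean, std = weighted_mean_std(values, weights)

# All rows are still transformed, but with the weighted statistics
scaled = [(v - mean) / std for v in values]

print(f"weighted mean: {mean:.3f}, weighted std: {std:.3f}")
print([round(s, 3) for s in scaled])
```

Rows with a weight of zero, which the dummy sample_weight above can contain, do not influence the fitted mean and standard deviation in this sketch.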
In [9]:
atom.run("LR")
Training ========================= >>
Models: LR
Metric: f1_weighted
Results for LogisticRegression:
Fit ---------------------------------------------
Train evaluation --> f1_weighted: 1.0
Test evaluation --> f1_weighted: 1.0
Time elapsed: 0.036s
-------------------------------------------------
Time: 0.036s
Final results ==================== >>
Total time: 0.040s
-------------------------------------
LogisticRegression --> f1_weighted: 1.0
In [10]:
# The same applies to models...
atom.lr.estimator.get_metadata_routing()
Out[10]:
{'fit': {'sample_weight': True}, 'score': {'sample_weight': None}}
In [11]:
# ... and metrics
atom._metric[0].get_metadata_routing()
Out[11]:
{'score': {'sample_weight': True}}
In [16]:
atom.lr.cross_validate()
Applying cross-validation...
Out[16]:
|   | train_f1_weighted | test_f1_weighted | time |
|---|---|---|---|
| 0 | 1.000000 | 0.985135 | 0.024023 |
| 1 | 1.000000 | 1.000000 | 0.022020 |
| 2 | 1.000000 | 0.962963 | 0.017015 |
| 3 | 1.000000 | 1.000000 | 0.015013 |
| 4 | 1.000000 | 1.000000 | 0.020018 |
| mean | 1.000000 | 0.989620 | 0.019618 |
In [17]:
atom.plot_cv_splits()