FeatureGrouper
Extract statistics from similar features.
Replace groups of features with related characteristics with new
features that summarize statistical properties of the group. The
statistical operators are calculated over every row of the group.
The group names and features can be accessed through the groups
method.
This class can be accessed from atom through the feature_grouping method. Read more in the user guide.
Parameters |
groups: dict
Group names and features. A feature can belong to multiple groups.
operators: str, sequence or None, default=None
Statistical operators to apply on the groups. Any operator from
numpy or scipy.stats (checked in that order) that is applied
on an array can be used. If None, it uses: min, max, mean,
median, mode and std.
drop_columns: bool, default=True
Whether to drop the columns in groups after transformation.
verbose: int, default=0
Verbosity level of the class. Choose from:
|
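The lookup order described for operators (numpy first, then scipy.stats) can be sketched as follows. This is a minimal illustration, not ATOM's implementation: `resolve_operator` and its `modules` parameter are hypothetical names, and the listed modules are only imported when the function is called.

```python
import importlib
from typing import Callable, Sequence


def resolve_operator(
    name: str,
    modules: Sequence[str] = ("numpy", "scipy.stats"),
) -> Callable:
    """Return the first callable named `name` found in `modules`, searched in order.

    Hypothetical helper: it only illustrates the "checked in that order" rule.
    """
    for module_name in modules:
        module = importlib.import_module(module_name)
        candidate = getattr(module, name, None)
        if callable(candidate):
            return candidate
    raise ValueError(f"No operator named {name!r} found in: {', '.join(modules)}")
```

With numpy and scipy installed, `resolve_operator("mean")` would resolve to `numpy.mean`, while `resolve_operator("mode")` would fall through to `scipy.stats.mode`, since numpy has no top-level `mode` function.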
Attributes |
feature_names_in_: np.ndarray
Names of features seen during fit.
n_features_in_: int
Number of features seen during fit.
|
Example
>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)
>>> atom = ATOMClassifier(X, y)
>>> atom.feature_grouping({"group1": "mean.*"}, verbose=2)
Fitting FeatureGrouper...
Grouping features...
--> Group group1 successfully created.
>>> print(atom.dataset)
radius error texture error perimeter error area error smoothness error compactness error concavity error concave points error symmetry error ... worst symmetry worst fractal dimension min(group1) max(group1) mean(group1) median(group1) mode(group1) std(group1) target
0 0.8191 1.9310 4.493 103.90 0.008074 0.040880 0.053210 0.018340 0.02383 ... 0.3007 0.08314 0.061320 1110.0 127.523316 0.176850 0.061320 329.486981 0
1 0.1859 1.9260 1.011 14.47 0.007831 0.008776 0.015560 0.006240 0.03139 ... 0.3200 0.06576 0.015530 428.9 53.379498 0.136250 0.015530 127.109799 1
2 0.2810 0.8135 3.369 23.81 0.004929 0.066570 0.076830 0.013680 0.01526 ... 0.2845 0.12490 0.028330 542.9 66.369889 0.141200 0.028330 160.878141 1
3 0.1639 1.1400 1.223 14.66 0.005919 0.032700 0.049570 0.010380 0.01208 ... 0.2048 0.07628 0.028000 553.5 66.981375 0.111560 0.028000 164.121249 1
4 0.3428 0.3981 2.537 29.06 0.004732 0.015060 0.018550 0.010670 0.02163 ... 0.3109 0.08187 0.035280 668.7 79.352913 0.120425 0.035280 198.400183 1
.. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
564 0.2868 1.1430 2.289 20.56 0.010170 0.014430 0.018610 0.012500 0.03464 ... 0.2227 0.06777 0.018750 334.2 43.509823 0.135500 0.018750 98.929860 1
565 0.8529 1.8490 5.632 93.54 0.010750 0.027220 0.050810 0.019110 0.02293 ... 0.2341 0.07421 0.056990 1094.0 125.561400 0.159350 0.056990 324.783656 0
566 0.2441 2.0900 1.648 16.80 0.012910 0.022220 0.004174 0.007082 0.02572 ... 0.2262 0.06742 0.005025 311.7 40.661394 0.139700 0.005025 92.358226 1
567 0.1998 0.6068 1.443 16.07 0.004413 0.014430 0.015090 0.007369 0.01354 ... 0.3518 0.08665 0.031520 551.1 67.128030 0.143050 0.031520 163.348717 1
568 0.2094 0.7636 1.231 17.67 0.008725 0.020030 0.023350 0.011320 0.02625 ... 0.3380 0.09584 0.033700 513.7 62.632288 0.136750 0.033700 152.314252 1
[569 rows x 27 columns]
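In the example above, the group is given as the pattern "mean.*" rather than as a list of column names. A minimal sketch of how such a pattern could pick out columns, assuming full-match semantics against each column name (the exact matching rule ATOM applies is not stated here, and `select_columns` is a hypothetical helper):

```python
import re


def select_columns(pattern, columns):
    """Return the column names that fully match the regex `pattern`."""
    regex = re.compile(pattern)
    return [col for col in columns if regex.fullmatch(col)]


# A few column names from the breast cancer dataset
columns = ["mean radius", "mean texture", "radius error", "worst radius"]
selected = select_columns("mean.*", columns)
print(selected)
```

Only the names starting with "mean" are selected, which is consistent with the output above, where the "error" and "worst" columns remain in the dataset.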
>>> from atom.feature_engineering import FeatureGrouper
>>> from sklearn.datasets import load_breast_cancer
>>> X, _ = load_breast_cancer(return_X_y=True, as_frame=True)
>>> fg = FeatureGrouper({"group1": ["mean texture", "mean radius"]}, verbose=2)
>>> X = fg.transform(X)
Grouping features...
--> Group group1 successfully created.
>>> print(X)
mean perimeter mean area mean smoothness mean compactness mean concavity mean concave points mean symmetry mean fractal dimension radius error ... worst concave points worst symmetry worst fractal dimension min(group1) max(group1) mean(group1) median(group1) mode(group1) std(group1)
0 122.80 1001.0 0.11840 0.27760 0.30010 0.14710 0.2419 0.07871 1.0950 ... 0.2654 0.4601 0.11890 10.38 17.99 14.185 14.185 10.38 3.805
1 132.90 1326.0 0.08474 0.07864 0.08690 0.07017 0.1812 0.05667 0.5435 ... 0.1860 0.2750 0.08902 17.77 20.57 19.170 19.170 17.77 1.400
2 130.00 1203.0 0.10960 0.15990 0.19740 0.12790 0.2069 0.05999 0.7456 ... 0.2430 0.3613 0.08758 19.69 21.25 20.470 20.470 19.69 0.780
3 77.58 386.1 0.14250 0.28390 0.24140 0.10520 0.2597 0.09744 0.4956 ... 0.2575 0.6638 0.17300 11.42 20.38 15.900 15.900 11.42 4.480
4 135.10 1297.0 0.10030 0.13280 0.19800 0.10430 0.1809 0.05883 0.7572 ... 0.1625 0.2364 0.07678 14.34 20.29 17.315 17.315 14.34 2.975
.. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
564 142.00 1479.0 0.11100 0.11590 0.24390 0.13890 0.1726 0.05623 1.1760 ... 0.2216 0.2060 0.07115 21.56 22.39 21.975 21.975 21.56 0.415
565 131.20 1261.0 0.09780 0.10340 0.14400 0.09791 0.1752 0.05533 0.7655 ... 0.1628 0.2572 0.06637 20.13 28.25 24.190 24.190 20.13 4.060
566 108.30 858.1 0.08455 0.10230 0.09251 0.05302 0.1590 0.05648 0.4564 ... 0.1418 0.2218 0.07820 16.60 28.08 22.340 22.340 16.60 5.740
567 140.10 1265.0 0.11780 0.27700 0.35140 0.15200 0.2397 0.07016 0.7260 ... 0.2650 0.4087 0.12400 20.60 29.33 24.965 24.965 20.60 4.365
568 47.92 181.0 0.05263 0.04362 0.00000 0.00000 0.1587 0.05884 0.3857 ... 0.0000 0.2871 0.07039 7.76 24.54 16.150 16.150 7.76 8.390
[569 rows x 34 columns]
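The six new columns in the output above are computed per row over the grouped features. The sketch below reproduces the values of row 0 with the standard library only. It is an illustration under two assumptions: the standard deviation is the population one (numpy's default), and ties in the mode resolve to the smallest value. `summarize_group` is a hypothetical helper, not part of ATOM.

```python
import statistics


def summarize_group(values):
    """Return the six default statistics for one row of a group."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        # Assumption: ties resolve to the smallest value
        "mode": min(statistics.multimode(values)),
        # Assumption: population standard deviation (ddof=0)
        "std": statistics.pstdev(values),
    }


# Row 0 of group1, read off min(group1) and max(group1) in the output above
row = [17.99, 10.38]
stats = summarize_group(row)
print({name: round(value, 3) for name, value in stats.items()})
```

With only two features in the group, the mean and median coincide and the mode equals the minimum, which matches the pattern visible in every row of the output.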
Methods
fit | Do nothing. |
fit_transform | Fit to data, then transform it. |
get_params | Get parameters for this estimator. |
inverse_transform | Do nothing. |
set_output | Set output container. |
set_params | Set the parameters of this estimator. |
transform | Group features. |
fit
Do nothing.
Implemented for continuity of the API.
fit_transform
Fit to data, then transform it.
get_params
Get parameters for this estimator.
Parameters |
deep : bool, default=True
If True, will return the parameters for this estimator and
contained subobjects that are estimators.
|
Returns |
params : dict
Parameter names mapped to their values.
|
inverse_transform
Do nothing.
Returns the input unchanged. Implemented for continuity of the API.
set_output
Set output container.
See sklearn's user guide on how to use the set_output API. See here a description of the choices.
set_params
Set the parameters of this estimator.
Parameters |
**params : dict
Estimator parameters.
|
Returns |
self : estimator instance
Estimator instance.
|
transform
Group features.