FeatureGrouper
class atom.feature_engineering.FeatureGrouper(groups, operators=None, drop_columns=True, verbose=0)[source]
Extract statistics from similar features.
Replace groups of features with related characteristics with new
features that summarize statistical properties of the group. The
statistical operators are calculated over every row of the group.
The group names and features can be accessed through the groups
method.
This class can be accessed from atom through the feature_grouping method. Read more in the user guide.
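As a rough illustration of what "calculated over every row of the group" means, the sketch below computes the six default statistics row-wise with plain NumPy. It is not ATOM's implementation: the helper name `row_stats` is hypothetical, and the tie-breaking rule for the mode (smallest most frequent value) is an assumption made for this example.

```python
import numpy as np


def row_stats(values):
    """Return the default statistics for each row of a 2D array.

    Hypothetical helper for illustration; not part of ATOM.
    """
    values = np.asarray(values, dtype=float)

    def row_mode(row):
        # Most frequent value; ties resolve to the smallest value (assumption)
        uniques, counts = np.unique(row, return_counts=True)
        return uniques[np.argmax(counts)]

    return {
        "min": values.min(axis=1),
        "max": values.max(axis=1),
        "mean": values.mean(axis=1),
        "median": np.median(values, axis=1),
        "mode": np.apply_along_axis(row_mode, 1, values),
        "std": values.std(axis=1),  # population std (ddof=0), NumPy's default
    }


# Two rows of a group made of two features
group = np.array([[10.38, 17.99], [17.77, 20.57]])
stats = row_stats(group)
```

Each entry of `stats` holds one value per row, which is the shape a new column such as `mean(group1)` needs.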
Parameters | groups: dict
Group names and features. A feature
can belong to multiple groups.
operators: str, sequence or None, default=None
Statistical operators to apply on the groups. Any operator from
numpy or scipy.stats (checked in that order) that is applied
on an array can be used. If None, it uses: min, max, mean,
median, mode and std.
drop_columns: bool, default=True
Whether to drop the columns in groups after transformation.
verbose: int, default=0
Verbosity level of the class. Choose from:
|
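The lookup order described for `operators` (numpy first, then scipy.stats) can be sketched as follows. This is a minimal illustration, assuming the operator is resolved by attribute name; `resolve_operator` is a hypothetical helper and not ATOM's code.

```python
import numpy as np
import scipy.stats


def resolve_operator(name):
    """Look up an operator by name in numpy first, then in scipy.stats.

    Hypothetical helper illustrating the lookup order; not part of ATOM.
    """
    for module in (np, scipy.stats):
        func = getattr(module, name, None)
        if callable(func):
            return func
    raise ValueError(f"Unknown operator: {name!r}")


mean_op = resolve_operator("mean")      # found in numpy
mode_op = resolve_operator("mode")      # not in numpy, found in scipy.stats
kurt_op = resolve_operator("kurtosis")  # not in numpy, found in scipy.stats
```

Under this assumption, a name that exists in both libraries resolves to the numpy version, and a name found in neither raises an error.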
Attributes | feature_names_in_: np.ndarray
Names of features seen during fit.
n_features_in_: int
Number of features seen during fit.
|
See Also
FeatureExtractor: Extract features from datetime columns.
FeatureGenerator: Generate new features.
FeatureSelector: Reduce the number of features in the data.
Example
>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)
>>> atom = ATOMClassifier(X, y)
>>> atom.feature_grouping({"group1": "mean.*"}, verbose=2)
Fitting FeatureGrouper...
Grouping features...
--> Group group1 successfully created.
>>> print(atom.dataset)
radius error texture error perimeter error area error smoothness error compactness error concavity error concave points error symmetry error ... worst symmetry worst fractal dimension min(group1) max(group1) mean(group1) median(group1) mode(group1) std(group1) target
0 0.2241 1.5080 1.553 9.833 0.010190 0.01084 0.000000 0.000000 0.02659 ... 0.2932 0.09382 0.00000 143.5 20.816486 0.155000 0.00000 42.901883 1
1 0.4226 1.1500 2.735 40.090 0.003659 0.02855 0.025720 0.012720 0.01817 ... 0.3698 0.10940 0.06758 656.9 78.948101 0.169600 0.06758 194.679737 0
2 0.2222 0.8652 1.444 17.120 0.005517 0.01727 0.020450 0.006747 0.01616 ... 0.2535 0.07993 0.01393 428.0 53.470220 0.121005 0.01393 126.804600 1
3 0.4709 0.9951 2.903 53.160 0.005654 0.02199 0.030590 0.014990 0.01623 ... 0.3590 0.07787 0.05892 1145.0 131.053336 0.175950 0.05892 340.023830 0
4 0.2387 0.6372 1.729 21.830 0.003958 0.01246 0.018310 0.008747 0.01500 ... 0.2778 0.07012 0.04528 800.0 93.581249 0.134225 0.04528 237.441648 1
.. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
564 0.2545 0.9832 2.110 21.050 0.004452 0.03055 0.026810 0.013520 0.01454 ... 0.4264 0.12750 0.06924 644.8 77.913697 0.206000 0.06924 191.109014 0
565 0.3460 1.3360 2.066 31.240 0.005868 0.02099 0.020210 0.009064 0.02087 ... 0.2642 0.06953 0.02443 573.2 69.767851 0.129430 0.02443 169.767485 1
566 0.1153 0.6745 0.757 9.006 0.003265 0.00493 0.006493 0.003762 0.01720 ... 0.2901 0.06783 0.01053 476.5 58.291694 0.116930 0.01053 141.291563 1
567 0.3473 0.9209 2.244 32.190 0.004766 0.02374 0.023840 0.008637 0.01772 ... 0.3993 0.10640 0.06303 761.3 90.055875 0.156950 0.06303 225.756961 0
568 0.3438 1.1400 2.225 25.060 0.005463 0.01964 0.020790 0.005398 0.01477 ... 0.2779 0.08121 0.01638 431.9 53.666299 0.137785 0.01638 127.999281 1
[569 rows x 27 columns]
>>> from atom.feature_engineering import FeatureGrouper
>>> from sklearn.datasets import load_breast_cancer
>>> X, _ = load_breast_cancer(return_X_y=True, as_frame=True)
>>> fg = FeatureGrouper({"group1": ["mean texture", "mean radius"]}, verbose=2)
>>> X = fg.transform(X)
Grouping features...
--> Group group1 successfully created.
>>> print(X)
mean perimeter mean area mean smoothness mean compactness mean concavity mean concave points mean symmetry mean fractal dimension radius error ... worst concave points worst symmetry worst fractal dimension min(group1) max(group1) mean(group1) median(group1) mode(group1) std(group1)
0 122.80 1001.0 0.11840 0.27760 0.30010 0.14710 0.2419 0.07871 1.0950 ... 0.2654 0.4601 0.11890 10.38 17.99 14.185 14.185 10.38 3.805
1 132.90 1326.0 0.08474 0.07864 0.08690 0.07017 0.1812 0.05667 0.5435 ... 0.1860 0.2750 0.08902 17.77 20.57 19.170 19.170 17.77 1.400
2 130.00 1203.0 0.10960 0.15990 0.19740 0.12790 0.2069 0.05999 0.7456 ... 0.2430 0.3613 0.08758 19.69 21.25 20.470 20.470 19.69 0.780
3 77.58 386.1 0.14250 0.28390 0.24140 0.10520 0.2597 0.09744 0.4956 ... 0.2575 0.6638 0.17300 11.42 20.38 15.900 15.900 11.42 4.480
4 135.10 1297.0 0.10030 0.13280 0.19800 0.10430 0.1809 0.05883 0.7572 ... 0.1625 0.2364 0.07678 14.34 20.29 17.315 17.315 14.34 2.975
.. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
564 142.00 1479.0 0.11100 0.11590 0.24390 0.13890 0.1726 0.05623 1.1760 ... 0.2216 0.2060 0.07115 21.56 22.39 21.975 21.975 21.56 0.415
565 131.20 1261.0 0.09780 0.10340 0.14400 0.09791 0.1752 0.05533 0.7655 ... 0.1628 0.2572 0.06637 20.13 28.25 24.190 24.190 20.13 4.060
566 108.30 858.1 0.08455 0.10230 0.09251 0.05302 0.1590 0.05648 0.4564 ... 0.1418 0.2218 0.07820 16.60 28.08 22.340 22.340 16.60 5.740
567 140.10 1265.0 0.11780 0.27700 0.35140 0.15200 0.2397 0.07016 0.7260 ... 0.2650 0.4087 0.12400 20.60 29.33 24.965 24.965 20.60 4.365
568 47.92 181.0 0.05263 0.04362 0.00000 0.00000 0.1587 0.05884 0.3857 ... 0.0000 0.2871 0.07039 7.76 24.54 16.150 16.150 7.76 8.390
[569 rows x 34 columns]
Methods
fit | Do nothing. |
fit_transform | Fit to data, then transform it. |
get_params | Get parameters for this estimator. |
inverse_transform | Do nothing. |
set_output | Set output container. |
set_params | Set the parameters of this estimator. |
transform | Group features. |
method fit(X=None, y=None, **fit_params)[source]
Do nothing.
Implemented for continuity of the API.
method fit_transform(X=None, y=None, **fit_params)[source]
Fit to data, then transform it.
method get_params(deep=True)[source]
Get parameters for this estimator.
Parameters | deep : bool, default=True
If True, will return the parameters for this estimator and
contained subobjects that are estimators.
|
Returns | params : dict
Parameter names mapped to their values.
|
method inverse_transform(X=None, y=None, **fit_params)[source]
Do nothing.
Returns the input unchanged. Implemented for continuity of the API.
method set_output(transform=None)[source]
Set output container.
See sklearn's user guide on how to use the set_output API. See here for a description of the choices.
method set_params(**params)[source]
Set the parameters of this estimator.
Parameters | **params : dict
Estimator parameters.
|
Returns | self : estimator instance
Estimator instance.
|
method transform(X, y=None)[source]
Group features.
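The effect of the transformation, as seen in the examples above, is that new columns named `operator(group)` are appended and, with `drop_columns=True`, the grouped columns are removed. A minimal pandas sketch of that behaviour follows. The function `group_features` is a hypothetical stand-in, not ATOM's implementation; it assumes a string selection is a regex matched against the full column name and only handles operators available in numpy.

```python
import re

import numpy as np
import pandas as pd


def group_features(X, groups, operators=("min", "max"), drop_columns=True):
    """Append row-wise statistics per group, optionally dropping the sources.

    Hypothetical stand-in for the transformation; not part of ATOM.
    """
    original = list(X.columns)
    out = X.copy()
    used = []
    for name, selection in groups.items():
        if isinstance(selection, str):
            # Assumption: a string is a regex matched on the full column name
            columns = [c for c in original if re.fullmatch(selection, c)]
        else:
            columns = list(selection)
        for op in operators:
            # Apply the numpy operator over every row of the group
            out[f"{op}({name})"] = getattr(np, op)(X[columns].to_numpy(), axis=1)
        used.extend(columns)
    if drop_columns:
        # A feature may belong to several groups, so drop each column once
        out = out.drop(columns=list(dict.fromkeys(used)))
    return out


# Toy data with two "mean" features and one unrelated feature
X = pd.DataFrame({
    "mean radius": [17.99, 20.57],
    "mean texture": [10.38, 17.77],
    "worst area": [2019.0, 1956.0],
})
out = group_features(X, {"group1": "mean.*"})
kept = group_features(X, {"group1": "mean.*"}, drop_columns=False)
```

Here `out` keeps only the ungrouped column plus the new statistics, while `kept` retains the original columns alongside them.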