Skip to content

FeatureGrouper


class atom.feature_engineering.FeatureGrouper(groups, operators=None, drop_columns=True, verbose=0)[source]

Extract statistics from similar features.

Replace groups of features with related characteristics with new features that summarize statistical properties of the group. The statistical operators are calculated over every row of the group. The group names and features can be accessed through the groups method.

This class can be accessed from atom through the feature_grouping method. Read more in the user guide.

Parameters groups: dict
Group names and features. A feature can belong to multiple groups.

operators: str, sequence or None, default=None
Statistical operators to apply on the groups. Any operator from numpy or scipy.stats (checked in that order) that is applied on an array can be used. If None, it uses: min, max, mean, median, mode and std.

drop_columns: bool, default=True
Whether to drop the columns in groups after transformation.

verbose: int, default=0
Verbosity level of the class. Choose from:

  • 0 to not print anything.
  • 1 to print basic information.
  • 2 to print detailed information.

Attributes feature_names_in_: np.ndarray
Names of features seen during fit.

n_features_in_: int
Number of features seen during fit.


See Also

FeatureExtractor

Extract features from datetime columns.

FeatureGenerator

Generate new features.

FeatureSelector

Reduce the number of features in the data.


Example

>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer

>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)

>>> atom = ATOMClassifier(X, y)
>>> atom.feature_grouping({"group1": "mean.*"}, verbose=2)

Fitting FeatureGrouper...
Grouping features...
 --> Group group1 successfully created.

>>> print(atom.dataset)

     radius error  texture error  perimeter error  area error  smoothness error  compactness error  concavity error  concave points error  symmetry error  ...  worst symmetry  worst fractal dimension  min(group1)  max(group1)  mean(group1)  median(group1)  mode(group1)  std(group1)  target
0          0.8191         1.9310            4.493      103.90          0.008074           0.040880         0.053210              0.018340         0.02383  ...          0.3007                  0.08314     0.061320       1110.0    127.523316        0.176850      0.061320   329.486981       0
1          0.1859         1.9260            1.011       14.47          0.007831           0.008776         0.015560              0.006240         0.03139  ...          0.3200                  0.06576     0.015530        428.9     53.379498        0.136250      0.015530   127.109799       1
2          0.2810         0.8135            3.369       23.81          0.004929           0.066570         0.076830              0.013680         0.01526  ...          0.2845                  0.12490     0.028330        542.9     66.369889        0.141200      0.028330   160.878141       1
3          0.1639         1.1400            1.223       14.66          0.005919           0.032700         0.049570              0.010380         0.01208  ...          0.2048                  0.07628     0.028000        553.5     66.981375        0.111560      0.028000   164.121249       1
4          0.3428         0.3981            2.537       29.06          0.004732           0.015060         0.018550              0.010670         0.02163  ...          0.3109                  0.08187     0.035280        668.7     79.352913        0.120425      0.035280   198.400183       1
..            ...            ...              ...         ...               ...                ...              ...                   ...             ...  ...             ...                      ...          ...          ...           ...             ...           ...          ...     ...
564        0.2868         1.1430            2.289       20.56          0.010170           0.014430         0.018610              0.012500         0.03464  ...          0.2227                  0.06777     0.018750        334.2     43.509823        0.135500      0.018750    98.929860       1
565        0.8529         1.8490            5.632       93.54          0.010750           0.027220         0.050810              0.019110         0.02293  ...          0.2341                  0.07421     0.056990       1094.0    125.561400        0.159350      0.056990   324.783656       0
566        0.2441         2.0900            1.648       16.80          0.012910           0.022220         0.004174              0.007082         0.02572  ...          0.2262                  0.06742     0.005025        311.7     40.661394        0.139700      0.005025    92.358226       1
567        0.1998         0.6068            1.443       16.07          0.004413           0.014430         0.015090              0.007369         0.01354  ...          0.3518                  0.08665     0.031520        551.1     67.128030        0.143050      0.031520   163.348717       1
568        0.2094         0.7636            1.231       17.67          0.008725           0.020030         0.023350              0.011320         0.02625  ...          0.3380                  0.09584     0.033700        513.7     62.632288        0.136750      0.033700   152.314252       1

[569 rows x 27 columns]
>>> from atom.feature_engineering import FeatureGrouper
>>> from sklearn.datasets import load_breast_cancer

>>> X, _ = load_breast_cancer(return_X_y=True, as_frame=True)

>>> fg = FeatureGrouper({"group1": ["mean texture", "mean radius"]}, verbose=2)
>>> X = fg.transform(X)

Grouping features...
 --> Group group1 successfully created.

>>> print(X)

     mean perimeter  mean area  mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  mean fractal dimension  radius error  ...  worst concave points  worst symmetry  worst fractal dimension  min(group1)  max(group1)  mean(group1)  median(group1)  mode(group1)  std(group1)
0            122.80     1001.0          0.11840           0.27760         0.30010              0.14710         0.2419                 0.07871        1.0950  ...                0.2654          0.4601                  0.11890        10.38        17.99        14.185          14.185         10.38        3.805
1            132.90     1326.0          0.08474           0.07864         0.08690              0.07017         0.1812                 0.05667        0.5435  ...                0.1860          0.2750                  0.08902        17.77        20.57        19.170          19.170         17.77        1.400
2            130.00     1203.0          0.10960           0.15990         0.19740              0.12790         0.2069                 0.05999        0.7456  ...                0.2430          0.3613                  0.08758        19.69        21.25        20.470          20.470         19.69        0.780
3             77.58      386.1          0.14250           0.28390         0.24140              0.10520         0.2597                 0.09744        0.4956  ...                0.2575          0.6638                  0.17300        11.42        20.38        15.900          15.900         11.42        4.480
4            135.10     1297.0          0.10030           0.13280         0.19800              0.10430         0.1809                 0.05883        0.7572  ...                0.1625          0.2364                  0.07678        14.34        20.29        17.315          17.315         14.34        2.975
..              ...        ...              ...               ...             ...                  ...            ...                     ...           ...  ...                   ...             ...                      ...          ...          ...           ...             ...           ...          ...
564          142.00     1479.0          0.11100           0.11590         0.24390              0.13890         0.1726                 0.05623        1.1760  ...                0.2216          0.2060                  0.07115        21.56        22.39        21.975          21.975         21.56        0.415
565          131.20     1261.0          0.09780           0.10340         0.14400              0.09791         0.1752                 0.05533        0.7655  ...                0.1628          0.2572                  0.06637        20.13        28.25        24.190          24.190         20.13        4.060
566          108.30      858.1          0.08455           0.10230         0.09251              0.05302         0.1590                 0.05648        0.4564  ...                0.1418          0.2218                  0.07820        16.60        28.08        22.340          22.340         16.60        5.740
567          140.10     1265.0          0.11780           0.27700         0.35140              0.15200         0.2397                 0.07016        0.7260  ...                0.2650          0.4087                  0.12400        20.60        29.33        24.965          24.965         20.60        4.365
568           47.92      181.0          0.05263           0.04362         0.00000              0.00000         0.1587                 0.05884        0.3857  ...                0.0000          0.2871                  0.07039         7.76        24.54        16.150          16.150          7.76        8.390

[569 rows x 34 columns]


Methods

fitDo nothing.
fit_transformFit to data, then transform it.
get_paramsGet parameters for this estimator.
inverse_transformDo nothing.
set_outputSet output container.
set_paramsSet the parameters of this estimator.
transformGroup features.


method fit(X=None, y=None, **fit_params)[source]

Do nothing.

Implemented for continuity of the API.

Parameters X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). If None, X is ignored.

y: sequence, dataframe-like or None, default=None
Target column(s) corresponding to X. If None, y is ignored.

**fit_params
Additional keyword arguments for the fit method.

Returns self
Estimator instance.



method fit_transform(X=None, y=None, **fit_params)[source]

Fit to data, then transform it.

Parameters X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). If None, X is ignored.

y: sequence, dataframe-like or None, default=None
Target column(s) corresponding to X. If None, y is ignored.

**fit_params
Additional keyword arguments for the fit method.

Returns dataframe
Transformed feature set. Only returned if provided.

series or dataframe
Transformed target column. Only returned if provided.



method get_params(deep=True)[source]

Get parameters for this estimator.

Parameters deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns params : dict
Parameter names mapped to their values.



method inverse_transform(X=None, y=None, **fit_params)[source]

Do nothing.

Returns the input unchanged. Implemented for continuity of the API.

Parameters X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). If None, X is ignored.

y: sequence, dataframe-like or None, default=None
Target column(s) corresponding to X. If None, y is ignored.

Returns dataframe
Feature set. Only returned if provided.

series or dataframe
Target column(s). Only returned if provided.



method set_output(transform=None)[source]

Set output container.

See sklearn's user guide on how to use the set_output API. See here a description of the choices.

Parameters transform: str or None, default=None
Configure the output of the transform, fit_transform, and inverse_transform method. If None, the configuration is not changed. Choose from:

  • "numpy"
  • "pandas" (default)
  • "pandas-pyarrow"
  • "polars"
  • "polars-lazy"
  • "pyarrow"
  • "modin"
  • "dask"
  • "pyspark"
  • "pyspark-pandas"

Returns Self
Estimator instance.



method set_params(**params)[source]

Set the parameters of this estimator.

Parameters **params : dict
Estimator parameters.

Returns self : estimator instance
Estimator instance.



method transform(X, y=None)[source]

Group features.

Parameters X: dataframe-like
Feature set with shape=(n_samples, n_features).

y: sequence, dataframe-like or None, default=None
Do nothing. Implemented for continuity of the API.

Returns dataframe
Transformed feature set.