
FeatureGrouper


class atom.feature_engineering.FeatureGrouper(groups, operators=None, drop_columns=True, verbose=0)
Extract statistics from similar features.

Replace groups of features that share related characteristics with new features that summarize the statistical properties of each group. The statistical operators are calculated row-wise: for every sample, each operator is applied to the values of the features in the group. The group names and features can be accessed through the groups method.
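To make the row-wise computation concrete, here is a minimal, dependency-free sketch of what grouping does to each row. It is not ATOM's implementation: the helper `group_rows` and the operator table are hypothetical, rows are plain dicts instead of a dataframe, and stdlib `statistics` functions stand in for the numpy/scipy.stats operators (population standard deviation to match `numpy.std`'s default `ddof=0`, and the smallest value among ties for the mode, as `scipy.stats.mode` does).

```python
import statistics
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]

# Hypothetical stand-ins for the default operators (min, max, mean, median, mode, std).
DEFAULT_OPERATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "min": min,
    "max": max,
    "mean": statistics.fmean,
    "median": statistics.median,
    # Smallest of the most common values, mirroring scipy.stats.mode on ties.
    "mode": lambda values: min(statistics.multimode(values)),
    # Population standard deviation, like numpy.std with its default ddof=0.
    "std": statistics.pstdev,
}


def group_rows(
    rows: List[Row],
    groups: Dict[str, List[str]],
    drop_columns: bool = True,
) -> List[Row]:
    """Add one '<operator>(<group>)' column per operator and group, computed per row."""
    grouped = {col for cols in groups.values() for col in cols}
    result = []
    for row in rows:
        # Keep the ungrouped columns (or every column if drop_columns=False).
        new_row = {k: v for k, v in row.items() if not (drop_columns and k in grouped)}
        for group, cols in groups.items():
            values = [row[col] for col in cols]
            for name, op in DEFAULT_OPERATORS.items():
                new_row[f"{name}({group})"] = op(values)
        result.append(new_row)
    return result


# First two rows of the breast cancer data, restricted to three columns.
rows = [
    {"mean radius": 17.99, "mean texture": 10.38, "mean area": 1001.0},
    {"mean radius": 20.57, "mean texture": 17.77, "mean area": 1326.0},
]
out = group_rows(rows, {"group1": ["mean texture", "mean radius"]})
print(out[0])
```

The group statistics in `out` agree with the first rows of the stand-alone example's output (for instance, a std of about 3.805 for the first row).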

This class can be accessed from atom through the feature_grouping method. Read more in the user guide.

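Parameters

groups: dict
Mapping of each group name to the features in that group. A feature can belong to multiple groups. A group's features can also be selected with a regex pattern, e.g., "mean.*".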

operators: str, sequence or None, default=None
Statistical operators to apply to each group. Any function from numpy or scipy.stats (looked up in that order) that can be applied to an array can be used. If None, the operators min, max, mean, median, mode and std are used.

drop_columns: bool, default=True
Whether to drop the grouped columns after the transformation.

verbose: int, default=0
Verbosity level of the class. Choose from:

  • 0 to not print anything.
  • 1 to print basic information.
  • 2 to print detailed information.
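The lookup order for operator names can be sketched as a search through a list of namespaces where the first match wins. The function `find_operator` is hypothetical; here the namespaces would be `numpy` and `scipy.stats`, but the sketch uses the stdlib `math` and `statistics` modules as stand-ins so that it runs without extra dependencies.

```python
import math
import statistics
from types import ModuleType
from typing import Callable, Sequence


def find_operator(name: str, namespaces: Sequence[ModuleType]) -> Callable:
    """Return the first callable attribute called `name`, searching namespaces in order."""
    for module in namespaces:
        func = getattr(module, name, None)
        if callable(func):
            return func
    raise ValueError(f"Operator {name!r} not found in any of the given namespaces.")


# Stand-ins for (numpy, scipy.stats): math is checked before statistics.
namespaces = [math, statistics]
print(find_operator("fsum", namespaces))    # found in math
print(find_operator("median", namespaces))  # not in math, found in statistics
```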

Attributes

feature_names_in_: np.ndarray
Names of features seen during fit.

n_features_in_: int
Number of features seen during fit.


See Also

FeatureExtractor

Extract features from datetime columns.

FeatureGenerator

Generate new features.

FeatureSelector

Reduce the number of features in the data.


Example

>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer

>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)

>>> atom = ATOMClassifier(X, y)
>>> atom.feature_grouping({"group1": "mean.*"}, verbose=2)

Fitting FeatureGrouper...
Grouping features...
 --> Group group1 successfully created.


>>> print(atom.dataset)

     radius error  texture error  perimeter error  area error  smoothness error  compactness error  concavity error  concave points error  symmetry error  ...  worst symmetry  worst fractal dimension  min(group1)  max(group1)  mean(group1)  median(group1)  mode(group1)  std(group1)  target
0          0.2241         1.5080            1.553       9.833          0.010190            0.01084         0.000000              0.000000         0.02659  ...          0.2932                  0.09382      0.00000        143.5     20.816486        0.155000       0.00000    42.901883       1
1          0.4226         1.1500            2.735      40.090          0.003659            0.02855         0.025720              0.012720         0.01817  ...          0.3698                  0.10940      0.06758        656.9     78.948101        0.169600       0.06758   194.679737       0
2          0.2222         0.8652            1.444      17.120          0.005517            0.01727         0.020450              0.006747         0.01616  ...          0.2535                  0.07993      0.01393        428.0     53.470220        0.121005       0.01393   126.804600       1
3          0.4709         0.9951            2.903      53.160          0.005654            0.02199         0.030590              0.014990         0.01623  ...          0.3590                  0.07787      0.05892       1145.0    131.053336        0.175950       0.05892   340.023830       0
4          0.2387         0.6372            1.729      21.830          0.003958            0.01246         0.018310              0.008747         0.01500  ...          0.2778                  0.07012      0.04528        800.0     93.581249        0.134225       0.04528   237.441648       1
..            ...            ...              ...         ...               ...                ...              ...                   ...             ...  ...             ...                      ...          ...          ...           ...             ...           ...          ...     ...
564        0.2545         0.9832            2.110      21.050          0.004452            0.03055         0.026810              0.013520         0.01454  ...          0.4264                  0.12750      0.06924        644.8     77.913697        0.206000       0.06924   191.109014       0
565        0.3460         1.3360            2.066      31.240          0.005868            0.02099         0.020210              0.009064         0.02087  ...          0.2642                  0.06953      0.02443        573.2     69.767851        0.129430       0.02443   169.767485       1
566        0.1153         0.6745            0.757       9.006          0.003265            0.00493         0.006493              0.003762         0.01720  ...          0.2901                  0.06783      0.01053        476.5     58.291694        0.116930       0.01053   141.291563       1
567        0.3473         0.9209            2.244      32.190          0.004766            0.02374         0.023840              0.008637         0.01772  ...          0.3993                  0.10640      0.06303        761.3     90.055875        0.156950       0.06303   225.756961       0
568        0.3438         1.1400            2.225      25.060          0.005463            0.01964         0.020790              0.005398         0.01477  ...          0.2779                  0.08121      0.01638        431.9     53.666299        0.137785       0.01638   127.999281       1

[569 rows x 27 columns]
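The atom example above selects the group with the regex string "mean.*" rather than a list of column names. The sketch below shows one way such group specifications could be resolved to columns, and how a single column can end up in several groups. The helper `resolve_groups` is hypothetical, and the full-match regex semantics are an assumption, not confirmed ATOM behavior.

```python
import re
from typing import Dict, List, Sequence, Union

GroupSpec = Union[str, Sequence[str]]


def resolve_groups(columns: Sequence[str], groups: Dict[str, GroupSpec]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each group name to the columns it selects."""
    resolved = {}
    for name, spec in groups.items():
        # A single string is treated as one pattern, a sequence as several.
        patterns = [spec] if isinstance(spec, str) else list(spec)
        resolved[name] = [
            col
            for col in columns
            # Exact names match directly; otherwise assume full-match regex semantics.
            if any(p == col or re.fullmatch(p, col) for p in patterns)
        ]
    return resolved


columns = ["mean radius", "mean texture", "radius error", "worst radius"]
print(resolve_groups(columns, {"means": "mean.*", "radius": ".*radius.*"}))
```

Here `mean radius` is selected by both groups, which is allowed since a feature can belong to multiple groups.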
FeatureGrouper can also be used as a stand-alone transformer:

>>> from atom.feature_engineering import FeatureGrouper
>>> from sklearn.datasets import load_breast_cancer

>>> X, _ = load_breast_cancer(return_X_y=True, as_frame=True)

>>> fg = FeatureGrouper({"group1": ["mean texture", "mean radius"]}, verbose=2)
>>> X = fg.transform(X)

Grouping features...
 --> Group group1 successfully created.


>>> print(X)

     mean perimeter  mean area  mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  mean fractal dimension  radius error  ...  worst concave points  worst symmetry  worst fractal dimension  min(group1)  max(group1)  mean(group1)  median(group1)  mode(group1)  std(group1)
0            122.80     1001.0          0.11840           0.27760         0.30010              0.14710         0.2419                 0.07871        1.0950  ...                0.2654          0.4601                  0.11890        10.38        17.99        14.185          14.185         10.38        3.805
1            132.90     1326.0          0.08474           0.07864         0.08690              0.07017         0.1812                 0.05667        0.5435  ...                0.1860          0.2750                  0.08902        17.77        20.57        19.170          19.170         17.77        1.400
2            130.00     1203.0          0.10960           0.15990         0.19740              0.12790         0.2069                 0.05999        0.7456  ...                0.2430          0.3613                  0.08758        19.69        21.25        20.470          20.470         19.69        0.780
3             77.58      386.1          0.14250           0.28390         0.24140              0.10520         0.2597                 0.09744        0.4956  ...                0.2575          0.6638                  0.17300        11.42        20.38        15.900          15.900         11.42        4.480
4            135.10     1297.0          0.10030           0.13280         0.19800              0.10430         0.1809                 0.05883        0.7572  ...                0.1625          0.2364                  0.07678        14.34        20.29        17.315          17.315         14.34        2.975
..              ...        ...              ...               ...             ...                  ...            ...                     ...           ...  ...                   ...             ...                      ...          ...          ...           ...             ...           ...          ...
564          142.00     1479.0          0.11100           0.11590         0.24390              0.13890         0.1726                 0.05623        1.1760  ...                0.2216          0.2060                  0.07115        21.56        22.39        21.975          21.975         21.56        0.415
565          131.20     1261.0          0.09780           0.10340         0.14400              0.09791         0.1752                 0.05533        0.7655  ...                0.1628          0.2572                  0.06637        20.13        28.25        24.190          24.190         20.13        4.060
566          108.30      858.1          0.08455           0.10230         0.09251              0.05302         0.1590                 0.05648        0.4564  ...                0.1418          0.2218                  0.07820        16.60        28.08        22.340          22.340         16.60        5.740
567          140.10     1265.0          0.11780           0.27700         0.35140              0.15200         0.2397                 0.07016        0.7260  ...                0.2650          0.4087                  0.12400        20.60        29.33        24.965          24.965         20.60        4.365
568           47.92      181.0          0.05263           0.04362         0.00000              0.00000         0.1587                 0.05884        0.3857  ...                0.0000          0.2871                  0.07039         7.76        24.54        16.150          16.150          7.76        8.390

[569 rows x 34 columns]
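The shapes printed in both examples follow from simple bookkeeping: the breast cancer dataset has 30 features, each operator adds one column per group, and with `drop_columns=True` every grouped column is removed once, even if it belongs to several groups. A small hypothetical helper makes the arithmetic explicit:

```python
from typing import Dict, List

MEAN_COLUMNS = [
    f"mean {name}"
    for name in (
        "radius", "texture", "perimeter", "area", "smoothness",
        "compactness", "concavity", "concave points", "symmetry", "fractal dimension",
    )
]


def expected_n_columns(
    n_features: int,
    groups: Dict[str, List[str]],
    n_operators: int = 6,
    drop_columns: bool = True,
    has_target: bool = False,
) -> int:
    """Hypothetical helper: count the columns left after grouping."""
    # A column shared by several groups is still dropped only once.
    grouped = {col for cols in groups.values() for col in cols}
    n_dropped = len(grouped) if drop_columns else 0
    # Each operator produces one new column per group.
    n_new = n_operators * len(groups)
    return n_features - n_dropped + n_new + int(has_target)


# atom example: the 10 "mean ..." columns form group1; atom.dataset also holds the target.
print(expected_n_columns(30, {"group1": MEAN_COLUMNS}, has_target=True))  # 27
# Stand-alone example: two columns grouped, no target column.
print(expected_n_columns(30, {"group1": ["mean texture", "mean radius"]}))  # 34
```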


Methods

fit: Do nothing.
fit_transform: Fit to data, then transform it.
get_params: Get parameters for this estimator.
inverse_transform: Do nothing.
set_output: Set output container.
set_params: Set the parameters of this estimator.
transform: Group features.


method fit(X=None, y=None, **fit_params)
Do nothing.

Implemented for continuity of the API.

Parameters

X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). If None, X is ignored.

y: sequence, dataframe-like or None, default=None
Target column(s) corresponding to X. If None, y is ignored.

**fit_params
Additional keyword arguments for the fit method.

Returns

self
Estimator instance.
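Because `fit` learns nothing, the transformer is effectively stateless, which is consistent with the stand-alone example calling `transform` directly without a prior `fit`. A minimal sketch of that pattern, using a hypothetical `StatelessTransformer` class with a deliberately trivial transformation rather than ATOM's code:

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, float]


class StatelessTransformer:
    """Hypothetical transformer whose fit learns no state."""

    def fit(self, X: Optional[List[Row]] = None, y: Any = None, **fit_params: Any) -> "StatelessTransformer":
        # Nothing to learn; returning self keeps the sklearn-style chaining API.
        return self

    def transform(self, X: List[Row], y: Any = None) -> List[Row]:
        # Placeholder transformation: add the row sum as a new column.
        return [{**row, "sum": sum(row.values())} for row in X]


X = [{"a": 1.0, "b": 2.0}]
t = StatelessTransformer()
assert t.fit(X) is t                            # fit returns the instance itself
assert t.transform(X) == t.fit(X).transform(X)  # fitting first changes nothing
```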



method fit_transform(X=None, y=None, **fit_params)
Fit to data, then transform it.

Parameters

X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). If None, X is ignored.

y: sequence, dataframe-like or None, default=None
Target column(s) corresponding to X. If None, y is ignored.

**fit_params
Additional keyword arguments for the fit method.

Returns

dataframe
Transformed feature set. Only returned if provided.

series or dataframe
Transformed target column. Only returned if provided.
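The "Only returned if provided" wording describes return values that depend on which inputs were passed. One plausible reading is sketched below with a hypothetical helper `select_return`: return just the feature set, just the target, or both as a tuple. This is an assumption about the convention, not ATOM's documented code, and the behavior when neither input is given is invented for the sketch.

```python
from typing import Any, Optional


def select_return(X: Optional[Any], y: Optional[Any]) -> Any:
    """Return only the objects that were provided (hypothetical convention)."""
    if X is not None and y is not None:
        return X, y
    if X is not None:
        return X
    if y is not None:
        return y
    # Assumption for this sketch: having nothing to return is an error.
    raise ValueError("At least one of X or y must be provided.")


X = [[1.0, 2.0]]
y = [0]
print(select_return(X, None))  # only the feature set
print(select_return(None, y))  # only the target
print(select_return(X, y))     # both, as a tuple
```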



method get_params(deep=True)
Get parameters for this estimator.

Parameters

deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: dict
Parameter names mapped to their values.
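In scikit-learn, `get_params` works by inspecting the signature of the estimator's `__init__` and reading the attribute with the same name for each parameter, which is why estimators store their constructor arguments unchanged. A stripped-down, dependency-free imitation of that idea follows; the class `TinyEstimator` is hypothetical and only mirrors FeatureGrouper's parameter names for illustration.

```python
import inspect
from typing import Any, Dict, Optional


class TinyEstimator:
    """Hypothetical estimator imitating scikit-learn's get_params convention."""

    def __init__(self, groups: Any, operators: Optional[Any] = None, drop_columns: bool = True, verbose: int = 0):
        # Store constructor arguments unchanged, under the same names.
        self.groups = groups
        self.operators = operators
        self.drop_columns = drop_columns
        self.verbose = verbose

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        params = {}
        for name in inspect.signature(type(self).__init__).parameters:
            if name == "self":
                continue
            value = getattr(self, name)
            params[name] = value
            # With deep=True, also expose parameters of nested estimators as "name__param".
            if deep and hasattr(value, "get_params") and not isinstance(value, type):
                for sub_name, sub_value in value.get_params().items():
                    params[f"{name}__{sub_name}"] = sub_value
        return params


est = TinyEstimator({"group1": ["mean texture", "mean radius"]}, verbose=2)
print(est.get_params())
```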



method inverse_transform(X=None, y=None, **fit_params)
Do nothing.

Returns the input unchanged. Implemented for continuity of the API.

Parameters

X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). If None, X is ignored.

y: sequence, dataframe-like or None, default=None
Target column(s) corresponding to X. If None, y is ignored.

Returns

dataframe
Feature set. Only returned if provided.

series or dataframe
Target column(s). Only returned if provided.



method set_output(transform=None)
Set output container.

See sklearn's user guide for how to use the set_output API. The available choices are listed below.

Parameters

transform: str or None, default=None
Configure the output of the transform, fit_transform, and inverse_transform methods. If None, the configuration is not changed. Choose from:

  • "numpy"
  • "pandas" (default)
  • "pandas-pyarrow"
  • "polars"
  • "polars-lazy"
  • "pyarrow"
  • "modin"
  • "dask"
  • "pyspark"
  • "pyspark-pandas"

Returns

self
Estimator instance.



method set_params(**params)
Set the parameters of this estimator.

Parameters

**params: dict
Estimator parameters.

Returns

self: estimator instance
Estimator instance.
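`set_params` is the counterpart of `get_params`: it checks each keyword against the known parameter names, assigns it, and returns the estimator so calls can be chained. A minimal sketch of that behavior with a hypothetical `TinyEstimator` class (scikit-learn's real implementation additionally routes `name__param` keys to nested estimators):

```python
import inspect
from typing import Any


class TinyEstimator:
    """Hypothetical estimator imitating scikit-learn's set_params convention."""

    def __init__(self, drop_columns: bool = True, verbose: int = 0):
        self.drop_columns = drop_columns
        self.verbose = verbose

    def set_params(self, **params: Any) -> "TinyEstimator":
        valid = set(inspect.signature(type(self).__init__).parameters) - {"self"}
        # Validate every key before changing anything.
        invalid = set(params) - valid
        if invalid:
            raise ValueError(f"Invalid parameter(s) {sorted(invalid)} for {type(self).__name__}.")
        for name, value in params.items():
            setattr(self, name, value)
        return self  # returning self allows chaining


est = TinyEstimator().set_params(verbose=2, drop_columns=False)
print(est.verbose, est.drop_columns)  # 2 False
```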



method transform(X, y=None)
Group features.

Parameters

X: dataframe-like
Feature set with shape=(n_samples, n_features).

y: sequence, dataframe-like or None, default=None
Not used. Present only for continuity of the API.

Returns

dataframe
Transformed feature set.