FeatureGrouper

class atom.feature_engineering.FeatureGrouper(groups, operators=None, drop_columns=True, verbose=0)[source]

Extract statistics from similar features.

Replace groups of features that share related characteristics with new features that summarize the statistical properties of each group. The statistical operators are calculated over every row of the group. The group names and features can be accessed through the groups method.
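To make "calculated over every row of the group" concrete, here is a minimal pure-Python sketch of the idea. It is not ATOM's implementation: the helper `group_rows` and the use of the `statistics` module are assumptions made for illustration, and the two rows reuse values from the stand-alone example further down.

```python
import statistics

# Two rows with values taken from the stand-alone example on this page
rows = [
    {"mean radius": 17.99, "mean texture": 10.38, "mean area": 1001.0},
    {"mean radius": 20.57, "mean texture": 17.77, "mean area": 1326.0},
]
groups = {"group1": ["mean texture", "mean radius"]}

# Stand-ins for the default operators (population std; smallest value on mode ties)
operators = {
    "min": min,
    "max": max,
    "mean": statistics.fmean,
    "median": statistics.median,
    "mode": lambda values: min(statistics.multimode(values)),
    "std": statistics.pstdev,
}


def group_rows(rows, groups, operators, drop_columns=True):
    """Hypothetical helper: add one column per (operator, group) to every row."""
    grouped = {col for cols in groups.values() for col in cols}
    out = []
    for row in rows:
        new_row = dict(row)
        for name, cols in groups.items():
            values = [row[col] for col in cols]  # this row's values for the group
            for op_name, op in operators.items():
                new_row[f"{op_name}({name})"] = op(values)
        if drop_columns:
            # Remove the original columns that were folded into a group
            new_row = {k: v for k, v in new_row.items() if k not in grouped}
        out.append(new_row)
    return out


result = group_rows(rows, groups, operators)
```

Each output row keeps the ungrouped column (`mean area`) and gains six columns named `operator(group)`, mirroring the column names in the printed examples.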

This class can be accessed from atom through the feature_grouping method. Read more in the user guide.

Parameters

groups: dict

Group names and features. A feature can belong to multiple groups.

operators: str, sequence or None, default=None

Statistical operators to apply on the groups. Any operator from numpy or scipy.stats (checked in that order) that can be applied to an array can be used. If None, the defaults are: min, max, mean, median, mode and std.

drop_columns: bool, default=True

Whether to drop the columns in groups after transformation.

verbose: int, default=0

Verbosity level of the class. Choose from:

0 to not print anything.
1 to print basic information.
2 to print detailed information.
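The operators parameter above says names are looked up in numpy first and scipy.stats second. The sketch below shows that kind of ordered lookup in general form. The helper `resolve_operator` is hypothetical, and standard-library modules are used as stand-in namespaces so the snippet runs without third-party packages. To mirror the documented order, one would pass `[numpy, scipy.stats]` instead.

```python
import builtins
import statistics


def resolve_operator(name, namespaces):
    """Return the first callable called `name` found in `namespaces`, in order."""
    for namespace in namespaces:
        func = getattr(namespace, name, None)
        if callable(func):
            return func
    raise ValueError(f"Unknown operator: {name!r}")


# Stand-in namespaces (assumption for this sketch, not what ATOM searches)
namespaces = [builtins, statistics]

op_max = resolve_operator("max", namespaces)        # found in the first namespace
op_median = resolve_operator("median", namespaces)  # falls through to the second

row = [10.38, 17.99, 14.0]
row_max = op_max(row)
row_median = op_median(row)
```

Because the search stops at the first match, a name that exists in both namespaces always resolves to the earlier one.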

Attributes

feature_names_in_: np.ndarray

Names of features seen during fit.

n_features_in_: int

Number of features seen during fit.

Example

The first example runs through atom, the second uses the class stand-alone.

>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer

>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)

>>> atom = ATOMClassifier(X, y)
>>> atom.feature_grouping({"group1": "mean.*"}, verbose=2)

Fitting FeatureGrouper...
Grouping features...
 --> Group group1 successfully created.

>>> print(atom.dataset)

     radius error  texture error  perimeter error  area error  smoothness error  compactness error  concavity error  concave points error  symmetry error  ...  worst symmetry  worst fractal dimension  min(group1)  max(group1)  mean(group1)  median(group1)  mode(group1)  std(group1)  target
0          0.8191         1.9310            4.493      103.90          0.008074           0.040880         0.053210              0.018340         0.02383  ...          0.3007                  0.08314     0.061320       1110.0    127.523316        0.176850      0.061320   329.486981       0
1          0.1859         1.9260            1.011       14.47          0.007831           0.008776         0.015560              0.006240         0.03139  ...          0.3200                  0.06576     0.015530        428.9     53.379498        0.136250      0.015530   127.109799       1
2          0.2810         0.8135            3.369       23.81          0.004929           0.066570         0.076830              0.013680         0.01526  ...          0.2845                  0.12490     0.028330        542.9     66.369889        0.141200      0.028330   160.878141       1
3          0.1639         1.1400            1.223       14.66          0.005919           0.032700         0.049570              0.010380         0.01208  ...          0.2048                  0.07628     0.028000        553.5     66.981375        0.111560      0.028000   164.121249       1
4          0.3428         0.3981            2.537       29.06          0.004732           0.015060         0.018550              0.010670         0.02163  ...          0.3109                  0.08187     0.035280        668.7     79.352913        0.120425      0.035280   198.400183       1
..            ...            ...              ...         ...               ...                ...              ...                   ...             ...  ...             ...                      ...          ...          ...           ...             ...           ...          ...     ...
564        0.2868         1.1430            2.289       20.56          0.010170           0.014430         0.018610              0.012500         0.03464  ...          0.2227                  0.06777     0.018750        334.2     43.509823        0.135500      0.018750    98.929860       1
565        0.8529         1.8490            5.632       93.54          0.010750           0.027220         0.050810              0.019110         0.02293  ...          0.2341                  0.07421     0.056990       1094.0    125.561400        0.159350      0.056990   324.783656       0
566        0.2441         2.0900            1.648       16.80          0.012910           0.022220         0.004174              0.007082         0.02572  ...          0.2262                  0.06742     0.005025        311.7     40.661394        0.139700      0.005025    92.358226       1
567        0.1998         0.6068            1.443       16.07          0.004413           0.014430         0.015090              0.007369         0.01354  ...          0.3518                  0.08665     0.031520        551.1     67.128030        0.143050      0.031520   163.348717       1
568        0.2094         0.7636            1.231       17.67          0.008725           0.020030         0.023350              0.011320         0.02625  ...          0.3380                  0.09584     0.033700        513.7     62.632288        0.136750      0.033700   152.314252       1

[569 rows x 27 columns]

>>> from atom.feature_engineering import FeatureGrouper
>>> from sklearn.datasets import load_breast_cancer

>>> X, _ = load_breast_cancer(return_X_y=True, as_frame=True)

>>> fg = FeatureGrouper({"group1": ["mean texture", "mean radius"]}, verbose=2)
>>> X = fg.transform(X)

Grouping features...
 --> Group group1 successfully created.

>>> print(X)

     mean perimeter  mean area  mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  mean fractal dimension  radius error  ...  worst concave points  worst symmetry  worst fractal dimension  min(group1)  max(group1)  mean(group1)  median(group1)  mode(group1)  std(group1)
0            122.80     1001.0          0.11840           0.27760         0.30010              0.14710         0.2419                 0.07871        1.0950  ...                0.2654          0.4601                  0.11890        10.38        17.99        14.185          14.185         10.38        3.805
1            132.90     1326.0          0.08474           0.07864         0.08690              0.07017         0.1812                 0.05667        0.5435  ...                0.1860          0.2750                  0.08902        17.77        20.57        19.170          19.170         17.77        1.400
2            130.00     1203.0          0.10960           0.15990         0.19740              0.12790         0.2069                 0.05999        0.7456  ...                0.2430          0.3613                  0.08758        19.69        21.25        20.470          20.470         19.69        0.780
3             77.58      386.1          0.14250           0.28390         0.24140              0.10520         0.2597                 0.09744        0.4956  ...                0.2575          0.6638                  0.17300        11.42        20.38        15.900          15.900         11.42        4.480
4            135.10     1297.0          0.10030           0.13280         0.19800              0.10430         0.1809                 0.05883        0.7572  ...                0.1625          0.2364                  0.07678        14.34        20.29        17.315          17.315         14.34        2.975
..              ...        ...              ...               ...             ...                  ...            ...                     ...           ...  ...                   ...             ...                      ...          ...          ...           ...             ...           ...          ...
564          142.00     1479.0          0.11100           0.11590         0.24390              0.13890         0.1726                 0.05623        1.1760  ...                0.2216          0.2060                  0.07115        21.56        22.39        21.975          21.975         21.56        0.415
565          131.20     1261.0          0.09780           0.10340         0.14400              0.09791         0.1752                 0.05533        0.7655  ...                0.1628          0.2572                  0.06637        20.13        28.25        24.190          24.190         20.13        4.060
566          108.30      858.1          0.08455           0.10230         0.09251              0.05302         0.1590                 0.05648        0.4564  ...                0.1418          0.2218                  0.07820        16.60        28.08        22.340          22.340         16.60        5.740
567          140.10     1265.0          0.11780           0.27700         0.35140              0.15200         0.2397                 0.07016        0.7260  ...                0.2650          0.4087                  0.12400        20.60        29.33        24.965          24.965         20.60        4.365
568           47.92      181.0          0.05263           0.04362         0.00000              0.00000         0.1587                 0.05884        0.3857  ...                0.0000          0.2871                  0.07039         7.76        24.54        16.150          16.150          7.76        8.390

[569 rows x 34 columns]
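The column counts printed in the two examples (27 and 34) can be checked by hand: start from the 30 features of the breast cancer dataset, subtract the grouped columns that are dropped, and add one column per operator. The sketch below does that arithmetic. The helper `expected_width` is hypothetical, and treating the string "mean.*" as a regex matched against the full column name is an assumption about how the pattern is interpreted.

```python
import re

stats = ["radius", "texture", "perimeter", "area", "smoothness",
         "compactness", "concavity", "concave points", "symmetry",
         "fractal dimension"]

# The 30 feature names of the breast cancer dataset
columns = (
    [f"mean {s}" for s in stats]
    + [f"{s} error" for s in stats]
    + [f"worst {s}" for s in stats]
)

n_operators = 6  # min, max, mean, median, mode and std (the defaults)


def expected_width(n_dropped, n_groups=1, extra=0):
    """Original columns, minus dropped group members, plus the new statistics."""
    return len(columns) - n_dropped + n_groups * n_operators + extra


# Example through atom: the pattern selects the ten "mean ..." columns
matched = [c for c in columns if re.fullmatch("mean.*", c)]
atom_width = expected_width(len(matched), extra=1)  # +1 for the target column

# Stand-alone example: two explicitly listed columns, no target column
standalone_width = expected_width(2)
```

With drop_columns=False the subtraction would be skipped, so the same examples would yield 37 and 36 columns respectively.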

Methods

fit	Do nothing.
fit_transform	Fit to data, then transform it.
get_params	Get parameters for this estimator.
inverse_transform	Do nothing.
set_output	Set output container.
set_params	Set the parameters of this estimator.
transform	Group features.

method fit(X=None, y=None, **fit_params)[source]

Do nothing.

Implemented for continuity of the API.

Parameters

X: dataframe-like or None, default=None

Feature set with shape=(n_samples, n_features). If None, `X` is ignored.

y: sequence, dataframe-like or None, default=None

Target column(s) corresponding to `X`. If None, `y` is ignored.

**fit_params

Additional keyword arguments for the fit method.

Returns

self

Estimator instance.
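Since fit does nothing, the stand-alone example above can call transform directly without fitting first. The sketch below shows the pattern with a hypothetical stand-in class (`RowMax` is not part of ATOM): fit returns the instance itself, so chained and direct calls give the same result.

```python
class RowMax:
    """Hypothetical stateless transformer: fit learns nothing and returns self."""

    def fit(self, X=None, y=None, **fit_params):
        # Nothing to learn; returning self keeps calls chainable
        return self

    def transform(self, X, y=None):
        # Row-wise statistic, computed without any fitted state
        return [max(row) for row in X]


X = [[1.0, 3.0], [5.0, 2.0]]

transformer = RowMax()
fitted = transformer.fit(X)                 # same object as `transformer`
chained = RowMax().fit(X).transform(X)      # fit, then transform
direct = RowMax().transform(X)              # transform without fit
```

Keeping a no-op fit means the transformer still fits into pipelines and other tooling that always call fit before transform.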

method fit_transform(X=None, y=None, **fit_params)[source]

Fit to data, then transform it.

Parameters

X: dataframe-like or None, default=None

Feature set with shape=(n_samples, n_features). If None, `X` is ignored.

y: sequence, dataframe-like or None, default=None

Target column(s) corresponding to `X`. If None, `y` is ignored.

**fit_params

Additional keyword arguments for the fit method.

Returns

dataframe

Transformed feature set. Only returned if provided.

series or dataframe

Transformed target column. Only returned if provided.

method get_params(deep=True)[source]

Get parameters for this estimator.

Parameters

deep: bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: dict

Parameter names mapped to their values.

method inverse_transform(X=None, y=None, **fit_params)[source]

Do nothing.

Returns the input unchanged. Implemented for continuity of the API.

Parameters

X: dataframe-like or None, default=None

Feature set with shape=(n_samples, n_features). If None, `X` is ignored.

y: sequence, dataframe-like or None, default=None

Target column(s) corresponding to `X`. If None, `y` is ignored.

Returns

dataframe

Feature set. Only returned if provided.

series or dataframe

Target column(s). Only returned if provided.

method set_output(transform=None)[source]

Set output container.

See sklearn's user guide on how to use the set_output API, and the linked description of the available choices.

Parameters

transform: str or None, default=None

Configure the output of the `transform`, `fit_transform`, and `inverse_transform` methods. If None, the configuration is not changed. Choose from:

"numpy"
"pandas" (default)
"pandas-pyarrow"
"polars"
"polars-lazy"
"pyarrow"
"modin"
"dask"
"pyspark"
"pyspark-pandas"

Returns

self

Estimator instance.

method set_params(**params)[source]

Set the parameters of this estimator.

Parameters

**params: dict

Estimator parameters.

Returns

self: estimator instance

Estimator instance.
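The get_params and set_params methods follow the scikit-learn convention: parameters are the constructor arguments, get_params returns them as a dict, and set_params updates them and returns the estimator. The sketch below imitates that convention with a hypothetical stand-in (`ParamsMixin` and `Grouper` are illustrative names, not ATOM or scikit-learn classes) so it runs with the standard library only.

```python
import inspect


class ParamsMixin:
    """Hypothetical stand-in for the scikit-learn parameter convention."""

    def get_params(self, deep=True):
        # `deep` is accepted for signature parity but ignored in this sketch
        signature = inspect.signature(type(self).__init__)
        names = [name for name in signature.parameters if name != "self"]
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self  # returning self allows chaining


class Grouper(ParamsMixin):
    # Same constructor arguments as documented for FeatureGrouper
    def __init__(self, groups, operators=None, drop_columns=True, verbose=0):
        self.groups = groups
        self.operators = operators
        self.drop_columns = drop_columns
        self.verbose = verbose


grouper = Grouper({"group1": ["mean texture", "mean radius"]})
before = grouper.get_params()
after = grouper.set_params(verbose=2, drop_columns=False).get_params()
```

Rejecting unknown names in set_params catches typos early instead of silently creating a new attribute.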

method transform(X, y=None)[source]

Group features.

Parameters

X: dataframe-like

Feature set with shape=(n_samples, n_features).

y: sequence, dataframe-like or None, default=None

Ignored. Implemented for continuity of the API.

Returns

dataframe

Transformed feature set.