FeatureGrouper

class atom.feature_engineering.FeatureGrouper(groups, operators=None, drop_columns=True, verbose=0)[source]

Extract statistics from similar features.

Replace groups of related features with new features that summarize the statistical properties of each group. The statistical operators are calculated over every row of the group. The group names and features can be accessed through the groups method.

This class can be accessed from atom through the feature_grouping method. Read more in the user guide.
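As a rough illustration of what "calculated over every row of the group" means, the sketch below reproduces a few of the summary columns by hand with numpy and pandas. It is not the library's implementation: the two sample rows are copied from the stand-alone example output on this page, the variable names are illustrative only, and mode is left out because it comes from scipy.stats rather than numpy.

```python
import numpy as np
import pandas as pd

# Two rows taken from the stand-alone example output on this page.
X = pd.DataFrame(
    {
        "mean radius": [17.99, 20.57],
        "mean texture": [10.38, 17.77],
        "mean perimeter": [122.80, 132.90],
    }
)

group = ["mean texture", "mean radius"]  # columns that form "group1"
values = X[group].to_numpy()

# Each operator is applied along axis=1, i.e. once per row of the group.
summary = pd.DataFrame(
    {
        "min(group1)": np.min(values, axis=1),
        "max(group1)": np.max(values, axis=1),
        "mean(group1)": np.mean(values, axis=1),
        "median(group1)": np.median(values, axis=1),
        "std(group1)": np.std(values, axis=1),
    },
    index=X.index,
)

# Mirror drop_columns=True: remove the grouped columns, append the summaries.
result = pd.concat([X.drop(columns=group), summary], axis=1)
print(result)
```

The resulting values for mean(group1) and std(group1) agree with the first two rows of the stand-alone example output, which suggests the standard deviation shown there is the population standard deviation (numpy's default, ddof=0).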

Parameters

groups: dict

Group names and features. A feature can belong to multiple groups.

operators: str, sequence or None, default=None

Statistical operators to apply on the groups. Any operator from numpy or scipy.stats (checked in that order) that can be applied to an array can be used. If None, the following operators are used: min, max, mean, median, mode and std.
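The lookup order described above (numpy first, then scipy.stats) could be sketched as follows. The helper name resolve_operator is hypothetical and is not part of atom; this only illustrates the documented order, not how the library resolves operators internally.

```python
import numpy as np
from scipy import stats


def resolve_operator(name):
    """Hypothetical helper: look up `name` in numpy first, then scipy.stats."""
    for module in (np, stats):
        func = getattr(module, name, None)
        if callable(func):
            return func
    raise ValueError(f"Operator {name!r} not found in numpy or scipy.stats.")


values = np.array([[10.38, 17.99], [17.77, 20.57]])

# "mean" exists in numpy, so the numpy function is used.
row_means = resolve_operator("mean")(values, axis=1)

# "skew" is not a numpy function, so the lookup falls through to scipy.stats.
skew_func = resolve_operator("skew")
```

Under this order, a name that exists in both libraries resolves to the numpy function, while names such as mode or skew resolve to scipy.stats.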

drop_columns: bool, default=True

Whether to drop the columns in groups after transformation.

verbose: int, default=0

Verbosity level of the class. Choose from:

0 to not print anything.
1 to print basic information.
2 to print detailed information.

Attributes

feature_names_in_: np.ndarray

Names of features seen during fit.

n_features_in_: int

Number of features seen during fit.

Example

The first example uses the class through atom; the second uses it stand-alone.

>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer

>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)

>>> atom = ATOMClassifier(X, y)
>>> atom.feature_grouping({"group1": "mean.*"}, verbose=2)

Fitting FeatureGrouper...
Grouping features...
 --> Group group1 successfully created.


>>> print(atom.dataset)

     radius error  texture error  perimeter error  area error  smoothness error  compactness error  concavity error  concave points error  symmetry error  ...  worst symmetry  worst fractal dimension  min(group1)  max(group1)  mean(group1)  median(group1)  mode(group1)  std(group1)  target
0          0.2241         1.5080            1.553       9.833          0.010190            0.01084         0.000000              0.000000         0.02659  ...          0.2932                  0.09382      0.00000        143.5     20.816486        0.155000       0.00000    42.901883       1
1          0.4226         1.1500            2.735      40.090          0.003659            0.02855         0.025720              0.012720         0.01817  ...          0.3698                  0.10940      0.06758        656.9     78.948101        0.169600       0.06758   194.679737       0
2          0.2222         0.8652            1.444      17.120          0.005517            0.01727         0.020450              0.006747         0.01616  ...          0.2535                  0.07993      0.01393        428.0     53.470220        0.121005       0.01393   126.804600       1
3          0.4709         0.9951            2.903      53.160          0.005654            0.02199         0.030590              0.014990         0.01623  ...          0.3590                  0.07787      0.05892       1145.0    131.053336        0.175950       0.05892   340.023830       0
4          0.2387         0.6372            1.729      21.830          0.003958            0.01246         0.018310              0.008747         0.01500  ...          0.2778                  0.07012      0.04528        800.0     93.581249        0.134225       0.04528   237.441648       1
..            ...            ...              ...         ...               ...                ...              ...                   ...             ...  ...             ...                      ...          ...          ...           ...             ...           ...          ...     ...
564        0.2545         0.9832            2.110      21.050          0.004452            0.03055         0.026810              0.013520         0.01454  ...          0.4264                  0.12750      0.06924        644.8     77.913697        0.206000       0.06924   191.109014       0
565        0.3460         1.3360            2.066      31.240          0.005868            0.02099         0.020210              0.009064         0.02087  ...          0.2642                  0.06953      0.02443        573.2     69.767851        0.129430       0.02443   169.767485       1
566        0.1153         0.6745            0.757       9.006          0.003265            0.00493         0.006493              0.003762         0.01720  ...          0.2901                  0.06783      0.01053        476.5     58.291694        0.116930       0.01053   141.291563       1
567        0.3473         0.9209            2.244      32.190          0.004766            0.02374         0.023840              0.008637         0.01772  ...          0.3993                  0.10640      0.06303        761.3     90.055875        0.156950       0.06303   225.756961       0
568        0.3438         1.1400            2.225      25.060          0.005463            0.01964         0.020790              0.005398         0.01477  ...          0.2779                  0.08121      0.01638        431.9     53.666299        0.137785       0.01638   127.999281       1

[569 rows x 27 columns]

>>> from atom.feature_engineering import FeatureGrouper
>>> from sklearn.datasets import load_breast_cancer

>>> X, _ = load_breast_cancer(return_X_y=True, as_frame=True)

>>> fg = FeatureGrouper({"group1": ["mean texture", "mean radius"]}, verbose=2)
>>> X = fg.transform(X)

Grouping features...
 --> Group group1 successfully created.


>>> print(X)

     mean perimeter  mean area  mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  mean fractal dimension  radius error  ...  worst concave points  worst symmetry  worst fractal dimension  min(group1)  max(group1)  mean(group1)  median(group1)  mode(group1)  std(group1)
0            122.80     1001.0          0.11840           0.27760         0.30010              0.14710         0.2419                 0.07871        1.0950  ...                0.2654          0.4601                  0.11890        10.38        17.99        14.185          14.185         10.38        3.805
1            132.90     1326.0          0.08474           0.07864         0.08690              0.07017         0.1812                 0.05667        0.5435  ...                0.1860          0.2750                  0.08902        17.77        20.57        19.170          19.170         17.77        1.400
2            130.00     1203.0          0.10960           0.15990         0.19740              0.12790         0.2069                 0.05999        0.7456  ...                0.2430          0.3613                  0.08758        19.69        21.25        20.470          20.470         19.69        0.780
3             77.58      386.1          0.14250           0.28390         0.24140              0.10520         0.2597                 0.09744        0.4956  ...                0.2575          0.6638                  0.17300        11.42        20.38        15.900          15.900         11.42        4.480
4            135.10     1297.0          0.10030           0.13280         0.19800              0.10430         0.1809                 0.05883        0.7572  ...                0.1625          0.2364                  0.07678        14.34        20.29        17.315          17.315         14.34        2.975
..              ...        ...              ...               ...             ...                  ...            ...                     ...           ...  ...                   ...             ...                      ...          ...          ...           ...             ...           ...          ...
564          142.00     1479.0          0.11100           0.11590         0.24390              0.13890         0.1726                 0.05623        1.1760  ...                0.2216          0.2060                  0.07115        21.56        22.39        21.975          21.975         21.56        0.415
565          131.20     1261.0          0.09780           0.10340         0.14400              0.09791         0.1752                 0.05533        0.7655  ...                0.1628          0.2572                  0.06637        20.13        28.25        24.190          24.190         20.13        4.060
566          108.30      858.1          0.08455           0.10230         0.09251              0.05302         0.1590                 0.05648        0.4564  ...                0.1418          0.2218                  0.07820        16.60        28.08        22.340          22.340         16.60        5.740
567          140.10     1265.0          0.11780           0.27700         0.35140              0.15200         0.2397                 0.07016        0.7260  ...                0.2650          0.4087                  0.12400        20.60        29.33        24.965          24.965         20.60        4.365
568           47.92      181.0          0.05263           0.04362         0.00000              0.00000         0.1587                 0.05884        0.3857  ...                0.0000          0.2871                  0.07039         7.76        24.54        16.150          16.150          7.76        8.390

[569 rows x 34 columns]

Methods

fit	Do nothing.
fit_transform	Fit to data, then transform it.
get_params	Get parameters for this estimator.
inverse_transform	Do nothing.
set_output	Set output container.
set_params	Set the parameters of this estimator.
transform	Group features.

method fit(X=None, y=None, **fit_params)[source]

Do nothing.

Implemented for continuity of the API.

Parameters

X: dataframe-like or None, default=None

Feature set with shape=(n_samples, n_features). If None, `X` is ignored.

y: sequence, dataframe-like or None, default=None

Target column(s) corresponding to `X`. If None, `y` is ignored.

**fit_params

Additional keyword arguments for the fit method.

Returns

self

Estimator instance.

method fit_transform(X=None, y=None, **fit_params)[source]

Fit to data, then transform it.

Parameters

X: dataframe-like or None, default=None

Feature set with shape=(n_samples, n_features). If None, `X` is ignored.

y: sequence, dataframe-like or None, default=None

Target column(s) corresponding to `X`. If None, `y` is ignored.

**fit_params

Additional keyword arguments for the fit method.

Returns

dataframe

Transformed feature set. Only returned if provided.

series or dataframe

Transformed target column. Only returned if provided.

method get_params(deep=True)[source]

Get parameters for this estimator.

Parameters

deep: bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: dict

Parameter names mapped to their values.

method inverse_transform(X=None, y=None, **fit_params)[source]

Do nothing.

Returns the input unchanged. Implemented for continuity of the API.

Parameters

X: dataframe-like or None, default=None

Feature set with shape=(n_samples, n_features). If None, `X` is ignored.

y: sequence, dataframe-like or None, default=None

Target column(s) corresponding to `X`. If None, `y` is ignored.

Returns

dataframe

Feature set. Only returned if provided.

series or dataframe

Target column(s). Only returned if provided.

method set_output(transform=None)[source]

Set output container.

See sklearn's user guide on how to use the set_output API. The available choices are listed under the transform parameter.

Parameters

transform: str or None, default=None

Configure the output of the `transform`, `fit_transform`, and `inverse_transform` methods. If None, the configuration is not changed. Choose from:

"numpy"
"pandas" (default)
"pandas-pyarrow"
"polars"
"polars-lazy"
"pyarrow"
"modin"
"dask"
"pyspark"
"pyspark-pandas"

Returns

self

Estimator instance.

method set_params(**params)[source]

Set the parameters of this estimator.

Parameters

**params: dict

Estimator parameters.

Returns

self: estimator instance

Estimator instance.

method transform(X, y=None)[source]

Group features.

Parameters

X: dataframe-like

Feature set with shape=(n_samples, n_features).

y: sequence, dataframe-like or None, default=None

Do nothing. Implemented for continuity of the API.

Returns

dataframe

Transformed feature set.