FeatureGenerator
class atom.feature_engineering.FeatureGenerator(strategy="dfs", n_features=None, operators=None, n_jobs=1, verbose=0, random_state=None, **kwargs)[source]
Generate new features.
Create new combinations of existing features to capture the non-linear relations between the original features.
This class can be accessed from atom through the feature_generation method. Read more in the user guide.
Warning
- Using the div, log or sqrt operators can return new features with inf or NaN values. Check the warnings that may pop up or use atom's nans attribute.
- When using dfs with n_jobs>1, make sure to protect your code with if __name__ == "__main__" (see the sketch after this warning). Featuretools uses dask, which uses python multiprocessing for parallelization. The spawn method on multiprocessing starts a new python process, which requires it to import the __main__ module before it can do its task.
- gfg can be slow for very large populations.
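When running dfs with more than one core, a minimal sketch of the required entry-point guard (the dataset and parameter values are illustrative only):

from atom.feature_engineering import FeatureGenerator
from sklearn.datasets import load_breast_cancer

def main():
    # Keep the actual work inside a function so spawned worker
    # processes can import __main__ without re-executing it.
    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    fg = FeatureGenerator(strategy="dfs", n_features=5, n_jobs=2)
    X_new = fg.fit_transform(X, y)
    print(X_new.shape)

if __name__ == "__main__":
    main()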
Tip
dfs can create many new features and not all of them will be useful. Use the FeatureSelector class to reduce the number of features.
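For example, a minimal sketch that generates features and then prunes them (the pca strategy and the feature counts are illustrative choices, not recommendations):
>>> atom.feature_generation(strategy="dfs", n_features=20)
>>> atom.feature_selection(strategy="pca", n_features=10)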
Parameters | strategy: str, default="dfs"
Strategy to create new features. Choose from:
- dfs: Deep Feature Synthesis.
- gfg: Genetic Feature Generation.
n_features: int or None, default=None
Maximum number of newly generated features to add to the
dataset. If None, select all created features.
operators: str, sequence or None, default=None
Mathematical operators to apply on the features. None to use all. Choose from: add, sub, mul, div, abs, sqrt, log, inv, sin, cos, tan.
n_jobs: int, default=1
Number of cores to use for parallel processing.
verbose: int, default=0
Verbosity level of the class. Choose from:
- 0 to not print anything.
- 1 to print basic information.
- 2 to print detailed information.
random_state: int or None, default=None
Seed used by the random number generator. If None, the random number generator is the RandomState used by np.random.
**kwargs
Additional keyword arguments for the SymbolicTransformer instance. Only for the gfg strategy.
|
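As a sketch of the operators and **kwargs options (parameter values are illustrative; population_size and generations are standard arguments of gplearn's SymbolicTransformer):
>>> from atom.feature_engineering import FeatureGenerator
>>> FeatureGenerator(strategy="dfs", operators=["add", "mul", "log"], n_features=10)
>>> FeatureGenerator(strategy="gfg", n_features=10, population_size=1000, generations=10)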
Attributes | gfg_: SymbolicTransformer
Object used to calculate the genetic features. Only available
when strategy="gfg".
genetic_features_: pd.DataFrame
Information on the newly created non-linear features. Only available when strategy="gfg". Columns include:
- name: Name of the feature.
- description: Operators used to create this feature.
- fitness: Fitness score.
feature_names_in_: np.ndarray
Names of features seen during fit.
n_features_in_: int
Number of features seen during fit.
|
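A minimal sketch of inspecting these attributes after fitting with the genetic strategy (assumes X and y as loaded in the example below):
>>> fg = FeatureGenerator(strategy="gfg", n_features=5)
>>> fg.fit(X, y)
>>> fg.genetic_features_  # dataframe describing the created non-linear features
>>> fg.gfg_  # the fitted SymbolicTransformer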
See Also
- FeatureExtractor: Extract features from datetime columns.
- FeatureGrouper: Extract statistics from similar features.
- FeatureSelector: Reduce the number of features in the data.
Example
>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)
>>> atom = ATOMClassifier(X, y)
>>> atom.feature_generation(strategy="dfs", n_features=5, verbose=2)
Fitting FeatureGenerator...
Generating new features...
--> 5 new features were added.
>>> # Note the mean radius / smoothness error column
>>> print(atom.dataset)
mean radius mean texture mean perimeter mean area mean smoothness mean compactness mean concavity ... worst fractal dimension TANGENT(area error) fractal dimension error - mean radius mean concave points + worst texture mean radius / smoothness error mean symmetry * radius error target
0 11.94 18.24 75.71 437.6 0.08261 0.04751 0.01972 ... 0.07408 -5.165118 -11.937365 21.343490 1656.033287 0.042460 1
1 13.28 13.72 85.79 541.8 0.08363 0.08575 0.05077 ... 0.07320 -0.480546 -13.277387 17.398640 3109.342074 0.029640 1
2 11.42 20.38 77.58 386.1 0.14250 0.28390 0.24140 ... 0.17300 -1.720653 -11.410792 26.605200 1253.567508 0.128707 0
3 13.86 16.93 90.96 578.9 0.10260 0.15170 0.09901 ... 0.10590 0.840327 -13.855440 26.986020 2325.503356 0.053977 0
4 20.09 23.86 134.70 1247.0 0.10800 0.18380 0.22830 ... 0.09469 -2.215996 -20.084072 29.558000 2522.601708 0.241093 0
.. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
564 13.54 14.36 87.46 566.3 0.09779 0.08129 0.06664 ... 0.07259 514.164096 -13.537700 19.307810 1600.094540 0.050876 1
565 14.42 16.54 94.15 641.2 0.09751 0.11390 0.08007 ... 0.08764 0.884301 -14.416627 21.552230 3150.535285 0.066748 1
566 16.35 23.29 109.00 840.4 0.09742 0.14970 0.18110 ... 0.09614 18.817004 -16.345915 31.117730 2901.508429 0.093786 0
567 17.68 20.74 117.40 963.7 0.11150 0.16650 0.18550 ... 0.07738 -0.351241 -17.675032 25.215400 1956.401461 0.159907 0
568 11.32 27.08 71.76 395.7 0.06883 0.03813 0.01633 ... 0.07087 -1.071237 -11.317948 33.753125 3098.822885 0.022615 1
[569 rows x 36 columns]
>>> from atom.feature_engineering import FeatureGenerator
>>> from sklearn.datasets import load_breast_cancer
>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)
>>> fg = FeatureGenerator(strategy="dfs", n_features=5, verbose=2)
>>> X = fg.fit_transform(X, y)
Fitting FeatureGenerator...
Generating new features...
--> 5 new features were added.
>>> # Note the worst concavity / mean concavity column
>>> print(X)
mean radius mean texture mean perimeter mean area mean smoothness mean compactness ... worst fractal dimension mean concave points - worst concavity mean concavity + radius error mean smoothness / mean perimeter worst compactness / mean perimeter worst concavity / mean concavity
index ...
0 17.99 10.38 122.80 1001.0 0.11840 0.27760 ... 0.11890 -0.56480 1.39510 0.000964 0.005420 2.372209
1 20.57 17.77 132.90 1326.0 0.08474 0.07864 ... 0.08902 -0.17143 0.63040 0.000638 0.001404 2.780207
2 19.69 21.25 130.00 1203.0 0.10960 0.15990 ... 0.08758 -0.32250 0.94300 0.000843 0.003265 2.281662
3 11.42 20.38 77.58 386.1 0.14250 0.28390 ... 0.17300 -0.58170 0.73700 0.001837 0.011167 2.845485
4 20.29 14.34 135.10 1297.0 0.10030 0.13280 ... 0.07678 -0.29570 0.95520 0.000742 0.001517 2.020202
... ... ... ... ... ... ... ... ... ... ... ... ... ...
564 21.56 22.39 142.00 1479.0 0.11100 0.11590 ... 0.07115 -0.27180 1.41990 0.000782 0.001488 1.683887
565 20.13 28.25 131.20 1261.0 0.09780 0.10340 ... 0.06637 -0.22359 0.90950 0.000745 0.001465 2.232639
566 16.60 28.08 108.30 858.1 0.08455 0.10230 ... 0.07820 -0.28728 0.54891 0.000781 0.002857 3.678521
567 20.60 29.33 140.10 1265.0 0.11780 0.27700 ... 0.12400 -0.78670 1.07740 0.000841 0.006196 2.671315
568 7.76 24.54 47.92 181.0 0.05263 0.04362 ... 0.07039 0.00000 0.38570 0.001098 0.001345 NaN
[569 rows x 35 columns]
Methods
fit | Fit to data. |
fit_transform | Fit to data, then transform it. |
get_params | Get parameters for this estimator. |
inverse_transform | Do nothing. |
set_output | Set output container. |
set_params | Set the parameters of this estimator. |
transform | Generate new features. |
method fit(X, y=None)[source]
Fit to data.
Parameters | X: dataframe-like
Feature set with shape=(n_samples, n_features).
y: sequence, dataframe-like or None, default=None
Target column(s) corresponding to X .
|
Returns | self
Estimator instance.
|
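A minimal sketch of fitting and transforming separately (X_train, y_train and X_test are hypothetical, pre-split dataframes):
>>> fg = FeatureGenerator(strategy="dfs", n_features=5)
>>> fg.fit(X_train, y_train)
>>> X_test_new = fg.transform(X_test)  # adds the same features learned on the training set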
method fit_transform(X=None, y=None, **fit_params)[source]
Fit to data, then transform it.
method get_params(deep=True)[source]
Get parameters for this estimator.
Parameters | deep : bool, default=True
If True, will return the parameters for this estimator and
contained subobjects that are estimators.
|
Returns | params : dict
Parameter names mapped to their values.
|
method inverse_transform(X=None, y=None, **fit_params)[source]
Do nothing.
Returns the input unchanged. Implemented for continuity of the API.
method set_output(transform=None)[source]
Set output container.
See sklearn's user guide on how to use the
set_output
API. See here a description
of the choices.
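For instance, a sketch requesting pandas output through sklearn's set_output API:
>>> fg = FeatureGenerator(strategy="dfs", n_features=5).set_output(transform="pandas")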
method set_params(**params)[source]
Set the parameters of this estimator.
Parameters | **params : dict
Estimator parameters.
|
Returns | self : estimator instance
Estimator instance.
|
method transform(X, y=None)[source]
Generate new features.