FeatureGenerator


class atom.feature_engineering.FeatureGenerator(strategy="dfs", n_features=None, operators=None, n_jobs=1, verbose=0, random_state=None, **kwargs)[source]

Generate new features.

Create new combinations of existing features to capture the non-linear relations between the original features.

This class can be accessed from atom through the feature_generation method. Read more in the user guide.

Warning

  • Using the div, log or sqrt operators can return new features with inf or NaN values. Check the warnings that may pop up or use atom's nans attribute.
  • When using dfs with n_jobs>1, make sure to protect your code with if __name__ == "__main__". Featuretools uses dask, which relies on Python's multiprocessing for parallelization. With the spawn start method, every new Python process must import the __main__ module before it can do its task, so unguarded top-level code would run again in each worker.
  • gfg can be slow for very large populations.
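
The guard mentioned in the second warning can be sketched with the standard library alone. This is a minimal illustration, not ATOM code: math.sqrt stands in for the parallel work, and the main function is a hypothetical name. The comment marks where a FeatureGenerator call with n_jobs>1 would go.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def main():
    # Start methods that launch a fresh interpreter (such as spawn) re-import
    # this module in every worker, so the call to main() must be guarded below.
    with ProcessPoolExecutor(max_workers=2) as pool:
        # math.sqrt stands in for the parallel work. With ATOM, this is where
        # FeatureGenerator(strategy="dfs", n_jobs=2).fit_transform(X, y) would go.
        return list(pool.map(math.sqrt, [1, 4, 9]))


if __name__ == "__main__":
    # Only the parent process enters this block; workers skip it on import.
    print(main())  # [1.0, 2.0, 3.0]
```

Without the guard, a worker that re-imports the script would try to start its own pool while it is still bootstrapping, which multiprocessing rejects with a RuntimeError.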

Tip

dfs can create many new features and not all of them will be useful. Use the FeatureSelector class to reduce the number of features.

Parameters

strategy: str, default="dfs"
Strategy to create new features. Choose from:

  • "dfs": Deep Feature Synthesis.
  • "gfg": Genetic Feature Generation.

n_features: int or None, default=None
Maximum number of newly generated features to add to the dataset. If None, select all created features.

operators: str, sequence or None, default=None
Mathematical operators to apply on the features. None to use all. Choose from: add, sub, mul, div, abs, sqrt, log, inv, sin, cos, tan.
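
To make the inf and NaN warning concrete, here is a conceptual sketch of what div, log and sqrt compute on awkward inputs. It assumes IEEE 754 (NumPy-style) float results and uses plain Python lists with toy values. The helper functions are hypothetical and do not reflect ATOM's implementation.

```python
import math


def div(a, b):
    """a / b with IEEE 754 results (simplified: ignores signed zeros and NaN inputs)."""
    if b == 0:
        return math.nan if a == 0 else math.copysign(math.inf, a)
    return a / b


def log(a):
    """Natural log that yields -inf at 0 and NaN for negative input."""
    if a == 0:
        return -math.inf
    return math.log(a) if a > 0 else math.nan


def sqrt(a):
    """Square root that yields NaN for negative input."""
    return math.sqrt(a) if a >= 0 else math.nan


# Toy feature columns (made-up values, not taken from a real dataset).
feature_a = [4.0, 1.0, 0.0]
feature_b = [2.0, 0.0, 0.0]

# A generated "feature_a / feature_b" column.
ratio = [div(a, b) for a, b in zip(feature_a, feature_b)]
print(ratio)  # [2.0, inf, nan]

# Count the entries that would need cleaning before training a model.
n_bad = sum(not math.isfinite(v) for v in ratio)
print(n_bad)  # 2
```

A zero in the denominator is enough to produce inf, which is why new columns built with these operators are worth checking before modelling.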

n_jobs: int, default=1
Number of cores to use for parallel processing.

  • If >0: Number of cores to use.
  • If -1: Use all available cores.
  • If <-1: Use number of cores + 1 + n_jobs (e.g., -2 uses all cores but one).
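
The rules in this list can be sketched as a small helper. This is an illustration only: resolve_n_jobs is a hypothetical function, not part of ATOM, and the handling of values below -1 assumes the joblib convention (n_cores + 1 + n_jobs), under which -2 means all cores but one.

```python
import os


def resolve_n_jobs(n_jobs, n_cores=None):
    """Translate an n_jobs value into a concrete number of cores (hypothetical helper)."""
    if n_cores is None:
        n_cores = os.cpu_count() or 1
    if n_jobs > 0:
        return n_jobs  # positive values are used as given
    if n_jobs == 0:
        raise ValueError("n_jobs must not be 0.")
    # Assumed joblib convention: -1 -> all cores, -2 -> all cores but one, ...
    resolved = n_cores + 1 + n_jobs
    if resolved < 1:
        raise ValueError(f"n_jobs={n_jobs} leaves no cores on a {n_cores}-core machine.")
    return resolved


print(resolve_n_jobs(2, n_cores=8))   # 2
print(resolve_n_jobs(-1, n_cores=8))  # 8
print(resolve_n_jobs(-2, n_cores=8))  # 7
```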

verbose: int, default=0
Verbosity level of the class. Choose from:

  • 0 to not print anything.
  • 1 to print basic information.
  • 2 to print detailed information.

random_state: int or None, default=None
Seed used by the random number generator. If None, the random number generator is the RandomState used by np.random.

**kwargs
Additional keyword arguments for the SymbolicTransformer instance. Only for the gfg strategy.

Attributes

gfg_: SymbolicTransformer
Object used to calculate the genetic features. Only available when strategy="gfg".

genetic_features_: pd.DataFrame
Information on the newly created non-linear features. Only available when strategy="gfg". Columns include:

  • name: Name of the feature (generated automatically).
  • description: Operators used to create this feature.
  • fitness: Fitness score.

feature_names_in_: np.ndarray
Names of features seen during fit.

n_features_in_: int
Number of features seen during fit.


See Also

FeatureExtractor

Extract features from datetime columns.

FeatureGrouper

Extract statistics from similar features.

FeatureSelector

Reduce the number of features in the data.


Example

>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer

>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)

>>> atom = ATOMClassifier(X, y)
>>> atom.feature_generation(strategy="dfs", n_features=5, verbose=2)

Fitting FeatureGenerator...
Generating new features...
 --> 5 new features were added.

>>> # Note the new columns, e.g., mean texture + texture error
>>> print(atom.dataset)

     mean radius  mean texture  mean perimeter  mean area  mean smoothness  mean compactness  ...  COSINE(worst area)  concave points error * worst symmetry  concavity error / mean fractal dimension  mean fractal dimension * worst perimeter  mean texture + texture error  target
0         18.030         16.85          117.50      990.0          0.08947           0.12320  ...           -0.692809                               0.002772                                  0.514706                                  7.704740                       17.4406       0
1         19.190         15.94          126.30     1157.0          0.08694           0.11850  ...            0.921798                               0.006826                                  0.839258                                  7.588016                       16.5736       0
2         14.640         15.24           95.77      651.9          0.11320           0.13390  ...            0.797462                               0.005114                                  0.699023                                  6.942524                       15.9772       1
3          9.738         11.97           61.24      288.5          0.09250           0.04102  ...           -0.893193                               0.000000                                  0.000000                                  4.272557                       12.4660       1
4         11.890         21.17           76.39      433.8          0.09773           0.08120  ...            0.174282                               0.002852                                  0.259141                                  5.352161                       22.3730       1
..           ...           ...             ...        ...              ...               ...  ...                 ...                                    ...                                       ...                                       ...                           ...     ...
564       11.500         18.45           73.28      407.4          0.09345           0.05991  ...            0.999278                               0.002514                                  0.209808                                  4.932341                       19.2929       1
565       20.580         22.14          134.70     1290.0          0.09090           0.13480  ...           -0.928415                               0.008043                                  1.092556                                  7.952992                       23.6200       0
566       12.050         14.63           78.04      449.3          0.10310           0.09092  ...           -0.164684                               0.001555                                  0.384246                                  5.431448                       15.3594       1
567       12.830         22.33           85.26      503.2          0.10880           0.17990  ...           -0.653689                               0.005107                                  0.645575                                  7.638462                       23.3990       0
568       11.680         16.17           75.49      420.5          0.11280           0.09263  ...           -0.999773                               0.003272                                  0.289174                                  5.541346                       17.3240       1

[569 rows x 36 columns]

>>> from atom.feature_engineering import FeatureGenerator
>>> from sklearn.datasets import load_breast_cancer

>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)

>>> fg = FeatureGenerator(strategy="dfs", n_features=5, verbose=2)
>>> X = fg.fit_transform(X, y)

Fitting FeatureGenerator...
Generating new features...
 --> 5 new features were added.

>>> # Note the new columns, e.g., mean concavity * worst smoothness
>>> print(X)

       mean radius  mean texture  mean perimeter  mean area  mean smoothness  mean compactness  ...  worst fractal dimension  mean concave points - worst fractal dimension  mean concavity * worst smoothness  mean smoothness / mean concave points  perimeter error - radius error  worst symmetry / mean compactness
index                                                                                           ...                                                                                                                                                                                                                     
0            17.99         10.38          122.80     1001.0          0.11840           0.27760  ...                  0.11890                                        0.02820                           0.048676                               0.804895                          7.4940                           1.657421
1            20.57         17.77          132.90     1326.0          0.08474           0.07864  ...                  0.08902                                       -0.01885                           0.010758                               1.207639                          2.8545                           3.496948
2            19.69         21.25          130.00     1203.0          0.10960           0.15990  ...                  0.08758                                        0.04032                           0.028505                               0.856919                          3.8394                           2.259537
3            11.42         20.38           77.58      386.1          0.14250           0.28390  ...                  0.17300                                       -0.06780                           0.050646                               1.354563                          2.9494                           2.338147
4            20.29         14.34          135.10     1297.0          0.10030           0.13280  ...                  0.07678                                        0.02752                           0.027205                               0.961649                          4.6808                           1.780120
...            ...           ...             ...        ...              ...               ...  ...                      ...                                            ...                                ...                                    ...                             ...                                ...
564          21.56         22.39          142.00     1479.0          0.11100           0.11590  ...                  0.07115                                        0.06775                           0.034390                               0.799136                          6.4970                           1.777394
565          20.13         28.25          131.20     1261.0          0.09780           0.10340  ...                  0.06637                                        0.03154                           0.016790                               0.998877                          4.4375                           2.487427
566          16.60         28.08          108.30      858.1          0.08455           0.10230  ...                  0.07820                                       -0.02518                           0.010537                               1.594681                          2.9686                           2.168133
567          20.60         29.33          140.10     1265.0          0.11780           0.27700  ...                  0.12400                                        0.02800                           0.057981                               0.775000                          5.0460                           1.475451
568           7.76         24.54           47.92      181.0          0.05263           0.04362  ...                  0.07039                                       -0.07039                           0.000000                                    inf                          2.1623                           6.581843

[569 rows x 35 columns]


Methods

fit                  Fit to data.
fit_transform        Fit to data, then transform it.
get_params           Get parameters for this estimator.
inverse_transform    Do nothing.
set_output           Set output container.
set_params           Set the parameters of this estimator.
transform            Generate new features.


method fit(X, y=None)[source]

Fit to data.

Parameters

X: dataframe-like
Feature set with shape=(n_samples, n_features).

y: sequence, dataframe-like or None, default=None
Target column(s) corresponding to X.

Returns

self
Estimator instance.



method fit_transform(X=None, y=None, **fit_params)[source]

Fit to data, then transform it.

Parameters

X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). If None, X is ignored.

y: sequence, dataframe-like or None, default=None
Target column(s) corresponding to X. If None, y is ignored.

**fit_params
Additional keyword arguments for the fit method.

Returns

dataframe
Transformed feature set. Only returned if X is provided.

series or dataframe
Transformed target column. Only returned if y is provided.



method get_params(deep=True)[source]

Get parameters for this estimator.

Parameters

deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: dict
Parameter names mapped to their values.



method inverse_transform(X=None, y=None, **fit_params)[source]

Do nothing.

Returns the input unchanged. Implemented for continuity of the API.

Parameters

X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). If None, X is ignored.

y: sequence, dataframe-like or None, default=None
Target column(s) corresponding to X. If None, y is ignored.

Returns

dataframe
Feature set. Only returned if X is provided.

series or dataframe
Target column(s). Only returned if y is provided.



method set_output(transform=None)[source]

Set output container.

See sklearn's user guide on how to use the set_output API. The available choices are listed under the transform parameter below.

Parameters

transform: str or None, default=None
Configure the output of the transform, fit_transform, and inverse_transform methods. If None, the configuration is not changed. Choose from:

  • "numpy"
  • "pandas" (default)
  • "pandas-pyarrow"
  • "polars"
  • "polars-lazy"
  • "pyarrow"
  • "modin"
  • "dask"
  • "pyspark"
  • "pyspark-pandas"

Returns

Self
Estimator instance.



method set_params(**params)[source]

Set the parameters of this estimator.

Parameters

**params: dict
Estimator parameters.

Returns

self: estimator instance
Estimator instance.



method transform(X, y=None)[source]

Generate new features.

Parameters

X: dataframe-like
Feature set with shape=(n_samples, n_features).

y: sequence, dataframe-like or None, default=None
Do nothing. Implemented for continuity of the API.

Returns

dataframe
Transformed feature set.