FeatureGenerator

class atom.feature_engineering.FeatureGenerator(strategy="DFS", n_features=None, generations=20, population=500, operators=None, n_jobs=1, verbose=0, logger=None, random_state=None) [source]

Use Deep Feature Synthesis or a genetic algorithm to create new combinations of existing features that capture non-linear relations between the original features. This class can be accessed from atom through the feature_generation method. Read more in the user guide.

Parameters:

strategy: str, optional (default="DFS")
Strategy to create new features. Choose from:

"DFS" to use Deep Feature Synthesis.
"GFG" or "genetic" to use Genetic Feature Generation.

n_features: int or None, optional (default=None)
Number of newly generated features to add to the dataset (no more than 1% of the population for the genetic strategy). If None, select all created features.

generations: int, optional (default=20)
Number of generations to evolve. Only for the genetic strategy.

population: int, optional (default=500)
Number of programs in each generation. Only for the genetic strategy.

operators: str, list, tuple or None, optional (default=None)
Names of the operators to apply to the features. None to use all. Choose from: "add", "sub", "mul", "div", "sqrt", "log", "sin", "cos", "tan".

n_jobs: int, optional (default=1)
Number of cores to use for parallel processing.

If >0: Number of cores to use.
If -1: Use all available cores.
If <-1: Use available_cores + 1 + n_jobs (so -2 uses all cores but one).

verbose: int, optional (default=0)
Verbosity level of the class. Possible values are:

0 to not print anything.
1 to print basic information.
2 to print detailed information.

logger: str, Logger or None, optional (default=None)

If None: Doesn't save a logging file.
If str: Name of the log file. Use "auto" for automatic naming.
Else: Python logging.Logger instance.

random_state: int or None, optional (default=None)
Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random.
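
The 1% cap on n_features for the genetic strategy can be checked before fitting. The helpers below are a hypothetical sketch and not part of atom: the function names are made up, and the cap is assumed to be the population divided by 100, rounded down.

```python
def max_genetic_features(population: int) -> int:
    """Largest n_features allowed under a 1% cap (assumes rounding down)."""
    if population < 1:
        raise ValueError("population must be a positive integer")
    return population // 100


def fits_cap(n_features: int, population: int = 500) -> bool:
    """Return True if n_features stays within the assumed 1% cap."""
    return n_features <= max_genetic_features(population)


print(max_genetic_features(500))  # default population
print(fits_cap(3), fits_cap(6))
```

Under this assumption the default population of 500 allows at most 5 genetic features, so a larger n_features would call for a larger population.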

Tip

DFS can create many new features and not all of them will be useful. Use FeatureSelector to reduce the number of features!

Warning

Using the div, log or sqrt operators can return new features with inf or NaN values. Check the warnings that may pop up or use atom's missing property.
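
A quick way to spot such values is to count the non-finite entries per generated column. The following is a minimal sketch in plain Python, assuming the features are held as a dict of column name to list of floats; the helper name and the sample values are made up for illustration and do not come from atom.

```python
import math


def non_finite_report(columns):
    """Map each column name to its number of inf/NaN entries, skipping clean columns."""
    report = {}
    for name, values in columns.items():
        bad = sum(1 for v in values if not math.isfinite(v))
        if bad:
            report[name] = bad
    return report


# Made-up values standing in for generated features.
features = {
    "x0 / x1": [0.5, float("inf"), 2.0],    # division by zero
    "log(x0)": [0.0, float("-inf"), 1.1],   # log of zero
    "sqrt(x0)": [1.0, float("nan"), 1.4],   # sqrt of a negative number
    "x0 + x1": [3.0, 1.0, 4.0],             # always finite here
}

print(non_finite_report(features))
```

Columns that show up in the report are candidates for imputation or removal before training.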

Warning

When using DFS with n_jobs>1, make sure to protect your code with if __name__ == "__main__". Featuretools uses dask, which uses Python multiprocessing for parallelization. The spawn start method of multiprocessing launches a new Python process, which has to import the __main__ module before it can do its task.

Attributes

Attributes:

symbolic_transformer: SymbolicTransformer
Instance used to calculate the genetic features. Only for the genetic strategy.

genetic_features: pd.DataFrame
Dataframe of the newly created non-linear features. Only for the genetic strategy. Columns include:

name: Name of the feature (automatically created).
description: Operators used to create this feature.
fitness: Fitness score.

Methods

fit	Fit to data.
fit_transform	Fit to data, then transform it.
get_params	Get parameters for this estimator.
log	Write information to the logger and print to stdout.
save	Save the instance to a pickle file.
set_params	Set the parameters of this estimator.
transform	Transform the data.

method fit(X, y) [source]

Fit to data.

Parameters:

X: dict, list, tuple, np.ndarray or pd.DataFrame
Feature set with shape=(n_samples, n_features).

y: int, str or sequence

If int: Index of the target column in X.
If str: Name of the target column in X.
Else: Target column with shape=(n_samples,).

Returns:

self: FeatureGenerator
Fitted instance of self.
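
The three accepted forms of y can be sketched as a small dispatch. This is an illustrative stand-in and not atom's implementation: it assumes X is a plain dict mapping column names to lists, and resolve_target is a hypothetical helper name.

```python
def resolve_target(X, y):
    """Split X into (features, target) following the int / str / sequence rule.

    X is assumed to be a dict of column name -> list of values.
    """
    if isinstance(y, bool):
        raise TypeError("y must be an int, a str or a sequence")

    if isinstance(y, int):
        name = list(X)[y]  # int: position of the target column in X
    elif isinstance(y, str):
        if y not in X:
            raise KeyError(f"column {y!r} not found in X")
        name = y  # str: name of the target column in X
    else:
        target = list(y)  # else: the target values themselves
        n_samples = len(next(iter(X.values())))
        if len(target) != n_samples:
            raise ValueError("y must have shape (n_samples,)")
        return dict(X), target

    features = {k: v for k, v in X.items() if k != name}
    return features, list(X[name])


data = {"a": [1, 2], "b": [3, 4], "t": [0, 1]}
print(resolve_target(data, "t"))
```

Checking for str before treating y as a sequence matters, because a string is itself a sequence in Python.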

method fit_transform(X, y) [source]

Fit to data, then transform it.

Parameters:

X: dict, list, tuple, np.ndarray or pd.DataFrame
Feature set with shape=(n_samples, n_features).

y: int, str or sequence

If int: Index of the target column in X.
If str: Name of the target column in X.
Else: Target column with shape=(n_samples,).

Returns:

X: pd.DataFrame
Feature set with the newly generated features.

method get_params(deep=True) [source]

Get parameters for this estimator.

Parameters:

deep: bool, optional (default=True)
If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params: dict
Dictionary of the parameter names mapped to their values.

method log(msg, level=0) [source]

Write a message to the logger and print it to stdout.

Parameters:

msg: str
Message to write to the logger and print to stdout.

level: int, optional (default=0)
Minimum verbosity level to print the message.
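
The interaction between level and the class's verbose setting can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: log_message is a hypothetical name, and it is assumed that the logger receives every message while printing is gated by verbosity.

```python
import logging


def log_message(msg, level=0, verbose=0, logger=None):
    """Print msg when verbose >= level and forward it to the logger if one is set.

    Returns True when the message was printed.
    """
    printed = verbose >= level
    if printed:
        print(msg)
    if logger is not None:
        # Assumption: the log file records messages regardless of verbosity.
        logger.info(msg)
    return printed


log_message("Fitting FeatureGenerator...", level=1, verbose=2)  # printed
log_message("Detailed progress...", level=2, verbose=1)         # suppressed
```

With verbose=0, only messages sent at level=0 would reach stdout in this sketch.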

method save(filename="auto") [source]

Save the instance to a pickle file.

Parameters:

filename: str, optional (default="auto")
Name of the file. Use "auto" for automatic naming.

method set_params(**params) [source]

Set the parameters of this estimator.

Parameters:

**params: dict
Estimator parameters.

Returns:

self: FeatureGenerator
Estimator instance.

method transform(X, y=None) [source]

Generate new features.

Parameters:

X: dict, list, tuple, np.ndarray or pd.DataFrame
Feature set with shape=(n_samples, n_features).

y: int, str, sequence or None, optional (default=None)
Does nothing. Implemented for continuity of the API.

Returns:

X: pd.DataFrame
Feature set with the newly generated features.

Example

from atom import ATOMClassifier

atom = ATOMClassifier(X, y)
atom.feature_generation(strategy="genetic", n_features=3, generations=30)

or

from atom.feature_engineering import FeatureGenerator

fg = FeatureGenerator(strategy="genetic", n_features=3, generations=30)
fg.fit(X_train, y_train)
X = fg.transform(X)