FeatureGenerator
Use Deep Feature Synthesis (DFS) or a genetic algorithm to create new combinations of existing features that capture non-linear relations between the original features. This class can be accessed from atom through the feature_generation method. Read more in the user guide.
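To make the idea of Deep Feature Synthesis more concrete, here is a minimal single-table sketch that applies one layer of operators (depth 1) to a pandas DataFrame. It is not Featuretools' implementation. The function name synthesize_depth1 and the operator set are illustrative assumptions. Binary operators are applied once per unordered column pair, so sub and div appear in only one direction.

```python
import numpy as np
import pandas as pd

# Illustrative operator sets (assumed names, loosely matching add/sub/mul/div/log/sqrt)
BINARY = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,
}
UNARY = {"log": np.log, "sqrt": np.sqrt}


def synthesize_depth1(X: pd.DataFrame) -> pd.DataFrame:
    """Return X plus one layer of operator-combined features (sketch only)."""
    cols = list(X.columns)
    new_cols = {}
    # Unary operators: one new feature per (operator, column)
    for name, fn in UNARY.items():
        for c in cols:
            new_cols[f"{name}({c})"] = fn(X[c])
    # Binary operators: one new feature per (operator, unordered column pair)
    for name, fn in BINARY.items():
        for i, a in enumerate(cols):
            for b in cols[i + 1:]:
                new_cols[f"{name}({a}, {b})"] = fn(X[a], X[b])
    return pd.concat([X, pd.DataFrame(new_cols, index=X.index)], axis=1)
```

Feeding the output back into the same function would stack operators to depth 2. This is why the number of candidate features grows so quickly.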
Parameters:
strategy: str, optional (default="DFS") Strategy to create new features. Choose from: "DFS" to use Deep Feature Synthesis, or "genetic" to use a genetic algorithm.
n_features: int or None, optional (default=None)
generations: int, optional (default=20)
population: int, optional (default=500)
operators: str, list, tuple or None, optional (default=None)
Number of cores to use for parallel processing.
Verbosity level of the class. Possible values are:
Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random.
Tip
DFS can create many new features and not all of them will be useful. Use FeatureSelector to reduce the number of features!
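FeatureSelector provides its own strategies for this. As a much simpler stand-in, the sketch below shows one common way to prune generated features. It drops constant columns, then drops any column whose absolute Pearson correlation with an earlier column exceeds a threshold. The function name drop_redundant and the 0.95 default threshold are illustrative assumptions and not part of atom's API.

```python
import numpy as np
import pandas as pd


def drop_redundant(X: pd.DataFrame, max_corr: float = 0.95) -> pd.DataFrame:
    """Hypothetical filter: remove constant and highly correlated columns."""
    # Constant columns carry no information
    X = X.loc[:, X.nunique(dropna=False) > 1]
    corr = X.corr().abs()
    # Keep only the upper triangle so each pair of columns is checked once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # Drop a column if it is too correlated with any column before it
    to_drop = [c for c in upper.columns if (upper[c] > max_corr).any()]
    return X.drop(columns=to_drop)
```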
Warning
Using the div, log or sqrt operators can return new features with inf or NaN values. Check the warnings that may pop up or use atom's missing property.
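The sketch below reproduces the problem with plain pandas and NumPy rather than atom. Dividing zero by zero, taking the log of zero or of a negative number, and taking the square root of a negative number all produce non-finite values. The column names only imitate generated feature names. Replacing positive and negative infinity with NaN afterwards lets ordinary missing-value handling treat both cases the same way.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({"a": [0.0, 1.0, -4.0], "b": [0.0, 2.0, 4.0]})

# Silence NumPy's divide-by-zero and invalid-value warnings for this demo
with np.errstate(divide="ignore", invalid="ignore"):
    generated = pd.DataFrame({
        "div(a, b)": X["a"] / X["b"],  # 0/0 -> NaN
        "log(a)": np.log(X["a"]),      # log(0) -> -inf, log(-4) -> NaN
        "sqrt(a)": np.sqrt(X["a"]),    # sqrt(-4) -> NaN
    })

# Count the non-finite values (inf, -inf or NaN) in each new feature
non_finite = (~np.isfinite(generated)).sum()

# Turn infinities into NaN so that missing-value handling catches them too
cleaned = generated.replace([np.inf, -np.inf], np.nan)
print(non_finite)
print(cleaned.isna().sum())
```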
Warning
When using DFS with n_jobs>1, make sure to protect your code with if __name__ == "__main__". Featuretools uses Dask, which relies on Python's multiprocessing for parallelization. With the spawn start method, multiprocessing starts a new Python process, which must import the __main__ module before it can do its task.
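The pattern this warning asks for can be sketched with the standard library alone. The snippet does not involve Featuretools or Dask. parallel_sqrt is a hypothetical stand-in for any work that is spread over n_jobs processes started with the spawn method.

```python
import math
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def parallel_sqrt(values, n_jobs=1):
    """Hypothetical workload: square roots, in worker processes if n_jobs > 1."""
    if n_jobs == 1:
        return [math.sqrt(v) for v in values]
    # "spawn" starts fresh interpreters that import __main__ before working
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=n_jobs, mp_context=ctx) as pool:
        return list(pool.map(math.sqrt, values))


# Without this guard, each spawned worker would run the call below again while
# importing __main__, and multiprocessing refuses to start new processes from
# a worker that is still bootstrapping.
if __name__ == "__main__":
    print(parallel_sqrt([1.0, 4.0, 9.0], n_jobs=2))
```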
Attributes
symbolic_transformer: SymbolicTransformer
Dataframe of the newly created non-linear features. Only for the genetic strategy. Columns include:
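Genetic feature generation needs a fitness score to decide which candidate features survive. As an illustration only, the sketch below ranks candidate columns by their absolute Pearson correlation with the target. This is similar in spirit to the Pearson metric that gplearn's SymbolicTransformer uses by default. The sketch does not evolve any programs, and the function name rank_by_fitness is hypothetical.

```python
import pandas as pd


def rank_by_fitness(candidates: pd.DataFrame, y: pd.Series, n_features: int) -> list:
    """Hypothetical ranking: names of the n_features fittest candidate columns."""
    # Fitness of each candidate: absolute Pearson correlation with the target
    fitness = candidates.corrwith(y).abs()
    # Highest fitness first; keep the top n_features names
    return fitness.sort_values(ascending=False).index[:n_features].tolist()
```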
Methods
fit: Fit to data.
fit_transform: Fit to data, then transform it.
get_params: Get parameters for this estimator.
log: Write information to the logger and print to stdout.
save: Save the instance to a pickle file.
set_params: Set the parameters of this estimator.
transform: Transform the data.
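These methods follow the familiar scikit-learn transformer contract. fit learns what to generate, transform applies it, and fit_transform chains the two. The sketch below shows that contract on a hypothetical PairwiseProductGenerator that only appends products of column pairs. Inheriting from scikit-learn's BaseEstimator and TransformerMixin supplies get_params, set_params and fit_transform. Whether FeatureGenerator itself builds on these base classes is not stated here, so treat this as an analogy.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class PairwiseProductGenerator(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: appends the product of every pair of columns."""

    def __init__(self, prefix="mul"):
        self.prefix = prefix

    def fit(self, X, y=None):
        cols = list(pd.DataFrame(X).columns)
        # Learn which column pairs to combine; transform reuses exactly these
        self.pairs_ = [(a, b) for i, a in enumerate(cols) for b in cols[i + 1:]]
        return self

    def transform(self, X, y=None):
        X = pd.DataFrame(X).copy()  # never modify the caller's data
        for a, b in self.pairs_:
            X[f"{self.prefix}({a}, {b})"] = X[a] * X[b]
        return X
```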
fit
Fit to data.
Parameters:
X: dict, list, tuple, np.ndarray or pd.DataFrame
Returns:
self: FeatureGenerator Fitted instance of self.
fit_transform
Fit to data, then transform it.
Parameters:
X: dict, list, tuple, np.ndarray or pd.DataFrame
Returns:
X: pd.DataFrame Feature set with the newly generated features.
get_params
Get parameters for this estimator.
Parameters:
deep: bool, optional (default=True)
Returns:
params: dict Dictionary of the parameter names mapped to their values.
log
Write a message to the logger and print it to stdout.
Parameters:
msg: str
level: int, optional (default=0)
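Here is a minimal sketch of such a method. It assumes, though the text above does not state it, that a message is printed only when the instance's verbosity is at least level, while a configured logger always receives it. The class name VerboseLogger and its verbose and logger attributes are hypothetical.

```python
import logging


class VerboseLogger:
    """Hypothetical helper with a log method in the style described above."""

    def __init__(self, verbose=0, logger=None):
        self.verbose = verbose
        self.logger = logger  # a logging.Logger, or None to skip logging

    def log(self, msg, level=0):
        # Always pass the message to the logger if one is configured
        if self.logger is not None:
            self.logger.info(msg)
        # Print only when the verbosity is high enough for this message
        if self.verbose >= level:
            print(msg)


tool = VerboseLogger(verbose=1, logger=logging.getLogger("feature_generation"))
tool.log("Fitting FeatureGenerator...", level=1)  # printed
tool.log("Details of every generation", level=2)  # not printed; still passed to the logger
```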
save
Save the instance to a pickle file.
Parameters:
filename: str, optional (default="auto") Name of the file. Use "auto" for automatic naming.
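Here is a minimal sketch of pickling an instance with an "auto" file name. It assumes that automatic naming uses the class name plus a .pkl extension, and atom's actual naming scheme may differ. save_instance is a hypothetical helper and not part of atom.

```python
import pickle
from pathlib import Path


def save_instance(obj, filename="auto", directory="."):
    """Hypothetical helper: pickle obj; "auto" names the file after its class."""
    if filename == "auto":
        filename = type(obj).__name__
    if not filename.endswith(".pkl"):
        filename += ".pkl"
    path = Path(directory) / filename
    with path.open("wb") as f:
        pickle.dump(obj, f)
    return path
```

To load the instance again, open the returned path in binary mode and pass the file object to pickle.load.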
set_params
Set the parameters of this estimator.
Parameters:
**params: dict Estimator parameters.
Returns:
self: FeatureGenerator Estimator instance.
transform
Generate new features.
Parameters:
X: dict, list, tuple, np.ndarray or pd.DataFrame
y: int, str, sequence or None, optional (default=None)
Returns:
X: pd.DataFrame Feature set with the newly generated features.
Example
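from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from atom import ATOMClassifier
from atom.feature_engineering import FeatureGenerator

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
# Through atom, using the feature_generation method
atom = ATOMClassifier(X, y)
atom.feature_generation(strategy="genetic", n_features=3, generations=30)
# Standalone: fit on the training set, then add the new features to X
fg = FeatureGenerator(strategy="genetic", n_features=3, generations=30)
fg.fit(X_train, y_train)
X = fg.transform(X)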