FeatureSelector
Remove features according to the selected strategy. Ties between features with equal scores are broken in an unspecified way. Additionally, removes features with too low variance and finds pairs of collinear features based on the Pearson correlation coefficient. For each pair above the specified limit (in terms of absolute value), it removes one of the two. This class can be accessed from atom through the feature_selection method. Read more in the user guide.
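The collinearity step described above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the class's actual implementation: the helper name `drop_collinear`, the 0.98 limit, and the choice to drop the later column of each pair are all hypothetical.

```python
import numpy as np
import pandas as pd


def drop_collinear(X: pd.DataFrame, max_correlation: float = 0.98):
    """Drop one feature from every pair whose |Pearson r| exceeds the limit."""
    corr = X.corr(method="pearson").abs()
    # Keep only the strict upper triangle so that each pair is inspected once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    # Assumption: the later column of a collinear pair is the one removed
    to_drop = [col for col in upper.columns if (upper[col] > max_correlation).any()]
    return X.drop(columns=to_drop), to_drop


rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = pd.DataFrame({
    "a": a,
    "b": 2 * a + 1,              # linear function of "a", so |r| is 1
    "c": rng.normal(size=200),   # independent noise
})
X_reduced, dropped = drop_collinear(X, max_correlation=0.98)
print(dropped)
```

Because the comparison uses the absolute value of the coefficient, a strongly negative correlation is treated the same way as a strongly positive one.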
Parameters: |
strategy: string or None, optional (default=None) Feature selection strategy to use. Choose from:
Solver or model to use for the feature selection strategy. See sklearn's documentation for an extended description of the choices. Select None for the default option per strategy (only for univariate and PCA).
Number of features to select. Choose from:
If strategy="SFM" and the threshold parameter is not specified, the threshold is set to
Remove features with the same value in at least this fraction of the total rows. The default is to keep all features with non-zero variance, i.e. remove the features that have the same value in all samples. If None, skip this step.
max_correlation: float or None, optional (default=1.)
Number of cores to use for parallel processing.
Verbosity level of the class. Possible values are:
Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random.
**kwargs |
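The repeated-value filter described in the parameters above (remove features with the same value in at least a given fraction of the rows) could look roughly like the sketch below. The function name `drop_repeated` and the argument name `max_fraction` are hypothetical and chosen for illustration only; the class's own implementation may differ.

```python
import pandas as pd


def drop_repeated(X: pd.DataFrame, max_fraction: float = 1.0) -> pd.DataFrame:
    """Drop columns whose most frequent value fills >= max_fraction of the rows."""
    to_drop = []
    for col in X.columns:
        # Share of rows taken by the most common value in this column
        top_frac = X[col].value_counts(normalize=True, dropna=False).iloc[0]
        if top_frac >= max_fraction:
            to_drop.append(col)
    return X.drop(columns=to_drop)


X = pd.DataFrame({
    "constant": [1, 1, 1, 1],
    "mostly_same": [0, 0, 0, 5],
    "varied": [1, 2, 3, 4],
})
default_result = drop_repeated(X)        # removes only the constant column
strict_result = drop_repeated(X, 0.75)   # also removes "mostly_same"
print(list(default_result.columns), list(strict_result.columns))
```

With a fraction of 1.0, only columns holding a single value in every row are removed, which matches the documented default of keeping all features with non-zero variance.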
Info
If strategy="PCA", the data is scaled to mean=0 and std=1 before fitting the transformer (if it wasn't already).
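To illustrate why scaling precedes PCA, here is a small sketch using scikit-learn directly. It assumes a `StandardScaler` followed by `PCA` is a fair stand-in for what the note describes; the data and the number of components are made up for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Three features on very different scales
X = rng.normal(size=(100, 3)) * np.array([1.0, 50.0, 1000.0])

# Standardize to mean=0 and std=1, then project onto 2 principal components
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
X_pca = pipeline.fit_transform(X)

# Inspect the intermediate, standardized data
scaled = pipeline.named_steps["standardscaler"].transform(X)
print(X_pca.shape, scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```

Without the scaling step, the feature with the largest numeric range would dominate the first component regardless of how informative it is.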
Tip
Use the plot_feature_importance method to examine how much a specific feature contributes to the final predictions. If the model doesn't have a feature_importances_ attribute, use plot_permutation_importance instead.
Warning
The RFE, RFECV and SFS strategies don't work when the solver is a CatBoost model due to incompatibility of the APIs.
Attributes
Utility attributes
Attributes: |
collinear: pd.DataFrame Dataframe of the removed collinear features. Columns include:
feature_importance: list
<strategy>: sklearn estimator |
Plot attributes
Attributes: |
style: str
palette: str
title_fontsize: int
label_fontsize: int
tick_fontsize: int |
Methods
fit | Fit to data. |
fit_transform | Fit to data, then transform it. |
get_params | Get parameters for this estimator. |
log | Write information to the logger and print to stdout. |
plot_pca | Plot the explained variance ratio vs the number of components. |
plot_components | Plot the explained variance ratio per component. |
plot_rfecv | Plot the scores obtained by the estimator on the RFECV. |
reset_aesthetics | Reset the plot aesthetics to their default values. |
save | Save the instance to a pickle file. |
set_params | Set the parameters of this estimator. |
transform | Transform the data. |
Fit to data. Note that the univariate, SFM (when model is not fitted), SFS, RFE and RFECV strategies all need a target column. Leaving it None will raise an exception.
Parameters: |
X: dict, list, tuple, np.array, sps.matrix or pd.DataFrame
|
Returns: |
self: FeatureSelector Fitted instance of self. |
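As a rough illustration of why supervised strategies need a target, the sketch below runs a univariate selection with scikit-learn's `SelectKBest`. It is an assumption that this resembles what the univariate strategy does internally; the synthetic data and the choice of `f_classif` are for demonstration only.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=200)
X = np.column_stack([
    y + rng.normal(scale=0.1, size=200),  # informative: tracks the target
    rng.normal(size=200),                 # noise
    rng.normal(size=200),                 # noise
])

# The score function compares each feature against y, so y must be provided
selector = SelectKBest(score_func=f_classif, k=1).fit(X, y)
selected = selector.get_support(indices=True)
print(selected)
```

Each feature is scored against `y`, so there is nothing to rank when the target is missing.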
Fit to data, then transform it. Note that the univariate, SFM (when model is not fitted), SFS, RFE and RFECV strategies need a target column. Leaving it None will raise an exception.
Parameters: |
X: dict, list, tuple, np.array, sps.matrix or pd.DataFrame
|
Returns: |
X: pd.DataFrame Transformed feature set. |
Get parameters for this estimator.
Parameters: |
deep: bool, optional (default=True) |
Returns: |
params: dict Dictionary of the parameter names mapped to their values. |
Write a message to the logger and print it to stdout.
Parameters: |
msg: str
level: int, optional (default=0) |
Plot the explained variance ratio vs the number of components.
See plot_pca for a description of the parameters.
Plot the explained variance ratio per component. See plot_components for a description of the parameters.
Plot the scores obtained by the estimator fitted on every subset of the
data. See plot_rfecv for a description of the parameters.
Reset the plot aesthetics to their default values.
Save the instance to a pickle file.
Parameters: |
filename: str, optional (default="auto") Name of the file. Use "auto" for automatic naming. |
Set the parameters of this estimator.
Parameters: |
**params: dict Estimator parameters. |
Returns: |
self: FeatureSelector Estimator instance. |
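The get_params and set_params signatures match the scikit-learn estimator convention. Assuming they behave the same way here, the pattern can be sketched with a plain scikit-learn `PCA` object as a stand-in:

```python
from sklearn.decomposition import PCA

# Stand-in estimator; the same calls are assumed to work on the selector
pca = PCA(n_components=12, whiten=True)

# Read the current parameters as a dictionary
params = pca.get_params(deep=True)

# set_params updates the values in place and returns the estimator itself
returned = pca.set_params(n_components=5)
print(params["n_components"], pca.n_components, returned is pca)
```

Since set_params returns the estimator, calls can be chained, for example `pca.set_params(n_components=5).fit(X)`.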
Transform the data.
Parameters: |
X: dict, list, tuple, np.array, sps.matrix or pd.DataFrame
y: int, str, sequence or None, optional (default=None) |
Returns: |
X: pd.DataFrame Transformed feature set. |
Example
# Via atom (X and y are assumed to be defined already)
from atom import ATOMClassifier
atom = ATOMClassifier(X, y)
atom.feature_selection(strategy="pca", n_features=12, whiten=True)
atom.plot_pca(filename="pca", figsize=(8, 5))
# As a stand-alone transformer (X_train and y_train are assumed to be defined)
from atom.feature_engineering import FeatureSelector
feature_selector = FeatureSelector(strategy="pca", n_features=12, whiten=True)
feature_selector.fit(X_train, y_train)
X = feature_selector.transform(X, y)
feature_selector.plot_pca(filename="pca", figsize=(8, 5))