FeatureSelector
Remove features according to the selected strategy. Ties between features with equal scores are broken in an unspecified way. Additionally, remove multicollinear and low variance features. This class can be accessed from atom through the feature_selection method. Read more in the user guide.
Parameters: |
strategy: str or None, optional (default=None) Feature selection strategy to use. Choose from:
Solver/model to use for the feature selection strategy. See the corresponding documentation for an extended description of the choices. If None, use the estimator's default value (only pca).
Number of features to select. Choose from:
If strategy="sfm" and the threshold parameter is not specified, the threshold is set to
Remove features with the same value in at least this fraction of the total rows. The default is to keep all features with non-zero variance, i.e. remove the features that have the same value in all samples. If None, skip this step.
max_correlation: float or None, optional (default=1.)
Number of cores to use for parallel processing.
Train strategy on GPU (instead of CPU). Only for strategy="pca".
Verbosity level of the class. Choose from:
Seed used by the random number generator. If None, the random number generator is the RandomState instance used by np.random.
**kwargs |
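To make the repeated-value filter described in the parameters above more concrete, here is a minimal pure-Python sketch. The helper names `most_common_fraction` and `drop_repeated`, and the `max_fraction` argument, are hypothetical and chosen for illustration only; the class itself may implement the check differently.

```python
from collections import Counter


def most_common_fraction(values):
    """Return the fraction of rows taken up by the most frequent value."""
    counts = Counter(values)
    return max(counts.values()) / len(values)


def drop_repeated(columns, max_fraction=1.0):
    """Drop columns whose most frequent value fills at least max_fraction of the rows."""
    return {
        name: values
        for name, values in columns.items()
        if most_common_fraction(values) < max_fraction
    }


data = {
    "constant": [1, 1, 1, 1],     # one value in 100% of the rows
    "mostly_zero": [0, 0, 0, 5],  # one value in 75% of the rows
    "varied": [3, 1, 4, 2],       # every value is unique (25% each)
}

kept_default = drop_repeated(data)                    # only "constant" is dropped
kept_strict = drop_repeated(data, max_fraction=0.75)  # "mostly_zero" is dropped too
```

With a threshold of 1.0 only columns that never change are removed, which matches the "non-zero variance" default described above. Lowering the threshold also removes columns dominated by a single value.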
Info
If strategy="pca" and the provided data is dense, it's scaled to mean=0 and std=1 before fitting the transformer (if it wasn't already).
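The scaling step mentioned in this note can be sketched per column as follows. This is an illustration only: it assumes the population standard deviation is used (as scikit-learn's `StandardScaler` does), and the helper name `standardize` is hypothetical.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Scale a numeric column to mean 0 and standard deviation 1."""
    mean = fmean(column)
    std = pstdev(column)  # population standard deviation (assumption)
    if std == 0:
        # A constant column carries no variance; map it to zeros.
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]


scaled = standardize([2.0, 4.0, 6.0, 8.0])
```

PCA looks for directions of maximum variance, so without this step a feature measured on a large scale would dominate the components.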
Tip
Use the plot_feature_importance method to examine how much a specific feature contributes to the final predictions. If the model doesn't have a feature_importances_ attribute, use plot_permutation_importance instead.
Note
Be aware that, for strategy="rfecv", the n_features parameter is the minimum number of features to select, not the actual number of features that the transformer returns. The transformer may well return more.
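The same behavior can be seen in scikit-learn's own `RFECV`, where `min_features_to_select` is only a lower bound. The sketch below uses scikit-learn directly. That this class passes n_features on to that argument is an assumption, and the synthetic dataset and the `LogisticRegression` estimator are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, 4 of them informative.
X, y = make_classification(
    n_samples=200,
    n_features=10,
    n_informative=4,
    n_redundant=0,
    random_state=0,
)

# min_features_to_select is a lower bound, not the final feature count.
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    min_features_to_select=3,
    cv=3,
)
selector.fit(X, y)

n_selected = selector.n_features_  # chosen by cross-validation, always >= 3
print(f"Selected {n_selected} features")
```

The cross-validation picks the subset size with the best score, so the result depends on the data and the estimator.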
Attributes
Utility attributes
Attributes: |
collinear: pd.DataFrame Information on the removed collinear features. Columns include:
feature_importance: list
<strategy>: sklearn transformer
feature_names_in_: np.array
n_features_in_: int |
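As an illustration of the kind of information a table of removed collinear features could hold, the sketch below drops every column whose absolute Pearson correlation with an already kept column reaches a threshold. The exact criterion used by the class is not stated here, so the rule, the helper names `pearson` and `drop_collinear`, and the tuple layout of `removed` are all assumptions.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def drop_collinear(columns, max_correlation):
    """Return the kept columns and a log of (removed, correlated_with, r) tuples."""
    kept, removed = {}, []
    for name, values in columns.items():
        for kept_name, kept_values in kept.items():
            r = pearson(values, kept_values)
            if abs(r) >= max_correlation:
                removed.append((name, kept_name, r))
                break
        else:
            # No kept column is too strongly correlated with this one.
            kept[name] = values
    return kept, removed


data = {
    "a": [1, 2, 3, 4, 5],
    "double_a": [2, 4, 6, 8, 10],  # perfectly correlated with "a"
    "other": [2, 1, 4, 3, 5],      # correlation of 0.8 with "a"
}
kept, removed = drop_collinear(data, max_correlation=0.9)
```

Here `double_a` is removed because it is a multiple of `a`, while `other` stays because its correlation is below the threshold.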
Plot attributes
Attributes: |
style: str
palette: str
title_fontsize: int
label_fontsize: int
tick_fontsize: int |
Methods
fit | Fit to data. |
fit_transform | Fit to data, then transform it. |
get_params | Get parameters for this estimator. |
log | Write information to the logger and print to stdout. |
plot_pca | Plot the explained variance ratio vs the number of components. |
plot_components | Plot the explained variance ratio per component. |
plot_rfecv | Plot the scores obtained by the estimator on the rfecv. |
reset_aesthetics | Reset the plot aesthetics to their default values. |
save | Save the instance to a pickle file. |
set_params | Set the parameters of this estimator. |
transform | Transform the data. |
Fit to data. Note that the univariate, sfm (when model is not fitted), sfs, rfe and rfecv strategies all need a target column. Leaving it None will raise an exception.
Parameters: |
X: dataframe-like
|
Returns: |
FeatureSelector Fitted instance of self. |
Fit to data, then transform it. Note that the univariate, sfm (when model is not fitted), sfs, rfe and rfecv strategies all need a target column. Leaving it None will raise an exception.
Parameters: |
X: dataframe-like
|
Returns: |
pd.DataFrame Transformed feature set. |
Get parameters for this estimator.
Parameters: |
deep: bool, optional (default=True) |
Returns: |
dict Parameter names mapped to their values. |
Write a message to the logger and print it to stdout.
Parameters: |
msg: str
level: int, optional (default=0) |
Plot the explained variance ratio vs the number of components.
See plot_pca for a description of the parameters.
Plot the explained variance ratio per component.
See plot_components for a description of the parameters.
Plot the scores obtained by the estimator fitted on every subset of the
data. See plot_rfecv for a description of the parameters.
Reset the plot aesthetics to their default values.
Save the instance to a pickle file.
Parameters: |
filename: str, optional (default="auto") Name of the file. Use "auto" for automatic naming. |
Set the parameters of this estimator.
Parameters: |
**params: dict Estimator parameters. |
Returns: |
FeatureSelector Estimator instance. |
Transform the data.
Parameters: |
X: dataframe-like
y: int, str, sequence or None, optional (default=None) |
Returns: |
pd.DataFrame Transformed feature set. |
Example
# X, y and the training split X_train, y_train are assumed to be loaded beforehand.

# Option 1: through atom
from atom import ATOMClassifier
atom = ATOMClassifier(X, y)
atom.feature_selection(strategy="pca", n_features=12, whiten=True)
atom.plot_pca(filename="pca", figsize=(8, 5))

# Option 2: as a stand-alone transformer
from atom.feature_engineering import FeatureSelector
feature_selector = FeatureSelector(strategy="pca", n_features=12, whiten=True)
feature_selector.fit(X_train, y_train)
X = feature_selector.transform(X, y)
feature_selector.plot_pca(filename="pca", figsize=(8, 5))