Pruner
Replace or remove outliers. The definition of an outlier depends on the selected strategy, and the definitions can differ greatly between strategies. Categorical columns are ignored. This class can be accessed from atom through the prune method. Read more in the user guide.
Parameters: |
strategy: str or sequence, optional (default="zscore") Strategy with which to select the outliers. If a sequence of strategies is given, only samples marked as outliers by all chosen strategies are dropped. Available strategies include "zscore" and "iforest" (isolation forest).
method: int, float or str, optional (default="drop") Method to apply on the outliers. Only the zscore strategy accepts a method other than "drop".
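max_sigma: int or float, optional (default=3) Maximum allowed number of standard deviations from the mean of the column. Values further away are considered outliers.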
include_target: bool, optional (default=False) Whether to include the target column in the search for outliers.
verbose: int, optional (default=0) Verbosity level of the class.
**kwargs: Additional keyword arguments for the strategy estimator. If a sequence of strategies is given, the parameters should be provided in a dict with the strategy's name as key.
|
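To make the zscore parameters concrete, here is a minimal pandas sketch of z-score pruning that uses max_sigma, method and include_target as described above. The function zscore_prune is hypothetical and is not ATOM's implementation; in particular, treating a numeric method as a replacement value is an assumption made for illustration.

```python
import pandas as pd


def zscore_prune(X, y=None, max_sigma=3, method="drop", include_target=False):
    """Hypothetical sketch of z-score pruning; not ATOM's implementation."""
    # Only numerical columns are checked; categorical columns are ignored.
    data = X.select_dtypes(include="number").copy()
    if include_target and y is not None:
        data["__target__"] = y  # also look for outliers in the target
    # Absolute z-score of every value, computed column by column.
    z = (data - data.mean()).abs() / data.std()
    outliers = z > max_sigma  # True where a value is an outlier

    if method == "drop":
        # Drop every sample (row) with at least one outlier value,
        # and drop the same rows from the target so X and y stay aligned.
        keep = ~outliers.any(axis=1)
        return X[keep], (y[keep] if y is not None else None)

    # Assumption: any other (numeric) method replaces the outlier values in X.
    X_out = X.copy()
    cols = X.select_dtypes(include="number").columns
    X_out[cols] = X_out[cols].mask(outliers[cols], method)
    return X_out, y
```

Dropping removes whole samples, so the target must be filtered with the same boolean mask; replacing values keeps every sample and leaves the target untouched.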
Tip
Use atom's outliers attribute for an overview of the number of outlier values per column.
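As a rough picture of such an overview, the sketch below counts, per numerical column, the values whose absolute z-score exceeds a threshold. The helper count_outliers and its default threshold of 3 are assumptions for illustration, not a description of how atom computes its outliers attribute.

```python
import pandas as pd


def count_outliers(X, max_sigma=3):
    """Hypothetical helper: number of outlier values per numerical column."""
    numerical = X.select_dtypes(include="number")  # skip categorical columns
    z = (numerical - numerical.mean()).abs() / numerical.std()
    counts = (z > max_sigma).sum()  # True counts as 1 when summed
    return counts[counts > 0]  # only list columns that contain outliers
```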
Attributes: |
<strategy>: sklearn estimator Object used to prune the data, e.g. pruner.iforest for the isolation forest strategy.
|
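The strategy estimators and the rule for combining several strategies can be sketched with two scikit-learn outlier detectors. The ESTIMATORS mapping and the prune_all function are hypothetical; they only assume that each strategy wraps an estimator whose fit_predict returns -1 for outliers, as IsolationForest and LocalOutlierFactor do. Per-strategy keyword arguments are passed as a dict keyed by strategy name, mirroring how the parameters section says they should be provided.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical mapping from strategy names to scikit-learn estimators.
ESTIMATORS = {"iforest": IsolationForest, "lof": LocalOutlierFactor}


def prune_all(X, strategies, params=None):
    """Drop only the samples that every strategy marks as an outlier."""
    params = params or {}
    flagged = np.ones(len(X), dtype=bool)  # outlier for all strategies so far
    fitted = {}
    for name in strategies:
        # Keyword arguments are looked up by strategy name.
        estimator = ESTIMATORS[name](**params.get(name, {}))
        # fit_predict returns -1 for outliers and 1 for inliers.
        flagged &= estimator.fit_predict(X) == -1
        fitted[name] = estimator  # comparable to the pruner.<strategy> attribute
    return X[~flagged], fitted


rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 2)), columns=["x1", "x2"])
X.loc[100] = [50.0, 50.0]  # one obvious outlier
X_clean, fitted = prune_all(X, ["iforest", "lof"], {"iforest": {"random_state": 0}})
```

Intersecting the masks makes the combined rule conservative: adding a strategy can only keep more samples, never drop more.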
Methods
fit_transform | Same as transform. |
get_params | Get parameters for this estimator. |
log | Write information to the logger and print to stdout. |
save | Save the instance to a pickle file. |
set_params | Set the parameters of this estimator. |
transform | Transform the data. |
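fit_transform
Apply the outlier strategy to the data.
Parameters: |
X: dataframe-like
y: int, str, sequence or None, optional (default=None)
|
Returns: |
X: pd.DataFrame Transformed feature set.
y: pd.Series Transformed target column. Only returned if provided. |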
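get_params
Get parameters for this estimator.
Parameters: |
deep: bool, optional (default=True) If True, will return the parameters for this estimator and contained subobjects that are estimators. |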
Returns: |
dict Parameter names mapped to their values. |
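log
Write a message to the logger and print it to stdout.
Parameters: |
msg: str Message to write to the logger and print to stdout.
level: int, optional (default=0) Minimum verbosity level required to print the message. |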
save
Save the instance to a pickle file.
Parameters: |
filename: str, optional (default="auto") Name of the file. Use "auto" for automatic naming. |
set_params
Set the parameters of this estimator.
Parameters: |
**params: dict Estimator parameters. |
Returns: |
Pruner Estimator instance. |
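The get_params and set_params entries match scikit-learn's estimator API, where parameters are read from the constructor signature. A minimal sketch with a hypothetical ToyPruner class (not ATOM code) shows the behavior: get_params returns a dict of constructor arguments, and set_params updates them and returns the same instance.

```python
from sklearn.base import BaseEstimator


class ToyPruner(BaseEstimator):
    """Hypothetical stand-in for Pruner; it only stores its parameters."""

    def __init__(self, strategy="zscore", max_sigma=3):
        # scikit-learn expects __init__ to store each argument unchanged.
        self.strategy = strategy
        self.max_sigma = max_sigma


pruner = ToyPruner()
print(pruner.get_params())  # {'max_sigma': 3, 'strategy': 'zscore'}
pruner.set_params(max_sigma=2)  # returns the same instance
```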
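transform
Apply the outlier strategy to the data.
Parameters: |
X: dataframe-like
y: int, str, sequence or None, optional (default=None)
|
Returns: |
X: pd.DataFrame Transformed feature set.
y: pd.Series Transformed target column. Only returned if provided. |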
Example
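# Option 1: through atom's prune method
from sklearn.datasets import load_diabetes
from atom import ATOMRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
atom = ATOMRegressor(X, y)
atom.prune(strategy="zscore", max_sigma=2, include_target=True)

# Option 2: using the Pruner class directly
from atom.data_cleaning import Pruner
pruner = Pruner(strategy="zscore", max_sigma=2, include_target=True)
X_pruned, y_pruned = pruner.transform(X, y)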