Cleaner
Performs standard data cleaning steps on a dataset. Use the parameters to choose which transformations to perform. The available steps are:
- Drop columns with specific data types.
- Strip whitespace from categorical features.
- Drop categorical columns with maximal cardinality.
- Drop columns with minimum cardinality.
- Drop duplicate rows.
- Drop rows with missing values in the target column.
- Encode the target column.
This class can be accessed from atom through the clean method. Read more in the user guide.
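The steps listed above can be approximated in plain pandas. The following is a minimal sketch, not atom's own code: `clean_sketch` is a hypothetical helper, and it assumes that "maximal cardinality" means one distinct value per row and "minimum cardinality" means a single value for the whole column. It also applies every step unconditionally, whereas the class lets each one be switched off.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def clean_sketch(X, y, drop_types=None):
    """Illustrative stand-in for the listed steps; not atom's own code."""
    X = X.copy()

    # Drop columns with specific data types, e.g. drop_types="datetime".
    if drop_types is not None:
        X = X.drop(columns=X.select_dtypes(include=drop_types).columns)

    for col in list(X.columns):
        # Simplification: every non-numeric column is treated as categorical.
        categorical = not is_numeric_dtype(X[col])
        if categorical:
            # Strip surrounding whitespace from string values only.
            X[col] = X[col].map(lambda v: v.strip() if isinstance(v, str) else v)
        n_unique = X[col].nunique(dropna=False)
        if n_unique == 1:
            # Minimum cardinality (assumed): every row holds the same value.
            X = X.drop(columns=col)
        elif categorical and n_unique == len(X):
            # Maximal cardinality (assumed): every row holds a different value.
            X = X.drop(columns=col)

    # Drop duplicate rows and keep the target aligned with what is left.
    X = X.drop_duplicates()
    y = y.loc[X.index]

    # Drop rows whose target value is missing.
    keep = y.notna()
    X, y = X[keep], y[keep]

    # Encode the target as integers, with classes taken in sorted order.
    codes, _ = pd.factorize(y, sort=True)
    return X, pd.Series(codes, index=y.index, name=y.name)


# Made-up data: "id" is unique per row, "const" never changes.
X = pd.DataFrame({
    "id": ["a", "b", "c", "d", "e"],
    "city": [" Oslo", "Rome ", "Oslo", "Rome", "Bern"],
    "const": [1, 1, 1, 1, 1],
    "num": [0.5, 1.5, 0.5, 2.5, 3.5],
})
y = pd.Series(["yes", "no", "yes", None, "no"], name="target")
X_clean, y_clean = clean_sketch(X, y)
```

In this toy run the `id` and `const` columns are removed, the third row goes as a duplicate of the first, and the fourth row goes because its target is missing.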
Parameters: |
drop_types: str, sequence or None, optional (default=None)
strip_categorical: bool, optional (default=True)
drop_max_cardinality: bool, optional (default=True)
drop_min_cardinality: bool, optional (default=True)
drop_duplicates: bool, optional (default=False)
drop_missing_target: bool, optional (default=True)
encode_target: bool, optional (default=True)
Verbosity level of the class. Possible values are:
|
Attributes
Attributes: |
missing: list
mapping: dict |
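The page does not say what `mapping` holds. Assuming it records each original target label together with the integer it was encoded to, a dictionary of that kind could be built and reversed as in the sketch below. `encode_target` and `decode_target` are hypothetical helpers, not part of atom.

```python
def encode_target(y):
    """Encode labels as integers in sorted order and return the mapping used."""
    mapping = {label: code for code, label in enumerate(sorted(set(y)))}
    return [mapping[label] for label in y], mapping


def decode_target(encoded, mapping):
    """Translate encoded integers back to the original labels."""
    inverse = {code: label for label, code in mapping.items()}
    return [inverse[code] for code in encoded]


encoded, mapping = encode_target(["yes", "no", "no", "maybe"])
decoded = decode_target(encoded, mapping)
```

Keeping such a dictionary around is what makes it possible to report predictions in the original labels after training on the encoded target.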
Methods
fit_transform | Same as transform. |
get_params | Get parameters for this estimator. |
log | Write information to the logger and print to stdout. |
save | Save the instance to a pickle file. |
set_params | Set the parameters of this estimator. |
transform | Transform the data. |
fit_transform
Apply the data cleaning steps to the data. Same as transform.
Parameters: |
X: dict, list, tuple, np.ndarray or pd.DataFrame
|
Returns: |
X: pd.DataFrame
y: pd.Series |
get_params
Get parameters for this estimator.
Parameters: |
deep: bool, optional (default=True) |
Returns: |
params: dict Dictionary of the parameter names mapped to their values. |
log
Write a message to the logger and print it to stdout.
Parameters: |
msg: str
level: int, optional (default=0) |
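A rough sketch of how such a method might behave, using only the standard library. It assumes that `level` is the minimum verbosity at which the message is printed, and that the message always goes to the logger when one is set; the page states neither. `VerboseLogger` and `ListHandler` are hypothetical names.

```python
import logging


class VerboseLogger:
    """Hypothetical class illustrating log(msg, level)."""

    def __init__(self, verbose=0, logger=None):
        self.verbose = verbose
        self.logger = logger

    def log(self, msg, level=0):
        # Assumption: print only when the verbosity reaches the given level.
        if self.verbose >= level:
            print(msg)
        # Assumption: the logger receives every message.
        if self.logger is not None:
            self.logger.info(msg)


records = []


class ListHandler(logging.Handler):
    """Collects logged messages in a list so they can be inspected."""

    def emit(self, record):
        records.append(record.getMessage())


logger = logging.getLogger("cleaner_sketch")
logger.setLevel(logging.INFO)
logger.addHandler(ListHandler())
logger.propagate = False

quiet = VerboseLogger(verbose=0, logger=logger)
loud = VerboseLogger(verbose=2, logger=logger)
quiet.log("Dropping duplicate rows.", level=1)  # logged only
loud.log("Dropping duplicate rows.", level=1)   # logged and printed
```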
save
Save the instance to a pickle file.
Parameters: |
filename: str, optional (default="auto") Name of the file. Use "auto" for automatic naming. |
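As an illustration, saving an instance with the standard `pickle` module could look like the sketch below. It assumes that "auto" falls back to the class name, which the page does not confirm. `SaveableSketch` and `resolve_name` are hypothetical.

```python
import pickle
from pathlib import Path


class SaveableSketch:
    """Hypothetical class, used only to illustrate save()."""

    def resolve_name(self, filename="auto"):
        # Assumption: "auto" means "name the file after the class".
        return type(self).__name__ if filename == "auto" else filename

    def save(self, filename="auto"):
        path = Path(self.resolve_name(filename))
        with path.open("wb") as f:
            pickle.dump(self, f)
        # Returned for convenience; the documented method lists no return value.
        return path
```

An instance saved this way can be restored with `pickle.load` on the opened file, provided the class is importable where it is loaded.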
set_params
Set the parameters of this estimator.
Parameters: |
**params: dict Estimator parameters. |
Returns: |
self: Cleaner Estimator instance. |
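The get_params and set_params pair follows the familiar estimator convention: parameters are read from the constructor signature, and set_params returns the instance so calls can be chained. A minimal standard-library sketch of that convention, with hypothetical `ParamsMixin` and `ParamCleaner` classes and a `deep` argument that is accepted but unused because there are no nested estimators here:

```python
import inspect


class ParamsMixin:
    """Hypothetical mixin mimicking the get_params/set_params convention."""

    def get_params(self, deep=True):
        # Parameter names are taken from the constructor signature.
        signature = inspect.signature(type(self).__init__)
        names = [name for name in signature.parameters if name != "self"]
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}.")
            setattr(self, name, value)
        return self  # returning self allows chained calls


class ParamCleaner(ParamsMixin):
    """Toy estimator with two of the parameters documented above."""

    def __init__(self, drop_duplicates=False, encode_target=True):
        self.drop_duplicates = drop_duplicates
        self.encode_target = encode_target


cleaner = ParamCleaner()
before = cleaner.get_params()
returned = cleaner.set_params(drop_duplicates=True)
after = cleaner.get_params()
```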
transform
Apply the data cleaning steps to the data.
Parameters: |
X: dict, list, tuple, np.ndarray or pd.DataFrame
|
Returns: |
X: pd.DataFrame
y: pd.Series |
Example
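# Through atom. X and y are assumed to be an already loaded feature set and target column.
from atom import ATOMClassifier
atom = ATOMClassifier(X, y)
atom.clean(drop_max_cardinality=False)

# As a stand-alone transformer.
from atom.data_cleaning import Cleaner
cleaner = Cleaner(drop_max_cardinality=False)
X, y = cleaner.transform(X, y)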