Imputer
Impute or remove missing values according to the selected strategy. Rows and columns with too many missing values are also removed. Use the missing attribute to customize which values are considered "missing". This class can be accessed from atom through the impute method. Read more in the user guide.
Parameters: |
strat_num: str, int or float, optional (default="drop") Imputing strategy for numerical columns.
strat_cat: Imputing strategy for categorical columns.
max_nan_rows: int, float or None, optional (default=None)
max_nan_cols: int, float, optional (default=None)
Verbosity level of the class.
|
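To make the column threshold and the fixed fill value concrete, here is a minimal plain-Python sketch. It is not ATOM's implementation. It assumes that a float max_nan_cols is read as the largest allowed fraction of missing values in a column, and that a numeric strat_num is used as a fill value; neither is spelled out in the text above. The helper names (is_missing, drop_sparse_columns, fill_constant) are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing (a simplification for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_sparse_columns(columns, max_nan_fraction):
    """Keep columns whose fraction of missing values does not exceed the threshold."""
    kept = {}
    for name, values in columns.items():
        fraction = sum(is_missing(v) for v in values) / len(values)
        if fraction <= max_nan_fraction:  # assumption: the threshold is inclusive
            kept[name] = values
    return kept


def fill_constant(values, fill_value):
    """Replace every missing entry with a fixed value."""
    return [fill_value if is_missing(v) else v for v in values]


columns = {
    "age": [20.0, None, 40.0, 30.0],    # 25% missing
    "sparse": [None, None, None, 1.0],  # 75% missing
}

kept = drop_sparse_columns(columns, max_nan_fraction=0.5)
filled = {name: fill_constant(values, 0.0) for name, values in kept.items()}
print(filled)
```

In this sketch the sparse column is removed before any filling happens, so the fill value is only applied to columns that survive the threshold.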
Tip
Use atom's nans attribute for an overview of the number of missing values per column.
Attributes
Attributes: |
missing: list List of values that are considered "missing". Default values are: "", "?", "None", "NA", "nan", "NaN" and "inf". Note that None, NaN, +inf and -inf are always considered missing since they are incompatible with sklearn estimators.
|
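The rule described for the missing attribute can be sketched as a small check in plain Python. This is an illustration under assumptions, not the library's code: the function name is_missing and its missing argument are hypothetical, and only the default markers are taken from the description above.

```python
import math

# Default markers quoted in the attribute description above.
DEFAULT_MISSING = ["", "?", "None", "NA", "nan", "NaN", "inf"]


def is_missing(value, missing=None):
    """Return True if `value` should be treated as missing."""
    if missing is None:
        missing = DEFAULT_MISSING
    # None, NaN, +inf and -inf always count, whatever the list contains.
    if value is None:
        return True
    if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
        return True
    # Otherwise the value is missing only if it is one of the listed markers.
    return value in missing


# Extending the list with a dataset-specific sentinel (hypothetical value).
custom = DEFAULT_MISSING + [-999]
print(is_missing("?"), is_missing(-999), is_missing(-999, custom))
```

Passing an empty list still flags None, NaN and the infinities, which mirrors the "always considered missing" note.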
Methods
fit | Fit to data. |
fit_transform | Fit to data, then transform it. |
get_params | Get parameters for this estimator. |
log | Write information to the logger and print to stdout. |
save | Save the instance to a pickle file. |
set_params | Set the parameters of this estimator. |
transform | Transform the data. |
fit
Fit to data.
Parameters: |
X: dict, list, tuple, np.ndarray or pd.DataFrame
y: int, str, sequence or None, optional (default=None) |
Returns: |
self: Imputer Fitted instance of self. |
fit_transform
Fit to data, then impute the missing values. If y is left as None and rows are dropped during the transformation, X and y can end up with different lengths.
Parameters: |
X: dict, list, tuple, np.ndarray or pd.DataFrame
y: int, str, sequence or None, optional (default=None) |
Returns: |
X: pd.DataFrame
y: pd.Series |
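The length mismatch mentioned in the note is easy to reproduce with a toy row-dropping function. The sketch below uses plain lists rather than pandas objects, and drop_rows is a hypothetical helper, not part of ATOM.

```python
import math


def has_missing(row):
    """True if any entry in the row is None or NaN."""
    return any(v is None or (isinstance(v, float) and math.isnan(v)) for v in row)


def drop_rows(X, y=None):
    """Drop rows with missing values; drop the matching targets only if y is given."""
    keep = [i for i, row in enumerate(X) if not has_missing(row)]
    X_clean = [X[i] for i in keep]
    if y is None:
        return X_clean
    return X_clean, [y[i] for i in keep]


X = [[1.0, 2.0], [None, 3.0], [4.0, 5.0]]
y = [0, 1, 0]

# Without y: X shrinks to 2 rows while the caller's y still has 3 entries.
X_only = drop_rows(X)
print(len(X_only), len(y))

# With y: both are filtered with the same row indices and stay aligned.
X_clean, y_clean = drop_rows(X, y)
print(len(X_clean), len(y_clean))
```

Passing y alongside X is what keeps the two aligned, since the same row indices are applied to both.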
get_params
Get parameters for this estimator.
Parameters: |
deep: bool, optional (default=True) |
Returns: |
params: dict Dictionary of the parameter names mapped to their values. |
log
Write a message to the logger and print it to stdout.
Parameters: |
msg: str
level: int, optional (default=0) |
save
Save the instance to a pickle file.
Parameters: |
filename: str, optional (default="auto") Name of the file. Use "auto" for automatic naming. |
set_params
Set the parameters of this estimator.
Parameters: |
**params: dict Estimator parameters. |
Returns: |
self: Imputer Estimator instance. |
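The get_params and set_params pair follows a common estimator convention: one method returns a dictionary of constructor parameters, the other updates them and returns the instance. A minimal sketch with a hypothetical ToyImputer class is shown below; the parameter names are borrowed from the example on this page, while the defaults and the error handling are assumptions.

```python
class ToyImputer:
    """Hypothetical stand-in that follows the get_params/set_params convention."""

    def __init__(self, strat_num="drop", strat_cat="drop"):
        self.strat_num = strat_num
        self.strat_cat = strat_cat

    def get_params(self, deep=True):
        # `deep` is accepted for signature compatibility only; this toy
        # class holds no nested estimators to descend into.
        return {"strat_num": self.strat_num, "strat_cat": self.strat_cat}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter: {name!r}")
            setattr(self, name, value)
        return self  # returning self allows call chaining


toy = ToyImputer()
params = toy.set_params(strat_num="knn").get_params()
print(params)
```

Returning the instance from set_params is what makes the chained call in the last lines possible.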
transform
Impute the missing values. If y is left as None and rows are dropped during the transformation, X and y can end up with different lengths.
Parameters: |
X: dict, list, tuple, np.ndarray or pd.DataFrame
y: int, str, sequence or None, optional (default=None) |
Returns: |
X: pd.DataFrame
y: pd.Series |
Example
# X, y, X_train and y_train are assumed to be defined beforehand.

# Through atom
from atom import ATOMClassifier

atom = ATOMClassifier(X, y)
atom.impute(strat_num="knn", strat_cat="drop", max_nan_cols=0.8)

# As a stand-alone transformer
from atom.data_cleaning import Imputer

imputer = Imputer(strat_num="knn", strat_cat="drop", max_nan_cols=0.8)
imputer.fit(X_train, y_train)
X = imputer.transform(X)