Encoder
Perform encoding of categorical features. The encoding type depends on the number of classes in the column:
- If n_classes=2, use Ordinal-encoding.
- If 2 < n_classes <= max_onehot, use OneHot-encoding.
- If n_classes > max_onehot, use strategy-encoding.
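The selection rule above can be sketched as a small function. This is only an illustration, not the library's implementation: the name choose_encoding is hypothetical, max_onehot is assumed to be an int, and the behavior for fewer than two classes is an assumption, since the rules above do not cover it.

```python
def choose_encoding(n_classes: int, max_onehot: int, strategy: str = "LeaveOneOut") -> str:
    """Return the encoding type for a column with n_classes classes.

    Hypothetical helper that mirrors the three rules listed above.
    """
    if n_classes < 2:
        # Assumption: the rules above do not say what happens here.
        raise ValueError("expected a column with at least 2 classes")
    if n_classes == 2:
        return "Ordinal"
    if n_classes <= max_onehot:
        return "OneHot"
    # More classes than max_onehot: fall back to the chosen strategy.
    return strategy


for n in (2, 5, 10, 11):
    print(n, choose_encoding(n, max_onehot=10))
```

With max_onehot=10, a column with exactly 10 classes is still one-hot encoded, and only columns with 11 or more classes fall through to the strategy.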
Classes with few occurrences are also replaced with the value other
to keep the cardinality from becoming too high. An error is raised if
missing values or unknown classes are encountered when transforming.
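To illustrate the replacement of rare classes, here is a minimal pure-Python sketch. The function name replace_rare_classes is hypothetical, and the threshold (number of rows times frac_to_other) is an assumption about how the frac_to_other parameter is applied. The class's actual logic may differ.

```python
from collections import Counter


def replace_rare_classes(values, frac_to_other, other="other"):
    """Replace classes that occur too rarely with a single catch-all value.

    Assumption: a class is rare when it occurs fewer than
    len(values) * frac_to_other times.
    """
    counts = Counter(values)
    threshold = len(values) * frac_to_other
    return [v if counts[v] >= threshold else other for v in values]


colors = ["red"] * 6 + ["blue"] * 3 + ["teal"]
cleaned = replace_rare_classes(colors, frac_to_other=0.2)
print(Counter(cleaned))
```

In this example the threshold is 2 occurrences, so only teal (1 occurrence) is merged into other, reducing the column from three rare-or-common classes to two common ones plus the catch-all.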
This class can be accessed from atom through the encode
method. Read more in the user guide.
Parameters: |
strategy: str, optional (default="LeaveOneOut") Type of encoding to use for high cardinality features. Choose from one of the estimators available in the category-encoders package except for:
max_onehot: int or None, optional (default=10)
frac_to_other: float, optional (default=None)
verbose: Verbosity level of the class. Possible values are:
**kwargs: Additional keyword arguments passed to the strategy estimator.
|
Tip
Use atom's categorical attribute for a list of the categorical columns in the dataset.
Methods
fit | Fit to data. |
fit_transform | Fit to data, then transform it. |
get_params | Get parameters for this estimator. |
log | Write information to the logger and print to stdout. |
save | Save the instance to a pickle file. |
set_params | Set the parameters of this estimator. |
transform | Transform the data. |
Fit to data.
Parameters: |
X: dict, list, tuple, np.ndarray or pd.DataFrame
|
Returns: |
self: Encoder Fitted instance of self. |
Fit to data, then transform it.
Parameters: |
X: dict, list, tuple, np.ndarray or pd.DataFrame
|
Returns: |
X: pd.DataFrame Transformed feature set. |
Get parameters for this estimator.
Parameters: |
deep: bool, optional (default=True) |
Returns: |
params: dict Dictionary of the parameter names mapped to their values. |
Write a message to the logger and print it to stdout.
Parameters: |
msg: str
level: int, optional (default=0) |
Save the instance to a pickle file.
Parameters: |
filename: str, optional (default="auto") Name of the file. Use "auto" for automatic naming. |
Set the parameters of this estimator.
Parameters: |
**params: dict Estimator parameters. |
Returns: |
self: Encoder Estimator instance. |
Encode the data.
Parameters: |
X: dict, list, tuple, np.ndarray or pd.DataFrame
y: int, str, sequence or None, optional (default=None) |
Returns: |
X: pd.DataFrame Transformed feature set. |
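As noted in the class description, transforming raises an error on missing values or unknown classes. The sketch below shows what such a check could look like for a single column. The helper check_column, the use of ValueError, and the error messages are all assumptions made for illustration; they are not taken from the library.

```python
import math


def check_column(values, known_classes):
    """Raise if a value is missing or was not seen during fitting.

    Hypothetical helper: the exception type and messages are assumptions.
    """
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            raise ValueError("missing value encountered")
        if v not in known_classes:
            raise ValueError(f"unknown class encountered: {v!r}")


known = {"red", "blue"}  # classes seen while fitting
check_column(["red", "blue", "red"], known)  # passes silently

try:
    check_column(["red", "green"], known)
except ValueError as err:
    print(err)
```

In practice this means missing values should be imputed, and the data passed to transform should contain only classes that were present when fitting.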
Example
# Through atom (X and y are the user's feature set and target)
from atom import ATOMClassifier
atom = ATOMClassifier(X, y)
atom.encode(strategy="CatBoost", max_onehot=5)

# Or using the class directly
from atom.data_cleaning import Encoder
encoder = Encoder(strategy="CatBoost", max_onehot=5)
encoder.fit(X_train, y_train)
X = encoder.transform(X)