Encoder
Perform encoding of categorical features. The encoding type depends on the number of classes in the column:
- If n_classes=2 or ordinal feature, use Ordinal-encoding.
- If 2 < n_classes <= max_onehot, use OneHot-encoding.
- If n_classes > max_onehot, use strategy-encoding.
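The selection rule above can be sketched in plain Python. This is only an illustration of the documented thresholds, not atom's implementation: the function name `choose_encoding` and its arguments are hypothetical, and the sketch assumes an integer max_onehot.

```python
def choose_encoding(n_classes: int, max_onehot: int = 10, is_ordinal: bool = False) -> str:
    """Return the kind of encoding the documented rule selects for a column.

    Hypothetical helper for illustration only; not part of atom's API.
    """
    if is_ordinal or n_classes == 2:
        return "ordinal"
    if n_classes < 2:
        # The documented rule does not say what happens below two classes.
        raise ValueError("rule is undefined for n_classes < 2")
    if n_classes <= max_onehot:
        return "onehot"
    # n_classes > max_onehot: fall back to the estimator given as `strategy`.
    return "strategy"
```

With the default max_onehot=10, a column with 5 classes would be one-hot encoded, while a column with 11 classes would go to the strategy encoder.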
Missing values are propagated to the output column. Unknown
classes encountered during transforming are imputed according
to the selected strategy. Classes with low occurrences can be
replaced with the value "other" in order to prevent too high
cardinality. This class can be accessed from atom through the
encode method. Read more in the user guide.
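As a rough illustration of the replacement of low-occurrence classes described above, the sketch below works on a plain Python list instead of a dataframe. The helper `replace_rare` and its `min_count` threshold are assumptions made for this example; how atom defines "low occurrences" is not specified here.

```python
from collections import Counter


def replace_rare(values, min_count):
    """Replace classes seen fewer than `min_count` times with "other".

    Hypothetical helper: `min_count` is an assumed threshold, not an atom
    parameter. Missing values (None) are passed through unchanged.
    """
    # Count occurrences per class, ignoring missing values.
    counts = Counter(v for v in values if v is not None)
    return [
        v if v is None or counts[v] >= min_count else "other"
        for v in values
    ]


column = ["a", "a", "a", "b", None, "c"]
cleaned = replace_rare(column, min_count=2)
```

Here the classes "b" and "c" each occur once, so both collapse into "other", which lowers the column's cardinality from three classes to two. The missing value stays missing.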
Parameters: |
strategy: str or estimator, optional (default="LeaveOneOut") Type of encoding to use for high cardinality features. Choose from any of the estimators in the category-encoders package or provide a custom one.
max_onehot: int or None, optional (default=10)
ordinal: dict or None, optional (default=None)
Replaces rare occurrences in categorical columns with the string "other". This transformation is done before the encoding of the column.
Verbosity level of the class. Choose from:
Additional keyword arguments for the strategy estimator.
|
Tip
Use atom's categorical attribute for a list of the categorical columns in the dataset.
Warning
Two category-encoders estimators are unavailable:
- OneHotEncoder: Use the max_onehot parameter.
- HashingEncoder: Incompatibility of APIs.
Attributes
Attributes: |
mapping: dict of dicts
feature_names_in_: np.array
n_features_in_: int |
Methods
fit | Fit to data. |
fit_transform | Fit to data, then transform it. |
get_params | Get parameters for this estimator. |
log | Write information to the logger and print to stdout. |
save | Save the instance to a pickle file. |
set_params | Set the parameters of this estimator. |
transform | Transform the data. |
Fit to data. Note that leaving y=None can lead to errors if the
strategy encoder requires target values.
Parameters: |
X: dataframe-like
|
Returns: |
Encoder Fitted instance of self. |
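To see why omitting y can fail, consider a toy supervised encoder that maps each class to the mean target of its rows. The class `MeanTargetEncoder` below is hypothetical and written only for this illustration. It is not a category-encoders estimator, and it operates on a single column given as a list.

```python
from collections import defaultdict


class MeanTargetEncoder:
    """Toy supervised encoder: class -> mean of the target over its rows.

    Hypothetical class for illustration; not part of atom or category-encoders.
    """

    def fit(self, X, y=None):
        if y is None:
            # A target-based encoder has nothing to learn from without y.
            raise ValueError("this encoder needs target values; pass y to fit")
        sums, counts = defaultdict(float), defaultdict(int)
        for value, target in zip(X, y):
            sums[value] += target
            counts[value] += 1
        self.mapping_ = {value: sums[value] / counts[value] for value in sums}
        return self

    def transform(self, X):
        # Look up the learned mean target for every class in the column.
        return [self.mapping_[value] for value in X]


encoder = MeanTargetEncoder().fit(["a", "a", "b"], y=[1, 0, 1])
encoded = encoder.transform(["a", "b"])
```

Calling `MeanTargetEncoder().fit(["a", "b"])` without y raises a ValueError in this sketch. Unsupervised encodings, such as ordinal or one-hot, need no target and are unaffected.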
Fit to data, then transform it. Note that leaving y=None can lead
to errors if the strategy encoder requires target values.
Parameters: |
X: dataframe-like
|
Returns: |
pd.DataFrame Transformed feature set. |
Get parameters for this estimator.
Parameters: |
deep: bool, optional (default=True) |
Returns: |
dict Parameter names mapped to their values. |
Write a message to the logger and print it to stdout.
Parameters: |
msg: str
level: int, optional (default=0) |
Save the instance to a pickle file.
Parameters: |
filename: str, optional (default="auto") Name of the file. Use "auto" for automatic naming. |
Set the parameters of this estimator.
Parameters: |
**params: dict Estimator parameters. |
Returns: |
Encoder Estimator instance. |
Encode the data.
Parameters: |
X: dataframe-like
y: int, str, sequence or None, optional (default=None) |
Returns: |
pd.DataFrame Transformed feature set. |
Example
# Through atom (X and y are the feature set and target, defined elsewhere)
from atom import ATOMClassifier
atom = ATOMClassifier(X, y)
atom.encode(strategy="CatBoost", max_onehot=5)

# As a stand-alone transformer
from atom.data_cleaning import Encoder
encoder = Encoder(strategy="CatBoost", max_onehot=5)
encoder.fit(X_train, y_train)
X = encoder.transform(X)