Encoder

class atom.data_cleaning.Encoder(strategy="LeaveOneOut", max_onehot=10, frac_to_other=None, verbose=0, logger=None, **kwargs) [source]

Perform encoding of categorical features. The encoding type depends on the number of classes in the column:

If n_classes=2, use Ordinal-encoding.
If 2 < n_classes <= max_onehot, use OneHot-encoding.
If n_classes > max_onehot, use strategy-encoding.
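The selection rule above can be sketched in a few lines of plain Python. This is only an illustration of the three documented cases, not the library's implementation: the helper name `choose_encoding` is hypothetical, and rejecting columns with fewer than two classes is a choice made for this sketch because the rule above does not cover that case.

```python
from typing import Optional


def choose_encoding(
    n_classes: int,
    max_onehot: Optional[int] = 10,
    strategy: str = "LeaveOneOut",
) -> str:
    """Return the encoding type for a column with n_classes unique values.

    Hypothetical helper mirroring the documented rule. Columns with fewer
    than two classes are rejected because the rule does not describe them.
    """
    if n_classes < 2:
        raise ValueError("this sketch only covers columns with n_classes >= 2")
    if n_classes == 2:
        return "Ordinal"
    # max_onehot=None disables one-hot encoding entirely.
    if max_onehot is not None and n_classes <= max_onehot:
        return "OneHot"
    return strategy


print(choose_encoding(2))                   # Ordinal
print(choose_encoding(10))                  # OneHot
print(choose_encoding(11))                  # LeaveOneOut
print(choose_encoding(3, max_onehot=None))  # LeaveOneOut
```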

Classes with few occurrences can also be replaced with the value other to keep the cardinality manageable (see the frac_to_other parameter). An error is raised if missing values or unknown classes are encountered when transforming. This class can be accessed from atom through the encode method. Read more in the user guide.
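To make the rare-class replacement concrete, here is a minimal sketch of the rule as documented for the frac_to_other parameter: a class is replaced when it occurs fewer than n_rows * frac_to_other times. The function name `replace_rare_classes` and the sample data are made up for illustration, and the real class may differ in details.

```python
from collections import Counter


def replace_rare_classes(values, frac_to_other):
    """Replace classes occurring fewer than n_rows * frac_to_other times.

    Hypothetical helper; returns a new list and leaves the input untouched.
    """
    if frac_to_other is None:
        # Documented behavior: skip this step when frac_to_other is None.
        return list(values)

    threshold = len(values) * frac_to_other
    counts = Counter(values)
    return [v if counts[v] >= threshold else "other" for v in values]


colors = ["red"] * 6 + ["blue"] * 3 + ["green"]

# With 10 rows and frac_to_other=0.2 the threshold is 2 occurrences,
# so only "green" (1 occurrence) is replaced.
print(replace_rare_classes(colors, frac_to_other=0.2))
```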

Parameters:

strategy: str, optional (default="LeaveOneOut")
Type of encoding to use for high cardinality features. Choose any estimator available in the category-encoders package except for:

OneHotEncoder: Use the max_onehot parameter.
HashingEncoder: Incompatibility of APIs.

max_onehot: int or None, optional (default=10)
Maximum number of unique values in a feature for which one-hot encoding is performed. If None, strategy-encoding is always used when n_classes > 2.

frac_to_other: float, optional (default=None)
Classes with fewer occurrences than n_rows * frac_to_other are replaced with the string other. If None, skip this step.

verbose: int, optional (default=0)
Verbosity level of the class. Possible values are:

0 to not print anything.
1 to print basic information.
2 to print detailed information.

logger: str, Logger or None, optional (default=None)

If None: Doesn't save a logging file.
If str: Name of the log file. Use "auto" for automatic naming.
Else: Python logging.Logger instance.

**kwargs
Additional keyword arguments passed to the strategy estimator.
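As a rough illustration of what the default strategy computes, the sketch below implements the core idea of leave-one-out target encoding in plain Python: each row is encoded as the mean target of the other rows that share its class. It is not the category-encoders implementation, which has additional options. The function name is hypothetical, and falling back to the global target mean for classes that appear only once is an assumption made for this sketch.

```python
from collections import defaultdict


def leave_one_out_encode(categories, target):
    """Encode each row as the mean target of the other rows in its class."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, target):
        sums[cat] += y
        counts[cat] += 1

    global_mean = sum(target) / len(target)

    encoded = []
    for cat, y in zip(categories, target):
        if counts[cat] > 1:
            # Exclude the row's own target from its class mean.
            encoded.append((sums[cat] - y) / (counts[cat] - 1))
        else:
            # Assumption of this sketch: singletons get the global mean.
            encoded.append(global_mean)
    return encoded


cities = ["a", "a", "a", "b", "b", "c"]
y = [1, 0, 1, 0, 0, 1]
print(leave_one_out_encode(cities, y))
# [0.5, 1.0, 0.5, 0.0, 0.0, 0.5]
```

Excluding the row's own target is what distinguishes this from a plain per-class mean: it reduces the leakage of the target into the encoded feature during training.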

Tip

Use atom's categorical attribute for a list of the categorical columns in the dataset.

Methods

fit	Fit to data.
fit_transform	Fit to data, then transform it.
get_params	Get parameters for this estimator.
log	Write information to the logger and print to stdout.
save	Save the instance to a pickle file.
set_params	Set the parameters of this estimator.
transform	Transform the data.

method fit(X, y) [source]

Fit to data.

Parameters:

X: dict, list, tuple, np.ndarray or pd.DataFrame
Feature set with shape=(n_samples, n_features).

y: int, str or sequence

If int: Index of the target column in X.
If str: Name of the target column in X.
Else: Target column with shape=(n_samples,).

Returns:

self: Encoder
Fitted instance of self.

method fit_transform(X, y) [source]

Fit to data, then transform it.

Parameters:

X: dict, list, tuple, np.ndarray or pd.DataFrame
Feature set with shape=(n_samples, n_features).

y: int, str or sequence

If int: Index of the target column in X.
If str: Name of the target column in X.
Else: Target column with shape=(n_samples,).

Returns:

X: pd.DataFrame
Transformed feature set.

method get_params(deep=True) [source]

Get parameters for this estimator.

Parameters:

deep: bool, optional (default=True)
If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params: dict
Dictionary of the parameter names mapped to their values.

method log(msg, level=0) [source]

Write a message to the logger and print it to stdout.

Parameters:

msg: str
Message to write to the logger and print to stdout.

level: int, optional (default=0)
Minimum verbosity level to print the message.
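The interaction between level and the verbose parameter can be sketched as follows. This is an assumption-laden illustration rather than the library's code: the class `VerboseLogger` is a hypothetical stand-in, and it assumes that a message is printed only when verbose >= level while every message is written to the logger when one is provided.

```python
import io
import logging


class VerboseLogger:
    """Hypothetical stand-in showing how verbose and level could interact."""

    def __init__(self, verbose=0, logger=None):
        self.verbose = verbose
        self.logger = logger

    def log(self, msg, level=0):
        # Print only if the instance is verbose enough for this message.
        if self.verbose >= level:
            print(msg)
        # Assumption: the logger receives the message regardless of verbosity.
        if self.logger is not None:
            self.logger.info(msg)


# Collect log records in memory instead of writing a file.
buffer = io.StringIO()
logger = logging.getLogger("encoder_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(buffer))

demo = VerboseLogger(verbose=1, logger=logger)
demo.log("basic information", level=1)     # printed and logged
demo.log("detailed information", level=2)  # logged only
```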

method save(filename="auto") [source]

Save the instance to a pickle file.

Parameters:

filename: str, optional (default="auto")
Name of the file. Use "auto" for automatic naming.

method set_params(**params) [source]

Set the parameters of this estimator.

Parameters:

**params: dict
Estimator parameters.

Returns:

self: Encoder
Estimator instance.

method transform(X, y=None) [source]

Encode the data.

Parameters:

X: dict, list, tuple, np.ndarray or pd.DataFrame
Feature set with shape=(n_samples, n_features).

y: int, str, sequence or None, optional (default=None)
Does nothing. Implemented for continuity of the API.

Returns:

X: pd.DataFrame
Transformed feature set.

Example

from atom import ATOMClassifier

atom = ATOMClassifier(X, y)
atom.encode(strategy="CatBoost", max_onehot=5)

or

from atom.data_cleaning import Encoder

encoder = Encoder(strategy="CatBoost", max_onehot=5)
encoder.fit(X_train, y_train)
X_train = encoder.transform(X_train)
X_test = encoder.transform(X_test)