Balancer

class atom.data_cleaning.Balancer(strategy="ADASYN", n_jobs=1, verbose=0, logger=None, random_state=None, **kwargs) [source]

Balance the number of samples per class in the target column. Use only for classification tasks. This class can be accessed from atom through the balance method. Read more in the user guide.

Parameters:

strategy: str or estimator, optional (default="ADASYN")
Type of algorithm with which to balance the dataset. Choose from any of the estimators in the imbalanced-learn package or provide a custom estimator, which must have a fit_resample method.

n_jobs: int, optional (default=1)
Number of cores to use for parallel processing.

If >0: Number of cores to use.
If -1: Use all available cores.
If <-1: Use available_cores + 1 + n_jobs, e.g. -2 uses all cores but one.

verbose: int, optional (default=0)
Verbosity level of the class. Possible values are:

0 to not print anything.
1 to print basic information.
2 to print detailed information.

logger: str, Logger or None, optional (default=None)

If None: Doesn't save a logging file.
If str: Name of the log file. Use "auto" for automatic naming.
Else: Python logging.Logger instance.

random_state: int or None, optional (default=None)
Seed used by the random number generator. If None, the random number generator is the RandomState instance used by numpy.random.

**kwargs
Additional keyword arguments passed to the strategy estimator.
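
The strategy parameter accepts any object that has a fit_resample method. As a rough illustration of that contract, the sketch below defines a hypothetical RandomDropper class (not part of atom or imbalanced-learn) that undersamples every class down to the size of the smallest one. It works on plain Python lists to stay dependency-free; an estimator passed to Balancer would presumably have to handle whatever container types Balancer hands it, so treat this as the shape of the interface rather than a drop-in strategy.

```python
import random
from collections import defaultdict


class RandomDropper:
    """Hypothetical undersampler exposing the fit_resample interface."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit_resample(self, X, y):
        rng = random.Random(self.random_state)

        # Group row indices by class label.
        rows_per_class = defaultdict(list)
        for idx, label in enumerate(y):
            rows_per_class[label].append(idx)

        # Keep as many rows per class as the smallest class has.
        n_keep = min(len(rows) for rows in rows_per_class.values())
        kept = []
        for rows in rows_per_class.values():
            kept.extend(rng.sample(rows, n_keep))
        kept.sort()  # preserve the original row order

        return [X[i] for i in kept], [y[i] for i in kept]


# Made-up data: four samples of class 0, two of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
y = [0, 0, 0, 0, 1, 1]
X_res, y_res = RandomDropper(random_state=1).fit_resample(X, y)
```

After resampling, both classes contain two samples each, and every returned row is one of the original rows.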

Tip

Use atom's classes attribute for an overview of the target class distribution per data set.
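
Outside of atom, the class distribution can also be inspected by hand to judge whether balancing is needed. A minimal sketch using only the standard library; the y values are made up for illustration:

```python
from collections import Counter

# Made-up target column with a 4:1 class imbalance.
y = ["no", "no", "no", "no", "yes"]

counts = Counter(y)  # number of samples per class
ratios = {label: n / len(y) for label, n in counts.items()}  # share per class
```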

Attributes

<strategy>: imblearn estimator
Estimator instance (lowercase strategy) used to oversample or undersample the data, e.g. balancer.adasyn for the default strategy.

mapping: dict
Dictionary of the target values mapped to their respective encoded integer.
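
The mapping attribute can be pictured as a label-to-integer lookup. The sketch below builds one by assigning integers to the sorted unique labels, which mirrors how scikit-learn's LabelEncoder orders its classes. Whether Balancer derives its mapping in exactly this way is an assumption, and build_mapping is a hypothetical helper.

```python
def build_mapping(y):
    """Map each distinct target value to an integer, in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(y)))}


mapping = build_mapping(["no", "yes", "no", "maybe"])
# mapping == {"maybe": 0, "no": 1, "yes": 2}
```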

Methods

fit_transform	Same as transform.
get_params	Get parameters for this estimator.
log	Write information to the logger and print to stdout.
save	Save the instance to a pickle file.
set_params	Set the parameters of this estimator.
transform	Transform the data.

method fit_transform(X, y) [source]

Oversample or undersample the data.

Parameters:

X: dict, list, tuple, np.ndarray or pd.DataFrame
Feature set with shape=(n_samples, n_features).

y: int, str or sequence

If int: Index of the target column in X.
If str: Name of the target column in X.
Else: Target column with shape=(n_samples,).

Returns:

X: pd.DataFrame
Balanced feature set.

y: pd.Series
Balanced target column.
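
The three accepted forms of y can be illustrated with a small helper. This is a sketch, not atom's implementation: split_target is a hypothetical function, X is restricted to a dict of columns for simplicity, and it assumes that a target selected by index or name is removed from the returned feature set.

```python
def split_target(X, y):
    """Return (features, target) for X given as a dict of columns."""
    if isinstance(y, int):
        # Positional index of the target column.
        name = list(X)[y]
    elif isinstance(y, str):
        # Name of the target column.
        name = y
    else:
        # y is already the target column; check that the lengths agree.
        n_samples = len(next(iter(X.values())))
        if len(y) != n_samples:
            raise ValueError("X and y have different numbers of samples.")
        return dict(X), list(y)

    features = {col: vals for col, vals in X.items() if col != name}
    return features, list(X[name])


data = {"age": [25, 40, 31], "income": [30, 80, 55], "churn": [0, 1, 0]}
X_a, y_a = split_target(data, -1)                          # by index
X_b, y_b = split_target(data, "churn")                     # by name
X_c, y_c = split_target({"age": [25, 40, 31]}, [0, 1, 0])  # as a sequence
```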

method get_params(deep=True) [source]

Get parameters for this estimator.

Parameters:

deep: bool, optional (default=True)
If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params: dict
Dictionary of the parameter names mapped to their values.

method log(msg, level=0) [source]

Write a message to the logger and print it to stdout.

Parameters:

msg: str
Message to write to the logger and print to stdout.

level: int, optional (default=0)
Minimum verbosity level to print the message.
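
How a message's level interacts with the instance's verbose setting can be sketched as follows. MiniLogger is a hypothetical stand-in, and the rule that a message is printed only when verbose >= level, while being written to the logger whenever one is set, is an assumption based on the parameter descriptions above.

```python
import logging


class MiniLogger:
    """Hypothetical stand-in showing level-gated printing."""

    def __init__(self, verbose=0, logger=None):
        self.verbose = verbose
        self.logger = logger

    def log(self, msg, level=0):
        printed = self.verbose >= level
        if printed:
            print(msg)
        if self.logger is not None:
            self.logger.info(msg)
        return printed  # returned only to make the sketch easy to check


quiet = MiniLogger(verbose=0)
chatty = MiniLogger(verbose=2, logger=logging.getLogger("balancer-demo"))

quiet.log("always shown", level=0)  # printed: 0 >= 0
quiet.log("details", level=2)       # suppressed: 0 < 2
chatty.log("details", level=2)      # printed: 2 >= 2
```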

method save(filename="auto") [source]

Save the instance to a pickle file.

Parameters:

filename: str, optional (default="auto")
Name of the file. Use "auto" for automatic naming.
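
Since save writes a pickle file, a saved instance can presumably be restored with the standard pickle module. The sketch below shows the round trip on a stand-in object: a plain dict of parameters takes the place of the Balancer instance, and the file name is chosen explicitly rather than relying on "auto", whose naming scheme is not described here.

```python
import os
import pickle
import tempfile

# Stand-in for a transformer instance (hypothetical; a real call would
# be balancer.save(path) on a Balancer object).
original = {"strategy": "NearMiss", "n_jobs": 1}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "balancer.pkl")

    # Roughly what a save method does: serialize the object to disk.
    with open(path, "wb") as f:
        pickle.dump(original, f)

    # Restore it later with pickle.load.
    with open(path, "rb") as f:
        restored = pickle.load(f)
```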

method set_params(**params) [source]

Set the parameters of this estimator.

Parameters:

**params: dict
Estimator parameters.

Returns:

self: Balancer
Estimator instance.

method transform(X, y) [source]

Oversample or undersample the data.

Parameters:

X: dict, list, tuple, np.ndarray or pd.DataFrame
Feature set with shape=(n_samples, n_features).

y: int, str or sequence

If int: Index of the target column in X.
If str: Name of the target column in X.
Else: Target column with shape=(n_samples,).

Returns:

X: pd.DataFrame
Balanced feature set.

y: pd.Series
Balanced target column.

Example

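from atom import ATOMClassifier

# X and y are your own feature set and target column.
atom = ATOMClassifier(X, y)
atom.balance(strategy="NearMiss", sampling_strategy=0.7, n_neighbors=10)

or, using the Balancer class directly:

from atom.data_cleaning import Balancer

# Extra keyword arguments are passed on to the NearMiss estimator.
balancer = Balancer(strategy="NearMiss", sampling_strategy=0.7, n_neighbors=10)
X_train, y_train = balancer.transform(X_train, y_train)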