Balancer

class atom.data_cleaning.Balancer(strategy="ADASYN", n_jobs=1, verbose=0, logger=None, random_state=None, **kwargs) [source]

Balance the number of samples per class in the target column. When oversampling, the newly created samples have an increasing integer index for numerical indices, and an index of the form [estimator]_N for non-numerical indices, where N stands for the N-th sample in the data set. Use only for classification tasks. This class can be accessed from atom through the balance method. Read more in the user guide.
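The indexing rule for oversampled rows can be sketched in plain Python. The helper name extend_index and the default label "adasyn" are hypothetical. The exact value of N is an assumption, taken here as the row's 1-based position in the enlarged data set; the library may number new rows differently.

```python
def extend_index(index, n_new, estimator="adasyn"):
    """Return index labels for n_new synthetic samples appended to `index`."""
    if all(isinstance(i, int) for i in index):
        # Numerical index: keep counting from the largest existing label.
        start = max(index, default=-1) + 1
        return list(range(start, start + n_new))
    # Non-numerical index: label each new row <estimator>_N, where N is
    # ASSUMED to be the row's 1-based position in the enlarged data set.
    start = len(index) + 1
    return [f"{estimator}_{n}" for n in range(start, start + n_new)]


print(extend_index([0, 1, 2], 2))        # [3, 4]
print(extend_index(["a", "b", "c"], 2))  # ['adasyn_4', 'adasyn_5']
```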

Parameters:

strategy: str or estimator, optional (default="ADASYN")
Type of algorithm with which to balance the dataset. Choose from any of the estimators in the imbalanced-learn package or provide a custom one. A custom estimator must implement a fit_resample method.

n_jobs: int, optional (default=1)
Number of cores to use for parallel processing.

If >0: Number of cores to use.
If -1: Use all available cores.
If <-1: Use available_cores + 1 + n_jobs, e.g. n_jobs=-2 uses all cores but one.

verbose: int, optional (default=0)
Verbosity level of the class. Choose from:

0 to not print anything.
1 to print basic information.
2 to print detailed information.

logger: str, Logger or None, optional (default=None)

If None: Doesn't save a logging file.
If str: Name of the log file. Use "auto" for automatic naming.
Else: Python logging.Logger instance.

random_state: int or None, optional (default=None)
Seed used by the random number generator. If None, the random number generator is the RandomState instance used by np.random.

**kwargs
Additional keyword arguments for the strategy estimator.
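As an illustration of the custom-strategy option, the sketch below defines a minimal estimator whose only public method is fit_resample. The class name SimpleUnderSampler and its behavior (shrinking every class to the size of the rarest one) are hypothetical and not part of atom or imbalanced-learn. Whether Balancer expects anything beyond fit_resample, such as get_params, is not stated here, so treat this as a starting point rather than a guaranteed drop-in.

```python
import random
from collections import Counter, defaultdict


class SimpleUnderSampler:
    """Hypothetical sampler: shrink every class to the size of the rarest one."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit_resample(self, X, y):
        rng = random.Random(self.random_state)

        # Group row positions by class label.
        rows_per_class = defaultdict(list)
        for pos, label in enumerate(y):
            rows_per_class[label].append(pos)

        # Keep as many rows per class as the smallest class contains.
        n_keep = min(len(rows) for rows in rows_per_class.values())
        kept = sorted(
            pos
            for rows in rows_per_class.values()
            for pos in rng.sample(rows, n_keep)
        )
        return self._take(X, kept), self._take(y, kept)

    @staticmethod
    def _take(data, rows):
        # pandas objects select rows by position through .iloc;
        # other sequences fall back to plain indexing.
        if hasattr(data, "iloc"):
            return data.iloc[rows]
        return [data[i] for i in rows]


X = [[i] for i in range(10)]
y = [0] * 7 + [1] * 3
X_res, y_res = SimpleUnderSampler(random_state=1).fit_resample(X, y)
print(Counter(y_res))
```

An instance of such a class would then be passed as the strategy, for example Balancer(strategy=SimpleUnderSampler(random_state=1)).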

Tip

Use atom's classes attribute for an overview of the target class distribution per data set.

Warning

The clustercentroids estimator is unavailable because of API incompatibilities.

Attributes

Attributes:

<strategy>: imblearn estimator
Object (lowercase strategy) used to balance the data, e.g. balancer.adasyn for the default strategy.

mapping: dict
Target values mapped to their respective encoded integer.

Methods

fit_transform	Same as transform.
get_params	Get parameters for this estimator.
log	Write information to the logger and print to stdout.
save	Save the instance to a pickle file.
set_params	Set the parameters of this estimator.
transform	Transform the data.

method fit_transform(X, y) [source]

Balance the data.

Parameters:

X: dataframe-like
Feature set with shape=(n_samples, n_features).

y: int, str or sequence

If int: Index of the target column in X.
If str: Name of the target column in X.
Else: Target column with shape=(n_samples,).

Returns:

X: pd.DataFrame
Balanced feature set.

y: pd.Series
Balanced target column.

method get_params(deep=True) [source]

Get parameters for this estimator.

Parameters:

deep: bool, optional (default=True)
If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

dict
Parameter names mapped to their values.

method log(msg, level=0) [source]

Write a message to the logger and print it to stdout.

Parameters:

msg: str
Message to write to the logger and print to stdout.

level: int, optional (default=0)
Minimum verbosity level to print the message.
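The interaction between the verbose parameter and the level argument can be sketched as follows. VerboseLogger is a hypothetical stand-in, not the library's class. It assumes that the message always reaches the logger and that it is printed only when verbose is at least level.

```python
class VerboseLogger:
    """Hypothetical stand-in showing how `level` gates printed output."""

    def __init__(self, verbose=0):
        self.verbose = verbose
        self.records = []  # stands in for the log file

    def log(self, msg, level=0):
        # ASSUMPTION: the logger receives every message...
        self.records.append(msg)
        # ...while stdout only shows it when verbose >= level.
        if self.verbose >= level:
            print(msg)
            return True
        return False


demo = VerboseLogger(verbose=1)
demo.log("Balancing the data...", level=1)      # printed, since 1 >= 1
demo.log("Class counts per split...", level=2)  # not printed, since 1 < 2
```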

method save(filename="auto") [source]

Save the instance to a pickle file.

Parameters:

filename: str, optional (default="auto")
Name of the file. Use "auto" for automatic naming.

method set_params(**params) [source]

Set the parameters of this estimator.

Parameters:

**params: dict
Estimator parameters.

Returns:

Balancer
Estimator instance.

method transform(X, y) [source]

Balance the data.

Parameters:

X: dataframe-like
Feature set with shape=(n_samples, n_features).

y: int, str or sequence

If int: Index of the target column in X.
If str: Name of the target column in X.
Else: Target column with shape=(n_samples,).

Returns:

X: pd.DataFrame
Balanced feature set.

y: pd.Series
Balanced target column.
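The three accepted forms of y can be illustrated with a small stand-alone sketch. It uses a dict of column lists as a plain stand-in for a dataframe, and the helper name split_target is hypothetical. It also assumes that a target referenced by index or name is split off from the feature set, which this page does not state explicitly.

```python
def split_target(X, y):
    """Return (features, target) for the three accepted forms of y.

    X is a dict mapping column names to lists, standing in for a dataframe.
    """
    if isinstance(y, int):
        # Positional index of the target column in X.
        y = list(X)[y]
    if isinstance(y, str):
        # Name of the target column in X. ASSUMPTION: the column is
        # removed from the features once it is used as the target.
        features = {name: col for name, col in X.items() if name != y}
        return features, X[y]
    # Otherwise y is already a separate sequence of length n_samples.
    return dict(X), list(y)


data = {"age": [31, 45], "income": [2.1, 3.4], "label": [0, 1]}
print(split_target(data, "label"))  # ({'age': [31, 45], 'income': [2.1, 3.4]}, [0, 1])
print(split_target(data, -1))       # same result, target selected by position
```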

Example

Using atom:

from atom import ATOMClassifier

# X and y are the feature set and target column, assumed to be loaded already
atom = ATOMClassifier(X, y)
atom.balance(strategy="NearMiss", sampling_strategy=0.7, n_neighbors=10)

Stand-alone:

from atom.data_cleaning import Balancer

balancer = Balancer(strategy="NearMiss", sampling_strategy=0.7, n_neighbors=10)
X_train, y_train = balancer.transform(X_train, y_train)