Gauss

class atom.data_cleaning.Gauss(strategy="yeo-johnson", verbose=0, logger=None, random_state=None, **kwargs) [source]

Transform the data to follow a Gaussian distribution. This transformation is useful for modeling issues related to heteroscedasticity (non-constant variance), or other situations where normality is desired. Missing values are disregarded in fit and maintained in transform. Categorical columns are ignored. This class can be accessed from atom through the gauss method. Read more in the user guide.
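The missing-value behaviour described above can be sketched with scikit-learn directly. This is a minimal illustration, assuming the default strategy behaves like sklearn's PowerTransformer, which this page does not state explicitly. The toy array is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Hypothetical right-skewed column with one missing value at index 2.
X = np.array([[1.0], [2.0], [np.nan], [4.0], [50.0]])

# Assumption: stand-in for the estimator behind strategy="yeo-johnson".
pt = PowerTransformer(method="yeo-johnson")
X_out = pt.fit_transform(X)

# The NaN is skipped while fitting and is still NaN after transforming.
nan_kept = bool(np.isnan(X_out[2, 0]))
others_finite = bool(np.isfinite(np.delete(X_out[:, 0], 2)).all())
print(nan_kept, others_finite)
```

Rows with missing values therefore keep their position, so imputation can happen before or after this step.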

Parameters:

strategy: str, optional (default="yeo-johnson")
The transforming strategy. Choose from:

yeo-johnson
box-cox (only works with strictly positive values)
quantile (non-linear transformation)
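The positivity constraint on box-cox can be seen with a short sketch. It assumes the yeo-johnson and box-cox strategies correspond to sklearn's PowerTransformer, which is an assumption and not something this page confirms. The arrays are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Hypothetical data containing zero and negative values.
X_mixed = np.array([[-2.0], [0.0], [1.0], [3.0], [10.0]])

# Yeo-Johnson is defined for any real input.
yj = PowerTransformer(method="yeo-johnson").fit(X_mixed)

# Box-Cox is only defined for strictly positive input, so fitting fails here.
try:
    PowerTransformer(method="box-cox").fit(X_mixed)
    box_cox_failed = False
except ValueError:
    box_cox_failed = True

# With strictly positive data, box-cox fits without error.
X_pos = np.array([[0.5], [1.0], [2.0], [3.0], [10.0]])
bc = PowerTransformer(method="box-cox").fit(X_pos)

print(box_cox_failed, yj.lambdas_, bc.lambdas_)
```

If a dataset may contain zeros or negative values, the default yeo-johnson strategy is the safer choice.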

verbose: int, optional (default=0)
Verbosity level of the class. Possible values are:

0 to not print anything.
1 to print basic information.

logger: str, Logger or None, optional (default=None)

If None: Doesn't save a logging file.
If str: Name of the log file. Use "auto" for automatic naming.
Else: Python logging.Logger instance.

random_state: int or None, optional (default=None)
Seed used by the quantile strategy. If None, the random number generator is the RandomState used by numpy.random.

**kwargs
Additional keyword arguments passed to the strategy estimator.

Info

The yeo-johnson and box-cox strategies apply zero-mean, unit-variance normalization after transforming. Use the kwargs parameter to change this behaviour.
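As a rough sketch of what the normalization step does, the example below compares sklearn's PowerTransformer with and without its standardize option. It assumes this is the estimator behind the two strategies, in which case something like Gauss(standardize=False) would presumably forward the option through kwargs. That call is an assumption, not a documented signature.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Hypothetical strictly positive, right-skewed data.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 1))

# Power transform followed by zero-mean, unit-variance scaling.
scaled = PowerTransformer(method="yeo-johnson", standardize=True).fit_transform(X)

# Power transform only: the output keeps its own mean and variance.
raw = PowerTransformer(method="yeo-johnson", standardize=False).fit_transform(X)

print(scaled.mean(), scaled.std())
print(raw.mean(), raw.std())
```

Disabling the scaling can be useful when a separate scaler is applied later in the pipeline.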

Tip

Use atom's plot_distribution method to visualize the transformation.

Warning

Note that the quantile strategy performs a non-linear transformation. This may distort linear correlations between variables measured at the same scale but renders variables measured at different scales more directly comparable.
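The non-linearity can be illustrated with a small sketch. It assumes the quantile strategy corresponds to sklearn's QuantileTransformer with a normal output distribution, which is an assumption. The data and parameter values are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Hypothetical skewed feature.
rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=200)

# Assumption: stand-in for the estimator behind strategy="quantile".
qt = QuantileTransformer(n_quantiles=200, output_distribution="normal", random_state=1)
z = qt.fit_transform(x.reshape(-1, 1)).ravel()

# The mapping is monotone: sorting by x also sorts z.
order = np.argsort(x)
is_monotone = bool(np.all(np.diff(z[order]) >= 0))

# The mapping is not linear, so the Pearson correlation with x is below 1.
pearson = float(np.corrcoef(x, z)[0, 1])
print(is_monotone, pearson)
```

Rank order survives the transformation, but distances between values do not, which is why linear relationships between features can change.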

Attributes

Attributes:

estimator: sklearn estimator
Estimator's instance with which the data is transformed.

Methods

fit	Fit to data.
fit_transform	Fit to data, then transform it.
get_params	Get parameters for this estimator.
log	Write information to the logger and print to stdout.
save	Save the instance to a pickle file.
set_params	Set the parameters of this estimator.
transform	Transform the data.

method fit(X, y=None) [source]

Fit to data.

Parameters:

X: dict, list, tuple, np.ndarray or pd.DataFrame
Feature set with shape=(n_samples, n_features).

y: int, str, sequence or None, optional (default=None)
Does nothing. Implemented for continuity of the API.

Returns:

self: Gauss
Fitted instance of self.

method fit_transform(X, y=None) [source]

Fit to data, then transform it.

Parameters:

X: dict, list, tuple, np.ndarray or pd.DataFrame
Feature set with shape=(n_samples, n_features).

y: int, str, sequence or None, optional (default=None)
Does nothing. Implemented for continuity of the API.

Returns:

X: pd.DataFrame
Transformed feature set.

method get_params(deep=True) [source]

Get parameters for this estimator.

Parameters:

deep: bool, optional (default=True)
If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params: dict
Dictionary of the parameter names mapped to their values.

method log(msg, level=0) [source]

Write a message to the logger and print it to stdout.

Parameters:

msg: str
Message to write to the logger and print to stdout.

level: int, optional (default=0)
Minimum verbosity level to print the message.

method save(filename="auto") [source]

Save the instance to a pickle file.

Parameters:

filename: str, optional (default="auto")
Name of the file. Use "auto" for automatic naming.

method set_params(**params) [source]

Set the parameters of this estimator.

Parameters:

**params: dict
Estimator parameters.

Returns:

self: Gauss
Estimator instance.
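The get_params and set_params methods appear to follow the scikit-learn estimator convention. The sketch below shows that convention on a plain sklearn estimator used as a stand-in, so the parameter names printed here are sklearn's and not necessarily those of Gauss.

```python
from sklearn.preprocessing import PowerTransformer

# Stand-in estimator (assumption): any sklearn estimator behaves the same way.
est = PowerTransformer()

# get_params returns a dict of constructor parameters and their current values.
params = est.get_params()
print(params)

# set_params updates parameters in place and returns the estimator itself,
# which allows chaining.
returned = est.set_params(method="box-cox")
print(returned is est, est.get_params()["method"])
```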

method transform(X, y=None) [source]

Apply the transformations to the data.

Parameters:

X: dict, list, tuple, np.ndarray or pd.DataFrame
Feature set with shape=(n_samples, n_features).

y: int, str, sequence or None, optional (default=None)
Does nothing. Implemented for continuity of the API.

Returns:

X: pd.DataFrame
Transformed feature set.

Example

from atom import ATOMRegressor

# X and y are the feature set and target column, loaded beforehand.
atom = ATOMRegressor(X, y)
atom.gauss()

or

from atom.data_cleaning import Gauss

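gauss = Gauss()
gauss.fit(X_train)  # learn the transformation from the training data only
X = gauss.transform(X)  # apply the fitted transformation to the full feature set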