Gauss


class atom.data_cleaning.Gauss(strategy="yeo-johnson", verbose=0, logger=None, random_state=None, **kwargs) [source]

Transform the data to follow a Gaussian distribution. This is useful when modeling data that shows heteroscedasticity (non-constant variance), or in other situations where normality is desired. Missing values are disregarded in fit and maintained in transform. Categorical columns are ignored. This class can be accessed from atom through the gauss method. Read more in the user guide.
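As a rough illustration of the missing-value behaviour described above, the sketch below uses scikit-learn's PowerTransformer directly. That this is the estimator behind the default "yeo-johnson" strategy is an assumption (this page only states that the underlying estimator is a sklearn estimator), and the synthetic data and the skewness helper are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)

# Right-skewed synthetic feature with a few missing entries.
X = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 1))
X[[3, 50, 120], 0] = np.nan

# Assumption: stands in for Gauss(strategy="yeo-johnson").
pt = PowerTransformer(method="yeo-johnson")
X_t = pt.fit_transform(X)

# Missing entries stay missing, in the same positions.
nan_preserved = np.array_equal(np.isnan(X), np.isnan(X_t))


def skewness(a):
    """Sample skewness of a 1-D array, ignoring NaN (illustrative helper)."""
    a = a[~np.isnan(a)]
    return float(np.mean((a - a.mean()) ** 3) / a.std() ** 3)


skew_before = skewness(X[:, 0])
skew_after = skewness(X_t[:, 0])
print(nan_preserved, skew_before, skew_after)
```

The skewness of the transformed column should be much closer to zero than that of the raw column, while the NaN positions are unchanged.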

Parameters: strategy: str, optional (default="yeo-johnson")
The transforming strategy. Choose from:
  • "yeo-johnson"
  • "box-cox" (requires strictly positive values)
  • "quantile"
verbose: int, optional (default=0)
Verbosity level of the class. Possible values are:
  • 0 to not print anything.
  • 1 to print basic information.
logger: str, Logger or None, optional (default=None)
  • If None: Doesn't save a logging file.
  • If str: Name of the log file. Use "auto" for automatic naming.
  • Else: Python logging.Logger instance.
random_state: int or None, optional (default=None)
Seed used by the quantile strategy. If None, the random number generator is the RandomState used by numpy.random.

**kwargs
Additional keyword arguments passed to the strategy estimator.
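To make the role of random_state more concrete, here is a minimal sketch that assumes the quantile strategy is backed by scikit-learn's QuantileTransformer (an assumption, not something this page confirms). In that estimator the seed only matters when the quantiles are computed on a random subsample, so the example forces subsampling through the subsample argument, which is the kind of option **kwargs would presumably forward.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(42)
X = rng.exponential(scale=2.0, size=(2000, 1))


def make_transformer(seed):
    # subsample < n_samples forces random subsampling when the quantiles
    # are estimated, which is where the seed has an effect.
    return QuantileTransformer(
        n_quantiles=100,
        output_distribution="normal",
        subsample=500,
        random_state=seed,
    )


a = make_transformer(1).fit_transform(X)
b = make_transformer(1).fit_transform(X)

# Same seed, same data: the two results are identical.
same_seed_identical = np.array_equal(a, b)
print(same_seed_identical)
```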

Info

The yeo-johnson and box-cox strategies apply zero-mean, unit-variance normalization after transforming. Use the kwargs parameter to change this behaviour.
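The effect of that normalization step can be sketched with scikit-learn's PowerTransformer, assuming it is the estimator behind these two strategies (the page does not name it). Its standardize argument toggles the zero-mean, unit-variance scaling; the gamma-distributed data is synthetic.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(1)

# Strictly positive data, so box-cox is applicable.
X = rng.gamma(shape=2.0, scale=3.0, size=(500, 1))

# Default: power transform followed by zero-mean, unit-variance scaling.
standardized = PowerTransformer(method="box-cox").fit_transform(X)

# standardize=False keeps the raw power-transformed values.
raw = PowerTransformer(method="box-cox", standardize=False).fit_transform(X)

print(standardized.mean(), standardized.std())
print(raw.mean(), raw.std())
```

Under the same assumption, the equivalent call here would presumably be Gauss(strategy="box-cox", standardize=False), with standardize forwarded through **kwargs.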

Tip

Use atom's plot_distribution method to visualize the transformation.

Warning

Note that the quantile strategy performs a non-linear transformation. This may distort linear correlations between variables measured at the same scale but renders variables measured at different scales more directly comparable.
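The sketch below illustrates this warning with scikit-learn's QuantileTransformer, assumed here to be the estimator behind the quantile strategy. Because the mapping is monotone per feature, the rank ordering (and thus the rank correlation) is preserved, while the linear (Pearson) correlation between two features generally changes. The data and helper functions are invented for the example.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(7)

# Two correlated features measured on the same scale.
a = rng.exponential(scale=1.0, size=1000)
b = a + rng.normal(scale=0.5, size=1000)
X = np.column_stack([a, b])

qt = QuantileTransformer(
    n_quantiles=1000, output_distribution="normal", random_state=0
)
X_t = qt.fit_transform(X)


def pearson(u, v):
    """Linear correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(u, v)[0, 1])


def ranks(u):
    """Rank of each element (0 = smallest); continuous data, so no ties."""
    return np.argsort(np.argsort(u))


pearson_before = pearson(X[:, 0], X[:, 1])
pearson_after = pearson(X_t[:, 0], X_t[:, 1])

# Rank correlation computed as the Pearson correlation of the ranks.
spearman_before = pearson(ranks(X[:, 0]), ranks(X[:, 1]))
spearman_after = pearson(ranks(X_t[:, 0]), ranks(X_t[:, 1]))

print(pearson_before, pearson_after)
print(spearman_before, spearman_after)
```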


Attributes

Attributes: estimator: sklearn estimator
Estimator's instance with which the data is transformed.


Methods

fit Fit to data.
fit_transform Fit to data, then transform it.
get_params Get parameters for this estimator.
log Write information to the logger and print to stdout.
save Save the instance to a pickle file.
set_params Set the parameters of this estimator.
transform Transform the data.


method fit(X, y=None) [source]

Fit to data.

Parameters:

X: dict, list, tuple, np.ndarray or pd.DataFrame
Feature set with shape=(n_samples, n_features).

y: int, str, sequence or None, optional (default=None)
Does nothing. Implemented for continuity of the API.

Returns: self: Gauss
Fitted instance of self.


method fit_transform(X, y=None) [source]

Fit to data, then transform it.

Parameters:

X: dict, list, tuple, np.ndarray or pd.DataFrame
Feature set with shape=(n_samples, n_features).

y: int, str, sequence or None, optional (default=None)
Does nothing. Implemented for continuity of the API.

Returns: X: pd.DataFrame
Transformed feature set.


method get_params(deep=True) [source]

Get parameters for this estimator.

Parameters:

deep: bool, optional (default=True)
If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns: params: dict
Dictionary of the parameter names mapped to their values.


method log(msg, level=0) [source]

Write a message to the logger and print it to stdout.

Parameters:

msg: str
Message to write to the logger and print to stdout.

level: int, optional (default=0)
Minimum verbosity level to print the message.


method save(filename="auto") [source]

Save the instance to a pickle file.

Parameters: filename: str, optional (default="auto")
Name of the file. Use "auto" for automatic naming.


method set_params(**params) [source]

Set the parameters of this estimator.

Parameters: **params: dict
Estimator parameters.
Returns: self: Gauss
Estimator instance.


method transform(X, y=None) [source]

Apply the transformations to the data.

Parameters:

X: dict, list, tuple, np.ndarray or pd.DataFrame
Feature set with shape=(n_samples, n_features).

y: int, str, sequence or None, optional (default=None)
Does nothing. Implemented for continuity of the API.

Returns: X: pd.DataFrame
Transformed feature set.


Example

from atom import ATOMRegressor

atom = ATOMRegressor(X, y)
atom.gauss()
Or, using the class directly:
from atom.data_cleaning import Gauss

gauss = Gauss()
gauss.fit(X_train)
X = gauss.transform(X)
