Decomposer

class atom.data_cleaning.Decomposer(model=None, trend_model="additive", test_seasonality=True, sp=None, seasonal_model="additive", n_jobs=1, verbose=0, random_state=None)[source]

Detrend and deseasonalize the time series.

This class does two things:

Remove the trend from every column, returning the in-sample residuals of the model's predicted values.
Remove the seasonal component from every column, subject to a seasonality test.

Categorical columns are ignored.

This class can be accessed from atom through the decompose method. Read more in the user guide.

Note

When using this class from atom, the trend_model, sp and seasonal_model parameters are set automatically based on the atom.sp attribute.

Parameters

model: str, predictor or None, default=None

The forecasting model to remove the trend with. It must be a model that supports the forecast task. If None, PolynomialTrend(degree=1) is used.

trend_model: str, default="additive"

Mode of the trend decomposition. Choose from:

"additive": The model.transform subtracts the trend, i.e., transform(X) returns X - model.predict(fh=X.index).
"multiplicative": The model.transform divides by the trend, i.e., transform(X) returns X / model.predict(fh=X.index).

test_seasonality: bool, default=True

If True, a 90% autocorrelation seasonality test is run on the passed time series, and seasonal decomposition is applied only if the test detects a seasonal component. If the test is negative, deseasonalization is skipped.
If False, deseasonalization is always performed.
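
As a rough illustration of what an autocorrelation seasonality test at the 90% level can look like, the sketch below compares the autocorrelation at lag sp with a Bartlett-style confidence bound, using the normal critical value 1.645. This is an assumption about the general technique, not a copy of the test atom runs; `autocorrelation` and `is_seasonal` are hypothetical helpers.

```python
import numpy as np


def autocorrelation(y, nlags):
    """Sample autocorrelation coefficients for lags 0..nlags."""
    y = np.asarray(y, dtype=float)
    centered = y - y.mean()
    denom = np.sum(centered**2)
    return np.array(
        [np.sum(centered[: len(y) - k] * centered[k:]) / denom for k in range(nlags + 1)]
    )


def is_seasonal(y, sp, z=1.645):
    """Return True if the lag-sp autocorrelation exceeds the 90% bound."""
    n = len(y)
    if sp <= 1 or n < 3 * sp:
        # No seasonal period to test, or too few cycles to judge.
        return False
    coefs = autocorrelation(y, nlags=sp)
    # Bartlett-style bound: the variance of the lag-sp coefficient grows
    # with the squared coefficients of all shorter lags.
    limits = z / np.sqrt(n) * np.sqrt(np.cumsum(np.append(1.0, 2.0 * coefs[1:] ** 2)))
    return bool(abs(coefs[sp]) > limits[sp - 1])


# Ten full cycles of a yearly pattern in monthly data.
t = np.arange(120)
monthly = np.sin(2 * np.pi * t / 12)
print(is_seasonal(monthly, sp=12))
```

With test_seasonality=False this check would be skipped entirely and the seasonal component removed regardless of the outcome.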

sp: int or None, default=None

Seasonality period of the time series. If None, there's no seasonality.

seasonal_model: str, default="additive"

Mode of the seasonal decomposition. Choose from:

"additive": Assumes the components have a linear relation, i.e., y(t) = level + trend + seasonality + noise.
"multiplicative": Assumes the components have a nonlinear relation, i.e., y(t) = level * trend * seasonality * noise.

n_jobs: int, default=1

Number of cores to use for parallel processing.

If >0: Number of cores to use.
If -1: Use all available cores.
If <-1: Use number of cores + 1 + n_jobs.

verbose: int, default=0

Verbosity level of the class. Choose from:

0 to not print anything.
1 to print basic information.
2 to print detailed information.

random_state: int or None, default=None

Seed used by the random number generator. If None, the random number generator is the RandomState used by np.random.

Attributes

feature_names_in_: np.ndarray

Names of features seen during fit.

n_features_in_: int

Number of features seen during fit.

Example

The first example uses the class through atom, the second uses it stand-alone.

>>> from atom import ATOMForecaster
>>> from sktime.datasets import load_airline

>>> y = load_airline()

>>> atom = ATOMForecaster(y, random_state=1)
>>> print(atom.y)

Period
1949-01    112.0
1949-02    118.0
1949-03    132.0
1949-04    129.0
1949-05    121.0
           ...  
1960-08    606.0
1960-09    508.0
1960-10    461.0
1960-11    390.0
1960-12    432.0
Freq: M, Name: Number of airline passengers, Length: 144, dtype: float64

>>> atom.decompose(columns=-1, verbose=2)

Fitting Decomposer...
Decomposing the data...

>>> print(atom.y)

Period
1949-01     17.329355
1949-02     20.763057
1949-03     32.196759
1949-04     26.630462
1949-05     16.064164
              ...    
1960-08    154.613985
1960-09     54.047688
1960-10      4.481390
1960-11    -69.084908
1960-12    -29.651205
Freq: M, Name: Number of airline passengers, Length: 144, dtype: float64

>>> from atom.data_cleaning import Decomposer
>>> from sktime.datasets import load_longley

>>> X, _ = load_longley()

>>> decomposer = Decomposer(verbose=2)
>>> X = decomposer.fit_transform(X)

Fitting Decomposer...
Decomposing the data...

>>> print(X)

             TOTEMP
Period             
1947     379.838235
1948     462.326471
1949   -1205.185294
1950    -905.697059
1951     411.791176
1952     113.279412
1953     746.767647
1954   -1197.744118
1955     343.744118
1956    1465.232353
1957    1060.720588
1958   -1311.791176
1959     113.697059
1960     306.185294
1961    -643.326471
1962    -139.838235

Methods

fit	Fit to data.
fit_transform	Fit to data, then transform it.
get_feature_names_out	Get output feature names for transformation.
get_params	Get parameters for this estimator.
inverse_transform	Inversely transform the data.
set_output	Set output container.
set_params	Set the parameters of this estimator.
transform	Decompose the data.

method fit(X, y=None)[source]

Fit to data.

Parameters

X: dataframe-like

Feature set with shape=(n_samples, n_features).

y: sequence, dataframe-like or None, default=None

Do nothing. Implemented for continuity of the API.

Returns

Self

Estimator instance.

method fit_transform(X=None, y=None, **fit_params)[source]

Fit to data, then transform it.

Parameters

X: dataframe-like or None, default=None

Feature set with shape=(n_samples, n_features). If None, `X` is ignored.

y: sequence, dataframe-like or None, default=None

Target column(s) corresponding to `X`. If None, `y` is ignored.

**fit_params

Additional keyword arguments for the fit method.

Returns

dataframe

Transformed feature set. Only returned if provided.

series or dataframe

Transformed target column. Only returned if provided.

method get_feature_names_out(input_features=None)[source]

Get output feature names for transformation.

Parameters

input_features : array-like of str or None, default=None

Input features.

If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns

feature_names_out : ndarray of str objects

Same as input features.
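
The naming rules above can be summarized in a short sketch. The function `resolve_feature_names` is a hypothetical helper that mirrors the documented behaviour; it is not the library's implementation.

```python
def resolve_feature_names(input_features=None, feature_names_in=None, n_features_in=None):
    """Return output feature names following the documented rules."""
    if input_features is None:
        if feature_names_in is not None:
            # Names seen during fit take precedence.
            return list(feature_names_in)
        # No names seen during fit: generate x0, x1, ...
        return [f"x{i}" for i in range(n_features_in)]

    input_features = list(input_features)
    if feature_names_in is not None and input_features != list(feature_names_in):
        raise ValueError("input_features must match feature_names_in_.")
    return input_features


generated = resolve_feature_names(n_features_in=3)
from_fit = resolve_feature_names(feature_names_in=["TOTEMP"])
```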

method get_params(deep=True)[source]

Get parameters for this estimator.

Parameters

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params : dict

Parameter names mapped to their values.

method inverse_transform(X, y=None)[source]

Inversely transform the data.

Parameters

X: dataframe-like

Feature set with shape=(n_samples, n_features).

y: sequence, dataframe-like or None, default=None

Do nothing. Implemented for continuity of the API.

Returns

dataframe

Original feature set.

method set_output(transform=None)[source]

Set output container.

See sklearn's user guide on how to use the set_output API. The available choices are listed under Parameters.

Parameters

transform: str or None, default=None

Configure the output of the `transform`, `fit_transform`, and `inverse_transform` method. If None, the configuration is not changed. Choose from:

"numpy"
"pandas" (default)
"pandas-pyarrow"
"polars"
"polars-lazy"
"pyarrow"
"modin"
"dask"
"pyspark"
"pyspark-pandas"

Returns

Self

Estimator instance.

method set_params(**params)[source]

Set the parameters of this estimator.

Parameters

**params : dict

Estimator parameters.

Returns

self : estimator instance

Estimator instance.

method transform(X, y=None)[source]

Decompose the data.

Parameters

X: dataframe-like

Feature set with shape=(n_samples, n_features).

y: sequence, dataframe-like or None, default=None

Do nothing. Implemented for continuity of the API.

Returns

dataframe

Transformed feature set.