Decomposer


class atom.data_cleaning.Decomposer(model=None, trend_model="additive", test_seasonality=True, sp=None, seasonal_model="additive", n_jobs=1, verbose=0, random_state=None)[source]

Detrend and deseasonalize the time series.

This class does two things:

  • Remove the trend from every column, returning the in-sample residuals of the model's predicted values.
  • Remove the seasonal component from every column, subject to a seasonality test.

Categorical columns are ignored.

This class can be accessed from atom through the decompose method. Read more in the user guide.

Note

When using this class from atom, the trend_model, sp and seasonal_model parameters are set automatically based on the atom.sp attribute.

Parameters

model: str, predictor or None, default=None
The forecasting model to remove the trend with. It must be a model that supports the forecast task. If None, PolynomialTrend(degree=1) is used.

trend_model: str, default="additive"
Mode of the trend decomposition. Choose from:

  • "additive": The model.transform subtracts the trend, i.e., transform(X) returns X - model.predict(fh=X.index).
  • "multiplicative": The model.transform divides by the trend, i.e., transform(X) returns X / model.predict(fh=X.index).
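
The two trend modes can be illustrated with a minimal sketch. It is not ATOM's implementation: the helper names `linear_trend` and `detrend` are hypothetical, and a plain least-squares line stands in for the default PolynomialTrend(degree=1) forecaster.

```python
def linear_trend(y):
    """Fit an ordinary least-squares line to y over t = 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    slope = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    intercept = y_mean - slope * t_mean
    return [intercept + slope * t for t in range(n)]


def detrend(y, trend_model="additive"):
    """Remove the fitted trend by subtraction or by division."""
    trend = linear_trend(y)
    if trend_model == "additive":
        return [v - f for v, f in zip(y, trend)]
    if trend_model == "multiplicative":
        return [v / f for v, f in zip(y, trend)]
    raise ValueError(f"Unknown trend_model: {trend_model!r}")


# First five values of the airline series shown in the Example section.
y = [112.0, 118.0, 132.0, 129.0, 121.0]
additive = detrend(y, "additive")  # residuals scattered around 0
multiplicative = detrend(y, "multiplicative")  # ratios scattered around 1
```

Additive residuals keep the units of the original series, while multiplicative ratios are unitless, which usually suits series whose fluctuations grow with the level.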

test_seasonality: bool, default=True

  • If True, an autocorrelation-based seasonality test is run at the 90% confidence level. Seasonal decomposition is applied only if the test detects a seasonal component; otherwise, deseasonalization is skipped.
  • If False, always performs deseasonalization.
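
As a rough illustration of what an autocorrelation seasonality test at the 90% level can look like, the sketch below compares the autocorrelation at lag sp against a Bartlett-style significance bound. The function names, the minimum-length rule and the exact bound are assumptions made for this example; the test ATOM actually runs may differ in its details.

```python
import math


def acf(y, nlags):
    """Sample autocorrelation coefficients r_0 .. r_nlags."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(nlags + 1)
    ]


def is_seasonal(y, sp, tcrit=1.645):
    """Return True if the lag-sp autocorrelation is significant.

    tcrit=1.645 is the two-sided 90% critical value of the normal
    distribution. Series shorter than three periods are treated as
    non-seasonal (an assumption of this sketch).
    """
    n = len(y)
    if sp == 1 or n < 3 * sp:
        return False
    r = acf(y, sp)
    # Bartlett-style bound built from the coefficients at lags 1 .. sp-1.
    limit = tcrit / math.sqrt(n) * math.sqrt(1 + 2 * sum(c * c for c in r[1:sp]))
    return abs(r[sp]) > limit


# A pure sine wave with a period of 12 observations.
wave = [math.sin(2 * math.pi * t / 12) for t in range(120)]
has_period_12 = is_seasonal(wave, sp=12)
has_period_3 = is_seasonal(wave, sp=3)
```

With test_seasonality=True, a negative result of such a test is what causes the seasonal step to be skipped for that column.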

sp: int or None, default=None
Seasonality period of the time series. If None, there's no seasonality.

seasonal_model: str, default="additive"
Mode of the seasonal decomposition. Choose from:

  • "additive": Assumes the components have a linear relation, i.e., y(t) = level + trend + seasonality + noise.
  • "multiplicative": Assumes the components have a nonlinear relation, i.e., y(t) = level * trend * seasonality * noise.
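
The difference between the two seasonal modes can be sketched with per-position averages. This is a deliberately simplified stand-in for a real seasonal decomposition (it assumes the trend has already been removed and that the series covers whole periods), and the helper names are hypothetical.

```python
def seasonal_indices(y, sp, seasonal_model="additive"):
    """Estimate one seasonal index per position in the period."""
    if seasonal_model not in ("additive", "multiplicative"):
        raise ValueError(f"Unknown seasonal_model: {seasonal_model!r}")
    overall = sum(y) / len(y)
    indices = []
    for pos in range(sp):
        values = y[pos::sp]  # all observations at this position in the cycle
        mean = sum(values) / len(values)
        if seasonal_model == "additive":
            indices.append(mean - overall)  # offset from the level
        else:
            indices.append(mean / overall)  # factor relative to the level
    return indices


def deseasonalize(y, sp, seasonal_model="additive"):
    """Subtract (additive) or divide by (multiplicative) the indices."""
    idx = seasonal_indices(y, sp, seasonal_model)
    if seasonal_model == "additive":
        return [v - idx[t % sp] for t, v in enumerate(y)]
    return [v / idx[t % sp] for t, v in enumerate(y)]


# Level 10 with an additive pattern (+1, -1, 0, 0) repeated three times.
y_add = [11.0, 9.0, 10.0, 10.0] * 3
# Level 10 with a multiplicative pattern (1.2, 0.8, 1.0, 1.0) repeated three times.
y_mul = [12.0, 8.0, 10.0, 10.0] * 3

flat_add = deseasonalize(y_add, sp=4, seasonal_model="additive")
flat_mul = deseasonalize(y_mul, sp=4, seasonal_model="multiplicative")
```

In both cases the deseasonalized series collapses to the constant level, because the toy data contains no noise.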

n_jobs: int, default=1
Number of cores to use for parallel processing.

  • If >0: Number of cores to use.
  • If -1: Use all available cores.
  • If <-1: Use number of cores + 1 + n_jobs, e.g., -2 uses all cores but one.

verbose: int, default=0
Verbosity level of the class. Choose from:

  • 0 to not print anything.
  • 1 to print basic information.
  • 2 to print detailed information.

random_state: int or None, default=None
Seed used by the random number generator. If None, the random number generator is the RandomState used by np.random.

Attributes

feature_names_in_: np.ndarray
Names of features seen during fit.

n_features_in_: int
Number of features seen during fit.


See Also

Encoder

Perform encoding of categorical features.

Discretizer

Bin continuous data into intervals.

Scaler

Scale the data.


Example

>>> from atom import ATOMForecaster
>>> from sktime.datasets import load_airline

>>> y = load_airline()

>>> atom = ATOMForecaster(y, random_state=1)
>>> print(atom.y)

Period
1949-01    112.0
1949-02    118.0
1949-03    132.0
1949-04    129.0
1949-05    121.0
           ...  
1960-08    606.0
1960-09    508.0
1960-10    461.0
1960-11    390.0
1960-12    432.0
Freq: M, Name: Number of airline passengers, Length: 144, dtype: float64

>>> atom.decompose(columns=-1, verbose=2)

Fitting Decomposer...
Decomposing the data...

>>> print(atom.y)

Period
1949-01     17.329355
1949-02     20.763057
1949-03     32.196759
1949-04     26.630462
1949-05     16.064164
              ...    
1960-08    154.613985
1960-09     54.047688
1960-10      4.481390
1960-11    -69.084908
1960-12    -29.651205
Freq: M, Name: Number of airline passengers, Length: 144, dtype: float64

>>> from atom.data_cleaning import Decomposer
>>> from sktime.datasets import load_longley

>>> X, _ = load_longley()

>>> decomposer = Decomposer(verbose=2)
>>> X = decomposer.fit_transform(X)

Fitting Decomposer...
Decomposing the data...

>>> print(X)

             TOTEMP
Period             
1947     379.838235
1948     462.326471
1949   -1205.185294
1950    -905.697059
1951     411.791176
1952     113.279412
1953     746.767647
1954   -1197.744118
1955     343.744118
1956    1465.232353
1957    1060.720588
1958   -1311.791176
1959     113.697059
1960     306.185294
1961    -643.326471
1962    -139.838235


Methods

fit                      Fit to data.
fit_transform            Fit to data, then transform it.
get_feature_names_out    Get output feature names for transformation.
get_params               Get parameters for this estimator.
inverse_transform        Inversely transform the data.
set_output               Set output container.
set_params               Set the parameters of this estimator.
transform                Decompose the data.


method fit(X, y=None)[source]

Fit to data.

Parameters

X: dataframe-like
Feature set with shape=(n_samples, n_features).

y: sequence, dataframe-like or None, default=None
Do nothing. Implemented for continuity of the API.

Returns

Self
Estimator instance.



method fit_transform(X=None, y=None, **fit_params)[source]

Fit to data, then transform it.

Parameters

X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). If None, X is ignored.

y: sequence, dataframe-like or None, default=None
Target column(s) corresponding to X. If None, y is ignored.

**fit_params
Additional keyword arguments for the fit method.

Returns

dataframe
Transformed feature set. Only returned if X is provided.

series or dataframe
Transformed target column. Only returned if y is provided.



method get_feature_names_out(input_features=None)[source]

Get output feature names for transformation.

Parameters

input_features: array-like of str or None, default=None
Input features.

  • If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
  • If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.
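
The naming rules above can be written out as a small stand-alone function. This is a sketch of the described behavior, not the library's code; the function name and its keyword arguments are hypothetical stand-ins for the fitted attributes feature_names_in_ and n_features_in_.

```python
def resolve_feature_names(input_features=None, feature_names_in=None, n_features_in=0):
    """Resolve output feature names following the two rules above."""
    if input_features is None:
        if feature_names_in is not None:
            # Names seen during fit take precedence.
            return list(feature_names_in)
        # No names were seen during fit: generate x0, x1, ...
        return [f"x{i}" for i in range(n_features_in)]

    input_features = list(input_features)
    if feature_names_in is not None and input_features != list(feature_names_in):
        raise ValueError("input_features does not match feature_names_in_.")
    return input_features


generated = resolve_feature_names(n_features_in=3)
from_fit = resolve_feature_names(feature_names_in=["TOTEMP"])
```

Because the decomposer does not add or drop columns, the output names are always the same as the input names.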

Returns

feature_names_out: ndarray of str objects
Same as input features.



method get_params(deep=True)[source]

Get parameters for this estimator.

Parameters

deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: dict
Parameter names mapped to their values.



method inverse_transform(X, y=None)[source]

Inversely transform the data.

Parameters

X: dataframe-like
Feature set with shape=(n_samples, n_features).

y: sequence, dataframe-like or None, default=None
Do nothing. Implemented for continuity of the API.

Returns

dataframe
Original feature set.
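
For the trend step, inverting the transformation plausibly amounts to adding back (additive) or multiplying by (multiplicative) the fitted trend. The sketch below shows that round trip under this assumption; the function names and the trend values are hypothetical and chosen only for illustration, and the seasonal step is left out.

```python
def remove_trend(y, trend, trend_model="additive"):
    """Forward step: subtract or divide by the fitted trend."""
    if trend_model == "additive":
        return [v - f for v, f in zip(y, trend)]
    return [v / f for v, f in zip(y, trend)]


def restore_trend(residuals, trend, trend_model="additive"):
    """Inverse step: add back or multiply by the same fitted trend."""
    if trend_model == "additive":
        return [r + f for r, f in zip(residuals, trend)]
    return [r * f for r, f in zip(residuals, trend)]


y = [112.0, 118.0, 132.0, 129.0, 121.0]
# Illustrative trend values standing in for a fitted model's predictions.
trend = [116.6, 119.5, 122.4, 125.3, 128.2]

round_trip_add = restore_trend(remove_trend(y, trend), trend)
round_trip_mul = restore_trend(
    remove_trend(y, trend, "multiplicative"), trend, "multiplicative"
)
```

Both round trips recover the original values up to floating-point error, provided the same trend is used in both directions.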



method set_output(transform=None)[source]

Set output container.

See sklearn's user guide on how to use the set_output API. See here for a description of the choices.

Parameters

transform: str or None, default=None
Configure the output of the transform, fit_transform, and inverse_transform methods. If None, the configuration is not changed. Choose from:

  • "numpy"
  • "pandas" (default)
  • "pandas-pyarrow"
  • "polars"
  • "polars-lazy"
  • "pyarrow"
  • "modin"
  • "dask"
  • "pyspark"
  • "pyspark-pandas"

Returns

Self
Estimator instance.



method set_params(**params)[source]

Set the parameters of this estimator.

Parameters

**params: dict
Estimator parameters.

Returns

self: estimator instance
Estimator instance.



method transform(X, y=None)[source]

Decompose the data.

Parameters

X: dataframe-like
Feature set with shape=(n_samples, n_features).

y: sequence, dataframe-like or None, default=None
Do nothing. Implemented for continuity of the API.

Returns

dataframe
Transformed feature set.