Pipeline
Sequentially apply a list of transforms and a final estimator.
Intermediate steps of the pipeline must be transformers, that is, they must implement fit and transform methods. The final estimator only needs to implement fit. The transformers in the pipeline can be cached using the memory parameter.
A step's estimator may be replaced entirely by setting the parameter with its name to another estimator, or a transformer removed by setting it to passthrough or None.
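As a sketch of this behavior, the following uses a plain scikit-learn Pipeline (from which this class inherits its step-replacement semantics); the step names are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, random_state=0)

pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])

# Remove a transformer by setting the parameter with its name to "passthrough"
pipe.set_params(scaler="passthrough")
pipe.fit(X, y)

# Replace the final estimator entirely by setting its name to another estimator
pipe.set_params(clf=DecisionTreeClassifier(random_state=0))
```

After set_params, the pipeline must be refitted before making new predictions.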
Read more in sklearn's user guide.
Info
This class behaves similarly to sklearn's pipeline, and additionally:
- Can initialize with an empty pipeline.
- Always returns 'pandas' objects.
- Accepts transformers that drop rows.
- Accepts transformers that are only fitted on a subset of the provided dataset.
- Accepts transformers that apply only on the target column.
- Uses transformers that are only applied on the training set to fit the pipeline, not to make predictions on new data.
- The instance is considered fitted at initialization if all the underlying transformers/estimator in the pipeline are.
- Returns attributes of the final estimator if they are not attributes of the Pipeline itself.
- The last estimator is also cached.
- Supports time series models following sktime's API.
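The train-only behavior in the list above can be illustrated with a minimal conceptual sketch (not ATOM's actual implementation; the TrainOnly marker and toy transformers are hypothetical): steps marked as training-only participate in fitting but are skipped when transforming new data.

```python
class TrainOnly:
    """Hypothetical marker for steps applied only on the training set."""
    def __init__(self, transformer):
        self.transformer = transformer

class DropFirstRow:
    """Toy transformer that drops a row, like a balancer or outlier pruner."""
    def fit_transform(self, X):
        return X[1:]
    def transform(self, X):
        return X[1:]

class AddOne:
    """Toy transformer applied both at fit time and on new data."""
    def fit_transform(self, X):
        return [x + 1 for x in X]
    def transform(self, X):
        return [x + 1 for x in X]

def fit_data(steps, X):
    # During fitting, every step is applied, including train-only ones
    for step in steps:
        t = step.transformer if isinstance(step, TrainOnly) else step
        X = t.fit_transform(X)
    return X

def transform_new(steps, X):
    # For new data, train-only steps are skipped entirely
    for step in steps:
        if isinstance(step, TrainOnly):
            continue
        X = step.transform(X)
    return X

steps = [AddOne(), TrainOnly(DropFirstRow())]
print(fit_data(steps, [1, 2, 3]))      # rows may be dropped during fit
print(transform_new(steps, [1, 2, 3])) # row count preserved for new data
```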
Warning
This Pipeline only works with estimators whose parameters for fit, transform, predict, etc. are named X and/or y.
Parameters | steps: list of tuple
List of (name, transform) tuples (implementing fit/transform) that are chained in sequential order.
memory: str, Memory or None, default=None
Used to cache the fitted transformers of the pipeline. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time-consuming.
verbose: int or None, default=0
Verbosity level of the transformers in the pipeline. If None, it leaves them to their original verbosity. If >0, the time elapsed while fitting each step is printed. Note this is not the same as sklearn's verbose parameter. Use the pipeline's verbose attribute to modify that one (defaults to False).
|
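The memory parameter can be sketched with a plain scikit-learn Pipeline (the caching semantics described above come from sklearn); the cache directory here is temporary and illustrative:

```python
from shutil import rmtree
from tempfile import mkdtemp

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=100, random_state=0)

cachedir = mkdtemp()  # temporary location for cached transformers
pipe = Pipeline(
    steps=[("scaler", StandardScaler()), ("clf", LogisticRegression())],
    memory=cachedir,  # transformers are cloned before fitting and cached here
)
pipe.fit(X, y)

# With caching enabled, inspect the fitted clone via named_steps,
# not the original transformer instance
fitted_scaler = pipe.named_steps["scaler"]

rmtree(cachedir)  # clean up the cache when done
```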
Attributes | named_steps: Bunch
Dictionary-like object with the following attributes. Read-only attribute to access any step parameter by user-given name. Keys are step names and values are step parameters.
classes_: np.ndarray of shape (n_classes,)
The class labels. Only exists if the last step of the pipeline is a classifier.
feature_names_in_: np.ndarray
Names of features seen during the first step's fit method.
n_features_in_: int
Number of features seen during the first step's fit method.
|
Example
>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)
>>> # Initialize atom
>>> atom = ATOMClassifier(X, y, verbose=2)
<< ================== ATOM ================== >>
Configuration ==================== >>
Algorithm task: Binary classification.
Dataset stats ==================== >>
Shape: (569, 31)
Train set size: 456
Test set size: 113
-------------------------------------
Memory: 138.97 kB
Scaled: False
Outlier values: 174 (1.2%)
>>> # Apply data cleaning and feature engineering methods
>>> atom.scale()
Fitting Scaler...
Scaling features...
>>> atom.balance(strategy="smote")
Oversampling with SMOTE...
--> Adding 116 samples to class 0.
>>> atom.feature_selection(strategy="rfe", solver="lr", n_features=22)
Fitting FeatureSelector...
Performing feature selection ...
--> rfe selected 22 features from the dataset.
--> Dropping feature mean texture (rank 5).
--> Dropping feature mean smoothness (rank 4).
--> Dropping feature mean symmetry (rank 2).
--> Dropping feature mean fractal dimension (rank 3).
--> Dropping feature smoothness error (rank 9).
--> Dropping feature concavity error (rank 7).
--> Dropping feature symmetry error (rank 6).
--> Dropping feature worst compactness (rank 8).
>>> # Train models
>>> atom.run(models="LR")
Training ========================= >>
Models: LR
Metric: f1
Results for LogisticRegression:
Fit ---------------------------------------------
Train evaluation --> f1: 0.9808
Test evaluation --> f1: 0.9929
Time elapsed: 0.031s
-------------------------------------------------
Time: 0.031s
Final results ==================== >>
Total time: 0.034s
-------------------------------------
LogisticRegression --> f1: 0.9929
>>> # Get the pipeline object
>>> pipeline = atom.lr.export_pipeline()
>>> print(pipeline)
Pipeline(memory=Memory(location=None),
steps=[('scaler',
Scaler(engine={'data': 'pandas', 'estimator': 'sklearn'}, verbose=2)),
('balancer', Balancer(strategy='smote', verbose=2)),
('featureselector',
FeatureSelector(engine={'data': 'pandas', 'estimator': 'sklearn'}, n_features=22, solver='lr_class', strategy='rfe', verbose=2)),
('LogisticRegression', LogisticRegression(n_jobs=1))],
verbose=False)
Methods
decision_function | Transform, then decision_function of the final estimator. |
fit | Fit the pipeline. |
fit_transform | Fit the pipeline and transform the data. |
get_feature_names_out | Get output feature names for transformation. |
get_metadata_routing | Get metadata routing of this object. |
get_params | Get parameters for this estimator. |
inverse_transform | Inverse transform for each step in a reverse order. |
predict | Transform, then predict of the final estimator. |
predict_interval | Transform, then predict_interval of the final estimator. |
predict_log_proba | Transform, then predict_log_proba of the final estimator. |
predict_proba | Transform, then predict_proba of the final estimator. |
predict_quantiles | Transform, then predict_quantiles of the final estimator. |
predict_residuals | Transform, then predict_residuals of the final estimator. |
predict_var | Transform, then predict_var of the final estimator. |
score | Transform, then score of the final estimator. |
set_output | Set output container. |
set_params | Set the parameters of this estimator. |
transform | Transform the data. |
Fit all the transformers one after the other and sequentially transform the data. Finally, fit the transformed data using the final estimator.
Call fit followed by transform on each transformer in the pipeline. The transformed data are finally passed to the final estimator, which calls the transform method. Only valid if the final estimator implements transform. This also works when the final estimator is None, in which case all prior transformations are applied.
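The fit_transform flow described above can be sketched in a few lines of plain Python (conceptual, not the actual implementation; the Scale transformer is a toy example):

```python
def pipeline_fit_transform(transformers, final_estimator, X, y=None):
    """Conceptual sketch: fit then transform each step, then the final estimator."""
    for t in transformers:
        X = t.fit(X, y).transform(X)  # fit followed by transform, per step
    if final_estimator is None:
        return X  # all prior transformations are applied
    return final_estimator.fit(X, y).transform(X)

class Scale:
    """Toy transformer that multiplies every value by a factor."""
    def __init__(self, factor):
        self.factor = factor
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return [x * self.factor for x in X]

result = pipeline_fit_transform([Scale(2)], Scale(10), [1, 2])
result_none = pipeline_fit_transform([Scale(2)], None, [1, 2])
```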
Parameters | input_features : array-like of str or None, default=None
Input features.
|
Returns | feature_names_out : ndarray of str objects
Transformed feature names.
|
Check sklearn's documentation on how the routing mechanism works.
Returns | MetadataRouter
A MetadataRouter encapsulating routing information.
|
All estimators in the pipeline must implement the inverse_transform method.
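The reverse-order logic can be sketched with two scikit-learn scalers chained by hand (illustrative; the helper function is not the actual implementation):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0]])

# Fit two transformers in sequence, as a pipeline would during fit
s1, s2 = StandardScaler(), MinMaxScaler()
Xt = s2.fit_transform(s1.fit_transform(X))

def pipeline_inverse_transform(steps, X):
    # Each step's inverse_transform is applied in reverse order
    for est in reversed(steps):
        X = est.inverse_transform(X)
    return X

X_back = pipeline_inverse_transform([s1, s2], Xt)  # recovers the original X
```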
Parameters | X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). Can only be None for forecast tasks.
fh: int, sequence, ForecastingHorizon or None, default=None
The forecasting horizon encoding the time stamps to forecast at. Only for forecast tasks.
**params
Parameters requested and accepted by steps. Each step must
have requested certain metadata for these parameters to be
forwarded to them. Note that while this may be used to
return uncertainties from some models with return_std or
return_cov , uncertainties that are generated by the
transformations in the pipeline are not propagated to the
final estimator.
|
Returns | np.ndarray, series or dataframe
Predictions with shape=(n_samples,) or shape=(n_samples,
n_targets) for multioutput tasks.
|
Parameters | fh: int, sequence or ForecastingHorizon
The forecasting horizon encoding the time stamps to
forecast at.
X: dataframe-like or None, default=None
Exogenous time series corresponding to fh.
coverage: float or sequence, default=0.9
Nominal coverage(s) of predictive interval(s).
|
Returns | dataframe
Computed interval forecasts.
|
Parameters | X: dataframe-like
Feature set with shape=(n_samples, n_features).
**params
Parameters requested and accepted by steps. Each step must
have requested certain metadata for these parameters to be
forwarded to them.
|
Returns | list or np.ndarray
Predicted class log-probabilities with shape=(n_samples,
n_classes) or a list of arrays for multioutput tasks.
|
Parameters | X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). Can only be None for forecast tasks.
fh: int, sequence, ForecastingHorizon or None, default=None
The forecasting horizon encoding the time stamps to forecast at. Only for forecast tasks.
marginal: bool, default=True
Whether returned distribution is marginal by time index.
Only for forecast tasks.
**params
Parameters requested and accepted by steps. Each step must
have requested certain metadata for these parameters to be
forwarded to them.
|
Returns | list, np.ndarray or sktime.proba.Normal
|
Parameters | fh: int, sequence or ForecastingHorizon
The forecasting horizon encoding the time stamps to
forecast at.
X: dataframe-like or None, default=None
Exogenous time series corresponding to fh.
alpha: float or sequence, default=(0.05, 0.95)
A probability or list of probabilities at which quantile forecasts are computed.
|
Returns | dataframe
Computed quantile forecasts.
|
Parameters | y: sequence or dataframe
Ground truth observations.
X: dataframe-like or None, default=None
Exogenous time series corresponding to y .
|
Returns | series or dataframe
Residuals with shape=(n_samples,) or shape=(n_samples,
n_targets) for multivariate tasks.
|
Parameters | fh: int, sequence or ForecastingHorizon
The forecasting horizon encoding the time stamps to
forecast at.
X: dataframe-like or None, default=None
Exogenous time series corresponding to fh.
cov: bool, default=False
Whether to compute covariance matrix forecasts or marginal variance forecasts.
|
Returns | dataframe
Computed variance forecasts.
|
Parameters | X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). Can only be None for forecast tasks.
y: sequence, dataframe-like or None, default=None
Target values corresponding to X.
fh: int, sequence, ForecastingHorizon or None, default=None
The forecasting horizon encoding the time stamps to score.
sample_weight: sequence or None, default=None
Sample weights corresponding to y, passed to the score method of the final estimator. If None, no sample weights are used. Only for non-forecast tasks.
|
Returns | float
Mean accuracy, r2 or mape of self.predict(X) with respect
to y (depending on the task).
|
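For a classification task, the score method can be sketched with plain scikit-learn on a held-out test set (variable names are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])
pipe.fit(X_train, y_train)

# score transforms X_test through the pipeline, then calls the final
# estimator's score method; for a classifier this is mean accuracy
acc = pipe.score(X_test, y_test)
```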
See sklearn's user guide on how to use the set_output API, and here for a description of the choices.
Call transform on each transformer in the pipeline. The transformed data are finally passed to the final estimator, which calls the transform method. Only valid if the final estimator implements transform. This also works when the final estimator is None, in which case all prior transformations are applied.