Pipeline


class atom.pipeline.Pipeline(steps, memory=None, verbose=0)[source]
Pipeline of transforms with a final estimator.

Sequentially apply a list of transforms and a final estimator. Intermediate steps of the pipeline must be transformers, that is, they must implement fit and transform methods. The final estimator only needs to implement fit. The transformers in the pipeline can be cached using the memory parameter.

A step's estimator may be replaced entirely by setting the parameter with its name to another estimator, or a transformer removed by setting it to 'passthrough' or None.

Read more in sklearn's user guide.

Info

This class behaves similarly to sklearn's pipeline, and additionally:

  • Can initialize with an empty pipeline.
  • Always returns 'pandas' objects.
  • Accepts transformers that drop rows.
  • Accepts transformers that are fitted on only a subset of the provided dataset.
  • Accepts transformers that apply only on the target column.
  • Uses transformers that are only applied on the training set to fit the pipeline, not to make predictions on new data.
  • The instance is considered fitted at initialization if all the underlying transformers/estimator in the pipeline are.
  • Returns attributes from the final estimator if the Pipeline itself does not have them.
  • The last estimator is also cached.
  • Supports time series models following sktime's API.
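
The train-only behavior in the list above can be pictured with a toy sketch. This is plain Python, not ATOM's implementation: the `train_only` flag, the two toy transformers and the `run_transformers` helper are hypothetical names chosen for illustration. Only the `filter_train_only` argument name is taken from this page.

```python
from dataclasses import dataclass


@dataclass
class AddOne:
    """Toy transformer that applies to all data."""

    train_only: bool = False  # hypothetical flag, not ATOM's attribute name

    def transform(self, X):
        return [x + 1 for x in X]


@dataclass
class DropNegatives:
    """Toy row-dropping transformer meant for the training set only."""

    train_only: bool = True  # hypothetical flag

    def transform(self, X):
        return [x for x in X if x >= 0]


def run_transformers(steps, X, filter_train_only=True):
    """Apply each step in order, skipping train-only steps when asked."""
    for step in steps:
        if filter_train_only and step.train_only:
            continue
        X = step.transform(X)
    return X


steps = [DropNegatives(), AddOne()]
new_data = [-2, 0, 3]

# Prediction-time behavior: the row-dropping step is skipped.
kept_all = run_transformers(steps, new_data, filter_train_only=True)

# Training-time behavior: every step runs, so the negative row is dropped.
dropped = run_transformers(steps, new_data, filter_train_only=False)
```

Skipping such steps on new data keeps the number of rows unchanged, so every input sample still gets a prediction.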

Warning

This Pipeline only works with estimators whose parameters for fit, transform, predict, etc. are named X and/or y.

Parameters

steps: list of tuple
List of (name, transform) tuples (implementing fit/transform) that are chained in sequential order.

memory: str, Memory or None, default=None
Used to cache the fitted transformers of the pipeline. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time-consuming.

verbose: int or None, default=0
Verbosity level of the transformers in the pipeline. If None, it leaves them to their original verbosity. If >0, the time elapsed while fitting each step is printed. Note this is not the same as sklearn's verbose parameter. Use the pipeline's verbose attribute to modify that one (defaults to False).
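
The note on memory says that caching fits a clone, which is why the instance passed to the pipeline stays unfitted. A minimal sketch of that effect, assuming `copy.deepcopy` as a stand-in for sklearn's clone; `MeanCenterer` and `fit_step` are hypothetical names and no real caching is performed here:

```python
import copy


class MeanCenterer:
    """Toy transformer that learns the mean of the data."""

    def __init__(self):
        self.mean_ = None  # set by fit

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self


def fit_step(transformer, X, use_cache):
    """Fit a step. With caching on, a copy is fitted instead of the original."""
    target = copy.deepcopy(transformer) if use_cache else transformer
    return target.fit(X)


original = MeanCenterer()
fitted = fit_step(original, [1.0, 2.0, 3.0], use_cache=True)

# The object handed in was never fitted; only the returned copy was.
original_is_unfitted = original.mean_ is None
```

In the real class, the fitted objects are reached through named_steps or steps, as described above.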

Attributes

named_steps: Bunch
Dictionary-like object that gives read-only access to any step by its user-given name. Keys are step names and values are the corresponding steps.

classes_: np.ndarray of shape (n_classes,)
The class labels. Only exists if the last step of the pipeline is a classifier.

feature_names_in_: np.ndarray
Names of features seen during the first step's fit method.

n_features_in_: int
Number of features seen during the first step's fit method.
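
Attributes such as classes_ come from the final estimator. One common way to get this kind of fallback in Python is `__getattr__`, sketched below with toy classes. `ToyPipeline` and `ToyClassifier` are hypothetical, and this is not claimed to be how ATOM implements it.

```python
class ToyClassifier:
    """Toy final estimator that records the class labels it saw."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        return self


class ToyPipeline:
    """Toy pipeline that forwards unknown attributes to its last step."""

    def __init__(self, steps):
        self.steps = steps

    def __getattr__(self, name):
        # Only called when normal attribute lookup on the pipeline fails.
        if name == "steps" or not self.steps:
            raise AttributeError(name)
        return getattr(self.steps[-1][1], name)


clf = ToyClassifier().fit([[0], [1], [2]], ["b", "a", "b"])
pipe = ToyPipeline([("clf", clf)])

labels = pipe.classes_             # forwarded to the final estimator
has_coef = hasattr(pipe, "coef_")  # final estimator lacks it as well
```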


Example

>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer

>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)

>>> # Initialize atom
>>> atom = ATOMClassifier(X, y, verbose=2)

<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Binary classification.

Dataset stats ==================== >>
Shape: (569, 31)
Train set size: 456
Test set size: 113
-------------------------------------
Memory: 138.97 kB
Scaled: False
Outlier values: 174 (1.2%)



>>> # Apply data cleaning and feature engineering methods
>>> atom.scale()

Fitting Scaler...
Scaling features...

>>> atom.balance(strategy="smote")

Oversampling with SMOTE...
 --> Adding 116 samples to class 0.

>>> atom.feature_selection(strategy="rfe", solver="lr", n_features=22)

Fitting FeatureSelector...
Performing feature selection ...
 --> rfe selected 22 features from the dataset.
   --> Dropping feature mean texture (rank 5).
   --> Dropping feature mean smoothness (rank 4).
   --> Dropping feature mean symmetry (rank 2).
   --> Dropping feature mean fractal dimension (rank 3).
   --> Dropping feature smoothness error (rank 9).
   --> Dropping feature concavity error (rank 7).
   --> Dropping feature symmetry error (rank 6).
   --> Dropping feature worst compactness (rank 8).


>>> # Train models
>>> atom.run(models="LR")


Training ========================= >>
Models: LR
Metric: f1


Results for LogisticRegression:
Fit ---------------------------------------------
Train evaluation --> f1: 0.9808
Test evaluation --> f1: 0.9929
Time elapsed: 0.031s
-------------------------------------------------
Time: 0.031s


Final results ==================== >>
Total time: 0.034s
-------------------------------------
LogisticRegression --> f1: 0.9929


>>> # Get the pipeline object
>>> pipeline = atom.lr.export_pipeline()
>>> print(pipeline)

Pipeline(memory=Memory(location=None),
         steps=[('scaler',
                 Scaler(engine={'data': 'pandas', 'estimator': 'sklearn'}, verbose=2)),
                ('balancer', Balancer(strategy='smote', verbose=2)),
                ('featureselector',
                 FeatureSelector(engine={'data': 'pandas', 'estimator': 'sklearn'}, n_features=22, solver='lr_class', strategy='rfe', verbose=2)),
                ('LogisticRegression', LogisticRegression(n_jobs=1))],
         verbose=False)


Methods

decision_function      Transform, then decision_function of the final estimator.
fit                    Fit the pipeline.
fit_transform          Fit the pipeline and transform the data.
get_feature_names_out  Get output feature names for transformation.
get_metadata_routing   Get metadata routing of this object.
get_params             Get parameters for this estimator.
inverse_transform      Inverse transform for each step in a reverse order.
predict                Transform, then predict of the final estimator.
predict_interval       Transform, then predict_interval of the final estimator.
predict_log_proba      Transform, then predict_log_proba of the final estimator.
predict_proba          Transform, then predict_proba of the final estimator.
predict_quantiles      Transform, then predict_quantiles of the final estimator.
predict_residuals      Transform, then predict_residuals of the final estimator.
predict_var            Transform, then predict_var of the final estimator.
score                  Transform, then score of the final estimator.
set_output             Set output container.
set_params             Set the parameters of this estimator.
transform              Transform the data.


method decision_function(X, **params)[source]
Transform, then decision_function of the final estimator.

Parameters

X: dataframe-like
Feature set with shape=(n_samples, n_features).

**params
Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Returns

np.ndarray
Predicted confidence scores with shape=(n_samples,) for binary classification tasks (log likelihood ratio of the positive class) or shape=(n_samples, n_classes) for multiclass classification tasks.



method fit(X=None, y=None, **params)[source]
Fit the pipeline.

Fit all the transformers one after the other and sequentially transform the data. Finally, fit the transformed data using the final estimator.

Parameters

X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). If None, X is ignored.

y: sequence, dataframe-like or None, default=None
Target column(s) corresponding to X.

**params
Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Returns

self
Pipeline with fitted steps.
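
The fit procedure described above (fit each transformer, pass its output on, then fit the final estimator) can be sketched in plain Python. The toy classes and the `fit_pipeline` helper are hypothetical and ignore caching, metadata routing and pandas output.

```python
class Scale:
    """Toy transformer: divide by the largest absolute value seen in fit."""

    def fit(self, X, y=None):
        self.max_ = max(abs(x) for x in X)
        return self

    def transform(self, X):
        return [x / self.max_ for x in X]


class Shift:
    """Toy transformer: subtract the mean seen in fit."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class MeanPredictor:
    """Toy final estimator: it only needs to implement fit."""

    def fit(self, X, y):
        self.seen_ = list(X)  # data as received from the transformers
        self.prediction_ = sum(y) / len(y)
        return self


def fit_pipeline(steps, X, y):
    """Fit transformers in order, then fit the last step on the result."""
    *transformers, last = steps
    for _, transformer in transformers:
        X = transformer.fit(X, y).transform(X)
    last[1].fit(X, y)
    return steps


scale, shift, model = Scale(), Shift(), MeanPredictor()
steps = [("scale", scale), ("shift", shift), ("model", model)]
fit_pipeline(steps, [2.0, 4.0, 8.0], [1, 2, 3])
```

Note that `shift` is fitted on the output of `scale`, not on the raw input, which is the point of transforming sequentially during fit.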



method fit_transform(X=None, y=None, **params)[source]
Fit the pipeline and transform the data.

Call fit followed by transform on each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls the transform method. Only valid if the final estimator implements transform. This also works when the final estimator is None, in which case all prior transformations are applied.

Parameters

X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). If None, X is ignored. None if the estimator only uses y.

y: sequence, dataframe-like or None, default=None
Target column(s) corresponding to X.

**params
Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Returns

dataframe
Transformed feature set. Only returned if provided.

series or dataframe
Transformed target column. Only returned if provided.



method get_feature_names_out(input_features=None)[source]
Get output feature names for transformation.

Parameters

input_features: array-like of str or None, default=None
Input features.

Returns

feature_names_out: ndarray of str objects
Transformed feature names.



method get_metadata_routing()[source]
Get metadata routing of this object.

Check sklearn's documentation on how the routing mechanism works.

Returns

MetadataRouter
A MetadataRouter encapsulating routing information.



method get_params(deep=True)[source]
Get parameters for this estimator.

Parameters

deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params: mapping of string to any
Parameter names mapped to their values.



method inverse_transform(X=None, y=None, filter_train_only=True, **params)[source]
Inverse transform for each step in a reverse order.

All estimators in the pipeline must implement the inverse_transform method.

Parameters

X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). If None, X is ignored. None if the pipeline only uses y.

y: sequence, dataframe-like or None, default=None
Target column(s) corresponding to X.

filter_train_only: bool, default=True
Whether to exclude transformers that should only be used on the training set.

**params
Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Returns

dataframe
Transformed feature set. Only returned if provided.

series or dataframe
Transformed target column. Only returned if provided.
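
Why the inverse has to run in reverse order is easiest to see on two toy steps. The classes and helpers below are hypothetical and operate on plain lists rather than dataframes.

```python
class Add:
    """Toy transformer: add a constant."""

    def __init__(self, k):
        self.k = k

    def transform(self, X):
        return [x + self.k for x in X]

    def inverse_transform(self, X):
        return [x - self.k for x in X]


class Multiply:
    """Toy transformer: multiply by a non-zero constant."""

    def __init__(self, k):
        self.k = k

    def transform(self, X):
        return [x * self.k for x in X]

    def inverse_transform(self, X):
        return [x / self.k for x in X]


def transform(steps, X):
    for step in steps:
        X = step.transform(X)
    return X


def inverse_transform(steps, X):
    # Undo the last step first, then work back to the first one.
    for step in reversed(steps):
        X = step.inverse_transform(X)
    return X


steps = [Add(3), Multiply(2)]
original = [1, 2, 5]
forward = transform(steps, original)           # (x + 3) * 2
restored = inverse_transform(steps, forward)   # x / 2 - 3
```

Applying the inverses in forward order instead would compute (x - 3) / 2 and would not recover the input.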



method predict(X=None, fh=None, **params)[source]
Transform, then predict of the final estimator.

Parameters

X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). Can only be None for forecast tasks.

fh: int, sequence or ForecastingHorizon or None, default=None
The forecasting horizon encoding the time stamps to forecast at. Only for forecast tasks.

**params
Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them. Note that while this may be used to return uncertainties from some models with return_std or return_cov, uncertainties that are generated by the transformations in the pipeline are not propagated to the final estimator.

Returns

np.ndarray, series or dataframe
Predictions with shape=(n_samples,) or shape=(n_samples, n_targets) for multioutput tasks.



method predict_interval(fh, X=None, coverage=0.9)[source]
Transform, then predict_interval of the final estimator.

Parameters

fh: int, sequence or ForecastingHorizon
The forecasting horizon encoding the time stamps to forecast at.

X: dataframe-like or None, default=None
Exogenous time series corresponding to fh.

coverage: float or sequence, default=0.9
Nominal coverage(s) of predictive interval(s).

Returns

dataframe
Computed interval forecasts.
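
As a rough guide to what coverage means, sktime's convention is that a symmetric interval with nominal coverage c has its lower and upper bounds at the quantile levels 0.5 - c/2 and 0.5 + c/2. Assuming the final estimator follows that convention, the arithmetic is sketched below; `coverage_to_alphas` is a hypothetical helper, not part of ATOM or sktime.

```python
def coverage_to_alphas(coverage):
    """Return the (lower, upper) quantile levels of a symmetric interval."""
    if not 0 < coverage < 1:
        raise ValueError("coverage must lie strictly between 0 and 1")
    # Rounding only removes floating-point noise such as 0.04999999999999999.
    lower = round(0.5 - coverage / 2, 10)
    upper = round(0.5 + coverage / 2, 10)
    return lower, upper


lower, upper = coverage_to_alphas(0.9)
```

With the default coverage=0.9 this gives levels of about 0.05 and 0.95, the same values as the default alpha of predict_quantiles.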



method predict_log_proba(X, **params)[source]
Transform, then predict_log_proba of the final estimator.

Parameters

X: dataframe-like
Feature set with shape=(n_samples, n_features).

**params
Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Returns

list or np.ndarray
Predicted class log-probabilities with shape=(n_samples, n_classes) or a list of arrays for multioutput tasks.



method predict_proba(X=None, fh=None, marginal=True, **params)[source]
Transform, then predict_proba of the final estimator.

Parameters

X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). Can only be None for forecast tasks.

fh: int, sequence, ForecastingHorizon or None, default=None
The forecasting horizon encoding the time stamps to forecast at. Only for forecast tasks.

marginal: bool, default=True
Whether returned distribution is marginal by time index. Only for forecast tasks.

**params
Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Returns

list, np.ndarray or sktime.proba.Normal

  • For classification tasks: Predicted class probabilities with shape=(n_samples, n_classes).
  • For multioutput tasks: A list of arrays with shape=(n_samples, n_classes).
  • For forecast tasks: Distribution object.



method predict_quantiles(fh, X=None, alpha=(0.05, 0.95))[source]
Transform, then predict_quantiles of the final estimator.

Parameters

fh: int, sequence or ForecastingHorizon
The forecasting horizon encoding the time stamps to forecast at.

X: dataframe-like or None, default=None
Exogenous time series corresponding to fh.

alpha: float or sequence, default=(0.05, 0.95)
A probability or list of probabilities at which quantile forecasts are computed.

Returns

dataframe
Computed quantile forecasts.



method predict_residuals(y, X=None)[source]
Transform, then predict_residuals of the final estimator.

Parameters

y: sequence or dataframe
Ground truth observations.

X: dataframe-like or None, default=None
Exogenous time series corresponding to y.

Returns

series or dataframe
Residuals with shape=(n_samples,) or shape=(n_samples, n_targets) for multivariate tasks.



method predict_var(fh, X=None, cov=False)[source]
Transform, then predict_var of the final estimator.

Parameters

fh: int, sequence or ForecastingHorizon
The forecasting horizon encoding the time stamps to forecast at.

X: dataframe-like or None, default=None
Exogenous time series corresponding to fh.

cov: bool, default=False
Whether to compute covariance matrix forecast or marginal variance forecasts.

Returns

dataframe
Computed variance forecasts.



method score(X=None, y=None, fh=None, sample_weight=None, **params)[source]
Transform, then score of the final estimator.

Parameters

X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). Can only be None for forecast tasks.

y: sequence, dataframe-like or None, default=None
Target values corresponding to X.

fh: int, sequence, ForecastingHorizon or None, default=None
The forecasting horizon encoding the time stamps to score.

sample_weight: sequence or None, default=None
Sample weights corresponding to y passed to the score method of the final estimator. If None, no sample weighting is applied. Only for non-forecast tasks.

Returns

float
Mean accuracy, r2 or mape of self.predict(X) with respect to y (depending on the task).



method set_output(transform=None)[source]
Set output container.

See sklearn's user guide on how to use the set_output API. The available choices are listed under the transform parameter.

Parameters

transform: str or None, default=None
Configure the output of the transform, fit_transform, and inverse_transform methods. If None, the configuration is not changed. Choose from:

  • "numpy"
  • "pandas" (default)
  • "pandas-pyarrow"
  • "polars"
  • "polars-lazy"
  • "pyarrow"
  • "modin"
  • "dask"
  • "pyspark"
  • "pyspark-pandas"

Returns

Self
Estimator instance.



method set_params(**kwargs)[source]
Set the parameters of this estimator.

Parameters

**kwargs: dict
Parameters of this estimator or parameters of estimators contained in steps. Parameters of a step may be set using the step's name and the parameter name separated by a '__'.

Returns

self: object
Pipeline class instance.
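
The '__' naming rule can be illustrated with a toy version of the lookup. The `split_param` and `set_params` functions below are hypothetical stand-ins that work on a plain list of (name, object) tuples. They do not handle nested keys such as a__b__c, and they are not the library's code.

```python
from types import SimpleNamespace


def split_param(key):
    """Split 'step__param' into (step, param); param is None for a bare step name."""
    step, sep, param = key.partition("__")
    return step, (param if sep else None)


def set_params(steps, **kwargs):
    """Toy setter: replace a whole step or set one attribute on it."""
    names = [name for name, _ in steps]
    for key, value in kwargs.items():
        step, param = split_param(key)
        idx = names.index(step)  # raises ValueError for unknown steps
        if param is None:
            steps[idx] = (step, value)  # replace the step itself
        else:
            setattr(steps[idx][1], param, value)
    return steps


steps = [
    ("scaler", SimpleNamespace(with_mean=True)),
    ("clf", SimpleNamespace(C=1.0)),
]
set_params(steps, clf__C=10.0, scaler="passthrough")
```

A key without '__' targets the step itself, which is how a step is swapped for another estimator or for 'passthrough'.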



method transform(X=None, y=None, filter_train_only=True, **params)[source]
Transform the data.

Call transform on each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls the transform method. Only valid if the final estimator implements transform. This also works when the final estimator is None, in which case all prior transformations are applied.

Parameters

X: dataframe-like or None, default=None
Feature set with shape=(n_samples, n_features). If None, X is ignored. None if the pipeline only uses y.

y: sequence, dataframe-like or None, default=None
Target column(s) corresponding to X.

filter_train_only: bool, default=True
Whether to exclude transformers that should only be used on the training set.

**params
Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Returns

dataframe
Transformed feature set. Only returned if provided.

series or dataframe
Transformed target column. Only returned if provided.