PolynomialTrend
PT native multioutput
Forecast time series data with a polynomial trend, using sklearn's LinearRegression to regress the values of the time series on the index after extracting polynomial features.
Corresponding estimators are:
- PolynomialTrendForecaster for forecasting tasks.
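As a rough illustration of the idea described above, the sketch below fits a polynomial trend with the standard library only. It is not the estimator's implementation (which relies on sklearn): the helper names poly_features, solve, fit_trend and forecast are hypothetical, and the least-squares fit is done through the normal equations purely to keep the example self-contained.

```python
# Minimal polynomial-trend sketch: regress the series values on the
# integer time index after expanding it into polynomial features.
# All function names here are hypothetical and not part of ATOM or sktime.

def poly_features(t, degree, with_intercept=True):
    """Return [1, t, t^2, ..., t^degree] (without the 1 if no intercept)."""
    start = 0 if with_intercept else 1
    return [float(t) ** p for p in range(start, degree + 1)]


def solve(a, b):
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_trend(y, degree=1, with_intercept=True):
    """Least-squares coefficients via the normal equations (X'X) w = X'y."""
    rows = [poly_features(t, degree, with_intercept) for t in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def forecast(coef, n_train, horizon, degree=1, with_intercept=True):
    """Extrapolate the fitted trend to the next `horizon` index values."""
    return [
        sum(w * f for w, f in zip(coef, poly_features(t, degree, with_intercept)))
        for t in range(n_train, n_train + horizon)
    ]


y = [1.0 + 2.0 * t + 0.5 * t**2 for t in range(10)]  # exact quadratic trend
coef = fit_trend(y, degree=2)
preds = forecast(coef, n_train=len(y), horizon=3, degree=2)
print([round(c, 6) for c in coef], [round(p, 6) for p in preds])
```

The degree and with_intercept arguments mirror the two hyperparameters listed for this model. Higher degrees fit the training data more closely but tend to extrapolate poorly.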
See Also
Example
>>> from atom import ATOMForecaster
>>> from sktime.datasets import load_airline
>>> y = load_airline()
>>> atom = ATOMForecaster(y, random_state=1)
>>> atom.run(models="PT", verbose=2)
Training ========================= >>
Models: PT
Metric: mape
Results for PolynomialTrend:
Fit ---------------------------------------------
Train evaluation --> mape: -0.1196
Test evaluation --> mape: -0.1181
Time elapsed: 0.019s
-------------------------------------------------
Time: 0.019s
Final results ==================== >>
Total time: 0.020s
-------------------------------------
PolynomialTrend --> mape: -0.1181
Hyperparameters
Parameters |
degree IntDistribution(high=5, log=False, low=1, step=1)
with_intercept CategoricalDistribution(choices=(True, False)) |
Attributes
Data attributes
Attributes |
pipeline: Pipeline
Pipeline of transformers. Models that used automated feature scaling have the scaler added.
Tip
Use the plot_pipeline method to visualize the pipeline.
mapping: dict[str, dict[str, int | float]]
Encoded values and their respective mapped values. The column name is the key to its mapping dictionary. Only for columns mapped to a single column (e.g., Ordinal, Leave-one-out, etc...).
dataset: pd.DataFrame
Complete data set.
train: pd.DataFrame
Training set.
test: pd.DataFrame
Test set.
X: pd.DataFrame
Feature set.
y: pd.Series | pd.DataFrame
Target column(s).
X_train: pd.DataFrame
Features of the training set.
y_train: pd.Series | pd.DataFrame
Target column of the training set.
X_test: pd.DataFrame
Features of the test set.
y_test: pd.Series | pd.DataFrame
Target column(s) of the test set.
X_holdout: pd.DataFrame | None
Features of the holdout set.
y_holdout: pd.Series | pd.DataFrame | None
Target column of the holdout set.
shape: tuple[Int, Int]
Shape of the dataset (n_rows, n_columns).
columns: pd.Index
Name of all the columns.
n_columns: int
Number of columns.
features: pd.Index
Name of the features.
n_features: int
Number of features.
target: str | list[str]
Name of the target column(s).
|
Utility attributes
Attributes |
name: str
Name of the model. Use the property's
run: Run
Mlflow run corresponding to this model. This property is only available for models with mlflow tracking enabled.
study: Study
Optuna study used for hyperparameter tuning. This property is only available for models that ran hyperparameter tuning.
trials: pd.DataFrame
Overview of the trials' results. This property is only available for models that ran hyperparameter tuning. All durations are in seconds. Columns include:
best_trial: FrozenTrial
Trial that returned the highest score. For multi-metric runs, the best trial is the trial that performed best on the main metric. Use the property's
best_params: dict[str, Any]
Estimator's parameters in the best trial. This property is only available for models that ran hyperparameter tuning.
estimator: Predictor
Estimator fitted on the training set.
bootstrap: pd.DataFrame
Overview of the bootstrapping scores. The dataframe has shape=(n_bootstrap, metric) and shows the score obtained by every bootstrapped sample for every metric. Using
results: pd.Series
Overview of the model results. All durations are in seconds. Possible values include:
feature_importance: pd.Series
Normalized feature importance scores. The sum of importances for all features is 1. The scores are extracted from the estimator's
|
Methods
The plots can be called directly from the model. The remaining utility methods can be found hereunder.
bootstrapping | Apply a bootstrap algorithm. |
calibrate | Calibrate and retrain the model. |
canvas | Create a figure with multiple plots. |
clear | Reset attributes and clear cache from the model. |
create_app | Create an interactive app to test model predictions. |
create_dashboard | Create an interactive dashboard to analyze the model. |
cross_validate | Evaluate the model using cross-validation. |
evaluate | Get the model's scores for the provided metrics. |
export_pipeline | Export the transformer pipeline with final estimator. |
fit | Fit and validate the model. |
full_train | Train the estimator on the complete dataset. |
get_best_threshold | Get the threshold that maximizes a metric. |
get_tags | Get the model's tags. |
hyperparameter_tuning | Run the hyperparameter tuning algorithm. |
inverse_transform | Inversely transform new data through the pipeline. |
predict | Get predictions on new data or existing rows. |
predict_interval | Get prediction intervals on new data or existing rows. |
predict_proba | Get probabilistic forecasts on new data or existing rows. |
predict_quantiles | Get quantile forecasts on new data or existing rows. |
predict_residuals | Get residuals of forecasts on new data or existing rows. |
predict_var | Get variance forecasts on new data or existing rows. |
register | Register the model in mlflow's model registry. |
reset_aesthetics | Reset the plot aesthetics to their default values. |
save_estimator | Save the estimator to a pickle file. |
score | Get a metric score on new data. |
serve | Serve the model as a REST API endpoint for inference. |
set_threshold | Set the binary threshold of the estimator. |
transform | Transform new data through the pipeline. |
update_layout | Update the properties of the plot's layout. |
update_traces | Update the properties of the plot's traces. |
Apply a bootstrap algorithm.
Take bootstrapped samples from the training set and test them on the test set to get a distribution of the model's results.
Parameters |
n_bootstrap: int
Number of bootstrapped samples to fit on.
reset: bool, default=False
Whether to start a new run or continue the existing one.
|
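The resample-and-score loop described above can be sketched as follows. This is a toy illustration under stated assumptions, not ATOM's implementation: bootstrap_scores is a hypothetical helper, the "model" is just the training mean, and plain resampling with replacement ignores the temporal order of a time series.

```python
import random
import statistics


def bootstrap_scores(train, test, fit, score, n_bootstrap, seed=1):
    """Fit on resampled training sets and score each fit on the test set."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_bootstrap):
        sample = rng.choices(train, k=len(train))  # draw with replacement
        model = fit(sample)
        scores.append(score(model, test))
    return scores


# Toy model: predict the training mean.
fit = statistics.fmean


def score(model, data):
    """Negated mean absolute error, so that higher is better."""
    return -statistics.fmean(abs(v - model) for v in data)


train = [10.0, 12.0, 11.0, 13.0, 12.5, 11.5]
test = [12.0, 11.0, 13.0]
scores = bootstrap_scores(train, test, fit, score, n_bootstrap=5)

# The spread of the scores gives a feel for the stability of the result.
print(statistics.fmean(scores), statistics.stdev(scores))
```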
Calibrate and retrain the model.
Uses sklearn's CalibratedClassifierCV to apply probability
calibration on the model. The new classifier replaces the
estimator attribute. If there is an active mlflow experiment,
a new run is started using the name [model_name]_calibrate.
Since the estimator changed, the model is cleared.
Only for classifiers.
Note
By default, the calibration is optimized using the training
set (which is already used for the initial training). This
approach is subject to undesired overfitting. It's preferred
to use train_on_test=True, which uses the test set for
calibration, but only if there is another, independent set
for testing (holdout set).
Parameters |
method: str, default="sigmoid"
The method to use for calibration. Choose from:
train_on_test: bool, default=False
Whether to train the calibrator on the test set.
|
Create a figure with multiple plots.
This @contextmanager allows you to draw many plots in one
figure. The default option is to add two plots side by side.
See the user guide for an example.
Parameters |
rows: int, default=1
Number of plots in length.
cols: int, default=2
Number of plots in width.
sharex: bool, default=False
If True, hide the label and ticks from non-border subplots
on the x-axis.
sharey: bool, default=False
If True, hide the label and ticks from non-border subplots
on the y-axis.
hspace: float, default=0.05
Space between subplot rows in normalized plot coordinates.
The spacing is relative to the figure's size.
vspace: float, default=0.07
Space between subplot cols in normalized plot coordinates.
The spacing is relative to the figure's size.
title: str, dict or None, default=None
Title for the plot.
legend: bool, str or dict, default="out"
Legend for the plot. See the user guide for
an extended description of the choices.
figsize: tuple or None, default=None
Figure's size in pixels, format as (x, y). If None, it
adapts the size to the number of plots in the canvas.
filename: str, Path or None, default=None
Save the plot using this name. Use "auto" for automatic
naming. The type of the file depends on the provided name
(.html, .png, .pdf, etc...). If filename has no file type,
the plot is saved as html. If None, the plot is not saved.
display: bool, default=True
Whether to render the plot.
|
Yields |
go.Figure
Plot object.
|
Reset attributes and clear cache from the model.
Reset certain model attributes to their initial state, deleting potentially large data arrays. Use this method to free some memory before saving the instance. The affected attributes are:
- In-training validation scores
- Cached predictions
- Shap values
- App instance
- Dashboard instance
- Calculated holdout data sets
Create an interactive app to test model predictions.
Demo your machine learning model with a friendly web interface.
This app launches directly in the notebook or on an external
browser page. The created Interface instance can be accessed
through the app attribute.
Parameters |
**kwargs
Additional keyword arguments for the Interface instance
or the Interface.launch method.
|
Create an interactive dashboard to analyze the model.
ATOM uses the explainerdashboard package to provide a quick and easy way to analyze and explain the predictions and workings of the model. The dashboard allows you to investigate SHAP values, permutation importances, interaction effects, partial dependence plots, all kinds of performance plots, and even individual decision trees.
By default, the dashboard renders in a new tab in your default
browser, but if preferable, you can render it inside the
notebook using the mode="inline" parameter. The created
ExplainerDashboard instance can be accessed through the
dashboard attribute. This method is not available for
multioutput tasks.
Note
Plots displayed by the dashboard are not created by ATOM and can differ from those retrieved through this package.
Parameters |
rows: hashable, segment, sequence or dataframe, default="test"
Selection of rows to get the
report from.
filename: str, Path or None, default=None
Filename or pathlib.Path of the file to save. None to not
save anything.
**kwargs
Additional keyword arguments for the ExplainerDashboard
instance.
|
Evaluate the model using cross-validation.
This method cross-validates the whole pipeline on the complete
dataset. Use it to assess the robustness of the model's
performance. If the scoring method is not specified in kwargs,
it uses atom's metric. The results of the cross-validation are
stored in the model's cv attribute.
Tip
This method returns a pandas' Styler object. Convert
the result back to a regular dataframe using its data attribute.
Parameters |
include_holdout: bool, default=False
Whether to include the holdout set (if available) in the
cross-validation.
**kwargs
Additional keyword arguments for one of these functions.
|
Returns |
Styler
Overview of the results.
|
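For time series, cross-validation folds must keep the training rows before the test rows. The sketch below generates such folds with an expanding training window. This is an assumption for illustration only: the splitting scheme ATOM applies is not stated on this page, and expanding_window_splits with its parameters is a hypothetical helper.

```python
def expanding_window_splits(n_rows, initial_window, horizon, step):
    """Yield (train_idx, test_idx) pairs where the training window grows."""
    end = initial_window
    while end + horizon <= n_rows:
        # Train on everything up to `end`, test on the next `horizon` rows.
        yield list(range(end)), list(range(end, end + horizon))
        end += step


splits = list(
    expanding_window_splits(n_rows=10, initial_window=4, horizon=2, step=2)
)
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx)
```

Each fold would be scored with the chosen metric, and the spread of the fold scores indicates how robust the model's performance is.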
Get the model's scores for the provided metrics.
Tip
Use the get_best_threshold or plot_threshold method to determine a suitable threshold for a binary classifier.
Parameters |
metric: str, func, scorer, sequence or None, default=None
Metrics to calculate. If None, a selection of the most
common metrics per task is used.
rows: hashable, segment, sequence or dataframe, default="test"
Selection of rows to calculate
metric on.
|
Returns |
pd.Series
Scores of the model.
|
Export the transformer pipeline with final estimator.
The returned pipeline is already fitted on the training set. Note that if the model used automated feature scaling, the Scaler is added to the pipeline.
Returns |
Pipeline
Current branch as a sklearn-like Pipeline object.
|
Fit and validate the model.
The estimator is fitted using the best hyperparameters found during hyperparameter tuning. Afterwards, the estimator is evaluated on the test set. Only use this method to re-fit the model after having continued the study.
Train the estimator on the complete dataset.
In some cases, it might be desirable to use all available data
to train a final model. Note that doing this means that the
estimator can no longer be evaluated on the test set. The newly
retrained estimator will replace the estimator attribute. If
there is an active mlflow experiment, a new run is started
with the name [model_name]_full_train. Since the estimator
changed, the model is cleared.
Warning
Although the model is trained on the complete dataset, the
pipeline is not. To get a fully trained pipeline, use:
pipeline = atom.export_pipeline().fit(atom.X, atom.y)
Get the threshold that maximizes a metric.
Uses sklearn's TunedThresholdClassifierCV to post-tune the
decision threshold (cut-off point) that is used for converting
posterior probability estimates (i.e., output of predict_proba)
or decision scores (i.e., output of decision_function) into a
class label. The tuning is done by optimizing one of atom's
metrics. The tuning estimator is stored under the tuned_threshold
attribute. Only available for binary classifiers.
Note
By default, the threshold is optimized using the training
set (which is already used for the initial training). This
approach is subject to undesired overfitting. It's preferred
to use train_on_test=True, which uses the test set for
tuning, but only if there is another, independent set for
testing (holdout set).
Tip
Use the plot_threshold method to visualize the effect of different thresholds on a metric.
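The core idea of threshold tuning can be sketched as a scan over candidate cut-off points, keeping the one with the best metric value. This is a simplified stand-in, not what TunedThresholdClassifierCV does internally (it uses cross-validation): best_threshold and f1 are hypothetical helpers and the data is made up.

```python
def f1(y_true, y_pred):
    """F1 score for binary labels encoded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(y_true, proba, metric, candidates):
    """Return the (threshold, score) pair with the highest metric value."""
    scored = []
    for thr in candidates:
        # Convert positive-class probabilities into class labels.
        y_pred = [int(p >= thr) for p in proba]
        scored.append((thr, metric(y_true, y_pred)))
    return max(scored, key=lambda pair: pair[1])


y_true = [0, 0, 1, 0, 1, 1]
proba = [0.10, 0.35, 0.40, 0.55, 0.70, 0.90]
candidates = [i / 10 for i in range(1, 10)]
thr, score = best_threshold(y_true, proba, f1, candidates)
print(thr, round(score, 3))
```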
Get the model's tags.
Return class parameters that provide general information about the model's characteristics.
Returns |
dict
Model's tags.
|
Run the hyperparameter tuning algorithm.
Search for the best combination of hyperparameters. The function to optimize is evaluated either with a K-fold cross-validation on the training set or using a random train and validation split every trial. Use this method to continue the optimization.
Parameters |
n_trials: int
Number of trials for the hyperparameter tuning.
reset: bool, default=False
Whether to start a new study or continue the existing one.
|
Inversely transform new data through the pipeline.
Transformers that are only applied on the training set are
skipped. The rest should all implement an inverse_transform
method. If only X or only y is provided, it ignores
transformers that require the other parameter. This can be
of use to, for example, inversely transform only the target
column. If called from a model that used automated feature
scaling, the scaling is inverted as well.
Get predictions on new data or existing rows.
New data is first transformed through the model's pipeline.
Transformers that are only applied on the training set are
skipped. The estimator must have a predict method.
Read more in the user guide.
Parameters |
fh: hashable, segment, sequence, dataframe or ForecastingHorizon
The forecasting horizon encoding
the time stamps to forecast at.
X: hashable, segment, sequence, dataframe-like or None, default=None
Exogenous time series corresponding to fh.
inverse: bool, default=True
Whether to inversely transform the output through the
pipeline. This doesn't affect the predictions if there are
no transformers in the pipeline or if the transformers have
no inverse_transform method or don't apply to y.
verbose: int or None, default=None
Verbosity level for the transformers in the pipeline. If
None, it uses the pipeline's verbosity.
|
Returns |
series or dataframe
Predictions with shape=(n_samples,) or shape=(n_samples,
n_targets) for multivariate tasks.
|
Get prediction intervals on new data or existing rows.
New data is first transformed through the model's pipeline.
Transformers that are only applied on the training set are
skipped. The estimator must have a predict_interval method.
Read more in the user guide.
Parameters |
fh: hashable, segment, sequence, dataframe or ForecastingHorizon
The forecasting horizon encoding
the time stamps to forecast at.
X: hashable, segment, sequence, dataframe-like or None, default=None
Exogenous time series corresponding to fh.
coverage: float or sequence, default=0.9
Nominal coverage(s) of predictive interval(s).
inverse: bool, default=True
Whether to inversely transform the output through the
pipeline. This doesn't affect the predictions if there are
no transformers in the pipeline or if the transformers have
no inverse_transform method or don't apply to y.
verbose: int or None, default=None
Verbosity level for the transformers in the pipeline. If
None, it uses the pipeline's verbosity.
|
Returns |
dataframe
Computed interval forecasts.
|
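To make the coverage parameter concrete, the sketch below computes a central interval for a single forecast, assuming a normal predictive distribution. That distribution is an assumption for illustration (the intervals returned here depend on the underlying estimator), and normal_interval is a hypothetical helper.

```python
from statistics import NormalDist


def normal_interval(mean, std, coverage=0.9):
    """Central interval of a normal forecast with the given nominal coverage."""
    alpha = (1 - coverage) / 2  # probability mass left in each tail
    dist = NormalDist(mean, std)
    return dist.inv_cdf(alpha), dist.inv_cdf(1 - alpha)


lower, upper = normal_interval(mean=100.0, std=10.0, coverage=0.9)
print(round(lower, 2), round(upper, 2))
```

Under this assumption, a coverage of 0.9 corresponds to the 0.05 and 0.95 quantiles, which matches the default alpha of predict_quantiles.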
Get probabilistic forecasts on new data or existing rows.
New data is first transformed through the model's pipeline.
Transformers that are only applied on the training set are
skipped. The estimator must have a predict_proba method.
Read more in the user guide.
Parameters |
fh: hashable, segment, sequence, dataframe or ForecastingHorizon
The forecasting horizon encoding
the time stamps to forecast at.
X: hashable, segment, sequence, dataframe-like or None, default=None
Exogenous time series corresponding to fh.
marginal: bool, default=True
Whether returned distribution is marginal by time index.
verbose: int or None, default=None
Verbosity level for the transformers in the pipeline. If
None, it uses the pipeline's verbosity.
|
Returns |
sktime.proba.Normal
Distribution object.
|
Get quantile forecasts on new data or existing rows.
New data is first transformed through the model's pipeline.
Transformers that are only applied on the training set are
skipped. The estimator must have a predict_quantiles method.
Read more in the user guide.
Parameters |
fh: hashable, segment, sequence, dataframe or ForecastingHorizon
The forecasting horizon encoding
the time stamps to forecast at.
X: hashable, segment, sequence, dataframe-like or None, default=None
Exogenous time series corresponding to fh.
alpha: float or sequence, default=(0.05, 0.95)
A probability or list of probabilities at which quantile
forecasts are computed.
verbose: int or None, default=None
Verbosity level for the transformers in the pipeline. If
None, it uses the pipeline's verbosity.
|
Returns |
dataframe
Computed quantile forecasts.
|
Get residuals of forecasts on new data or existing rows.
New data is first transformed through the model's pipeline.
Transformers that are only applied on the training set are
skipped. The estimator must have a predict_residuals method.
Read more in the user guide.
Parameters |
y: hashable, segment, sequence or dataframe-like
Selection of rows or ground
truth observations.
X: dataframe-like or None, default=None
Exogenous time series corresponding to y. This parameter
is ignored if y is a selection of rows in the dataset.
verbose: int or None, default=None
Verbosity level for the transformers in the pipeline. If
None, it uses the pipeline's verbosity.
|
Returns |
series or dataframe
Residuals with shape=(n_samples,) or shape=(n_samples,
n_targets) for multivariate tasks.
|
Get variance forecasts on new data or existing rows.
New data is first transformed through the model's pipeline.
Transformers that are only applied on the training set are
skipped. The estimator must have a predict_var method.
Read more in the user guide.
Parameters |
fh: hashable, segment, sequence, dataframe or ForecastingHorizon
The forecasting horizon encoding
the time stamps to forecast at.
X: hashable, segment, sequence, dataframe-like or None, default=None
Exogenous time series corresponding to fh.
cov: bool, default=False
Whether to compute covariance matrix forecast or marginal
variance forecasts.
verbose: int or None, default=None
Verbosity level for the transformers in the pipeline. If
None, it uses the pipeline's verbosity.
|
Returns |
dataframe
Computed variance forecasts.
|
Register the model in mlflow's model registry.
This method is only available when model tracking is enabled using one of the following URI schemes: databricks, http, https, postgresql, mysql, sqlite, mssql.
Reset the plot aesthetics to their default values.
Save the estimator to a pickle file.
Parameters |
filename: str or Path, default="auto"
Filename or pathlib.Path of the file to save. Use
"auto" for automatic naming.
|
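For readers unfamiliar with pickle files, the round trip looks roughly like this with the standard library. The dictionary stands in for a fitted estimator and the file name is hypothetical; this sketch does not reproduce how the "auto" name is built.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a fitted estimator: any picklable object works the same way.
fitted = {"model": "PolynomialTrend", "degree": 2, "coef": [1.0, 2.0, 0.5]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "PolynomialTrend.pkl"  # hypothetical file name
    with path.open("wb") as f:
        pickle.dump(fitted, f)  # serialize the object to disk
    with path.open("rb") as f:
        restored = pickle.load(f)  # load it back in the same or a later session

print(restored == fitted)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.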
Get a metric score on new data.
New data is first transformed through the model's pipeline. Transformers that are only applied on the training set are skipped.
Read more in the user guide.
Info
If the metric parameter is left to its default value, the
method returns atom's metric score, not the metric used by
sktime's score method for estimators.
Parameters |
y: int, str, sequence or dataframe-like
Selection of rows or ground
truth observations.
X: dataframe-like or None, default=None
Exogenous time series corresponding to fh. This parameter
is ignored if y is a selection of rows in the dataset.
fh: hashable, segment, sequence, dataframe, ForecastingHorizon or None, default=None
Do nothing. The forecast horizon is taken from the index of
y. Implemented for continuity of sktime's API.
metric: str, func, scorer or None, default=None
Metric to calculate. Choose from any of sklearn's scorers,
a function with signature metric(y_true, y_pred) -> score
or a scorer object. If None, it uses atom's metric (the main
metric for multi-metric runs).
verbose: int or None, default=None
Verbosity level for the transformers in the pipeline. If
None, it uses the pipeline's verbosity.
|
Returns |
float
Metric score of y with respect to a ground truth.
|
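A function with the metric(y_true, y_pred) -> score signature can be as small as the sketch below, which computes the mean absolute percentage error by hand. The function is written for illustration and is not taken from ATOM or sklearn. It returns the raw error, whereas the example output on this page reports mape with a negative sign, which suggests that lower-is-better metrics are negated when reported.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction (0.1 means 10%)."""
    # Assumes y_true contains no zeros, since each error is divided by it.
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
error = mape(y_true, y_pred)
print(round(error, 4))
```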
Serve the model as a REST API endpoint for inference.
The complete pipeline is served with the model. The inference
data must be supplied as json to the HTTP request, e.g.
requests.get("http://127.0.0.1:8000/", json=X.to_json()).
The deployment is done on a ray cluster. The default host
and port parameters deploy to localhost.
Tip
Use import ray; ray.serve.shutdown() to close the
endpoint after finishing.
Parameters |
method: str, default="predict"
Estimator's method to do inference on.
|
Set the binary threshold of the estimator.
A new classifier using the new threshold replaces the estimator
attribute. If there is an active mlflow experiment, a new run is
started using the name [model_name]_threshold_X. Since the
estimator changed, the model is cleared. Only for
binary classifiers.
Tip
Use the get_best_threshold method to find the optimal threshold for a specific metric.
Parameters |
threshold: float
Binary threshold to classify the positive class.
|
Transform new data through the pipeline.
Transformers that are only applied on the training set are
skipped. If only X or only y is provided, it ignores
transformers that require the other parameter. This can be
of use to, for example, transform only the target column. If
called from a model that used automated feature scaling, the
data is scaled as well.
Update the properties of the plot's layout.
Recursively update the structure of the original layout with the values in the arguments.
Parameters |
**kwargs
Keyword arguments for the figure's update_layout method.
|
Update the properties of the plot's traces.
Recursively update the structure of the original traces with the values in the arguments.
Parameters |
**kwargs
Keyword arguments for the figure's update_traces method.
|