
Ordinary Least Squares (OLS)


Ordinary Least Squares is linear regression without any regularization. It fits a linear model with coefficients w = (w1, …, wp) that minimize the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation.
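
To make "minimizing the residual sum of squares" concrete, the sketch below fits the same synthetic data with NumPy's least-squares solver and with sklearn's LinearRegression, and checks that both give the same coefficients. The data, the seed and the variable names are assumptions made for this illustration; none of them come from ATOM.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption for illustration): y = 3*x1 - 2*x2 + 1 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 1 + rng.normal(scale=0.1, size=200)

# Least-squares solution with NumPy; the column of ones models the intercept
A = np.column_stack([X, np.ones(len(X))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

# sklearn's LinearRegression minimizes the same residual sum of squares
ols = LinearRegression().fit(X, y)

rss = np.sum((y - ols.predict(X)) ** 2)
print(w[:2], ols.coef_)      # coefficients from both approaches
print(w[2], ols.intercept_)  # intercepts from both approaches
print(rss)                   # residual sum of squares at the optimum
```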

Corresponding estimators are:

  • LinearRegression for regression tasks.

Read more in sklearn's documentation.



Hyperparameters

  • By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them.
  • The n_jobs parameter is set equal to that of the trainer.
  • OLS has no hyperparameters to tune with the Bayesian optimization (BO).



Attributes


Data attributes

Attributes:

dataset: pd.DataFrame
Complete dataset in the pipeline.

train: pd.DataFrame
Training set.

test: pd.DataFrame
Test set.

X: pd.DataFrame
Feature set.

y: pd.Series
Target column.

X_train: pd.DataFrame
Training features.

y_train: pd.Series
Training target.

X_test: pd.DataFrame
Test features.

y_test: pd.Series
Test target.

shape: tuple
Dataset's shape: (n_rows, n_columns), or (n_rows, (shape_sample), n_cols) for datasets with more than two dimensions.

columns: list
Names of the columns in the dataset.

n_columns: int
Number of columns in the dataset.

features: list
Names of the features in the dataset.

n_features: int
Number of features in the dataset.

target: str
Name of the target column.


Utility attributes

Attributes:

estimator: class
Estimator instance with the best combination of hyperparameters fitted on the complete training set.

time_fit: str
Time it took to train the model on the complete training set and calculate the metric(s) on the test set.

metric_train: float or list
Metric score(s) on the training set.

metric_test: float or list
Metric score(s) on the test set.

metric_bootstrap: np.ndarray
Bootstrap results with shape=(n_bootstrap,) for single-metric runs and shape=(metric, n_bootstrap) for multi-metric runs.

mean_bootstrap: float or list
Mean of the bootstrap results. List of values for multi-metric runs.

std_bootstrap: float or list
Standard deviation of the bootstrap results. List of values for multi-metric runs.

results: pd.Series
Training results. Entries include:
  • metric_bo: Best score achieved during the BO.
  • time_bo: Time spent on the BO.
  • metric_train: Metric score on the training set.
  • metric_test: Metric score on the test set.
  • time_fit: Time spent fitting and evaluating.
  • mean_bootstrap: Mean score of the bootstrap results.
  • std_bootstrap: Standard deviation score of the bootstrap results.
  • time_bootstrap: Time spent on the bootstrap algorithm.
  • time: Total time spent on the whole run.
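
The bootstrap attributes can be pictured with the sketch below: the model is refitted on several resampled versions of the training set and scored on the test set each time. It is written with plain sklearn as an approximation of the idea. The dataset, the number of rounds and the metric (R²) are assumptions, and ATOM's internal implementation may differ.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

n_bootstrap = 10  # hypothetical number of bootstrap rounds
scores = []
for i in range(n_bootstrap):
    # Sample the training set with replacement
    X_bs, y_bs = resample(X_train, y_train, replace=True, random_state=i)
    model = LinearRegression().fit(X_bs, y_bs)
    # Every resampled model is scored on the same, untouched test set
    scores.append(r2_score(y_test, model.predict(X_test)))

metric_bootstrap = np.array(scores)  # shape=(n_bootstrap,)
mean_bootstrap = metric_bootstrap.mean()
std_bootstrap = metric_bootstrap.std()
print(metric_bootstrap.shape, mean_bootstrap, std_bootstrap)
```

A small std_bootstrap relative to mean_bootstrap suggests the score does not depend much on which training rows happen to be sampled.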


Prediction attributes

The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory.

Prediction attributes:

predict_train: np.ndarray
Predictions of the model on the training set.

predict_test: np.ndarray
Predictions of the model on the test set.

score_train: np.float64
Model's score on the training set.

score_test: np.float64
Model's score on the test set.
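
The lazy calculation described above can be sketched with Python's functools.cached_property. The LazyModel class and its attributes are hypothetical stand-ins written for this illustration and do not show how ATOM implements the mechanism.

```python
from functools import cached_property

import numpy as np
from sklearn.linear_model import LinearRegression


class LazyModel:
    """Hypothetical stand-in for a model with lazy prediction attributes."""

    def __init__(self, estimator, X_test):
        self.estimator = estimator
        self.X_test = X_test

    @cached_property
    def predict_test(self):
        # Runs only on first access; the result is then stored on the instance
        return self.estimator.predict(self.X_test)

    def reset_predictions(self):
        # Drop the cached value so it is recomputed on the next access
        self.__dict__.pop("predict_test", None)


X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + 1
model = LazyModel(LinearRegression().fit(X, y), X_test=X[:3])

print("predict_test" in model.__dict__)  # nothing computed yet
preds = model.predict_test               # computed and cached here
print("predict_test" in model.__dict__)
```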



Methods

Most of the plots and prediction methods can be called directly from the model, e.g. atom.ols.plot_permutation_importance() or atom.ols.predict(X). The remaining utility methods are listed below.

  • cross_validate: Evaluate the model using cross-validation.
  • delete: Delete the model from the trainer.
  • export_pipeline: Export the model's pipeline to a sklearn-like Pipeline object.
  • full_train: Get the estimator trained on the complete dataset.
  • rename: Change the model's tag.
  • reset_predictions: Clear all the prediction attributes.
  • evaluate: Get the score for a specific metric.
  • save_estimator: Save the estimator to a pickle file.


method cross_validate(**kwargs) [source]

Evaluate the model using cross-validation. This method cross-validates the whole pipeline on the complete dataset. Use it to assess the robustness of the solution's performance.

Parameters: **kwargs
Additional keyword arguments for sklearn's cross_validate function. If the scoring method is not specified, it uses the trainer's metric.
Returns: scores: dict
Return of sklearn's cross_validate function.
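
Since the return value is the dictionary produced by sklearn's cross_validate, the sketch below calls that function directly on a small pipeline to show what the dictionary looks like. The dataset, the StandardScaler step, cv=5 and scoring="r2" are assumptions chosen for the example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)

# Cross-validate the whole pipeline, not just the final estimator
pipeline = make_pipeline(StandardScaler(), LinearRegression())
scores = cross_validate(pipeline, X, y, cv=5, scoring="r2")

print(sorted(scores))               # keys of the returned dict
print(scores["test_score"].mean())  # average score over the folds
```

The spread of scores["test_score"] across folds gives an impression of how robust the performance is.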


method delete() [source]

Delete the model from the trainer. If it's the winning model, the next best model (through metric_test or mean_bootstrap) is selected as winner. If it's the last model in the trainer, the metric and training approach are reset. Use this method to drop unwanted models from the pipeline or to free some memory before saving. The model is not removed from any active mlflow experiment.


method export_pipeline(verbose=None) [source]

Export the model's pipeline to a sklearn-like object. If the model used feature scaling, the Scaler is added before the model. The returned pipeline is already fitted on the training set.

Info

ATOM's Pipeline class behaves the same as a sklearn Pipeline, and additionally:

  • Accepts transformers that change the target column.
  • Accepts transformers that drop rows.
  • Accepts transformers that are only fitted on a subset of the provided dataset.
  • Always outputs pandas objects.
  • Uses transformers that are only applied on the training set (see the balance or prune methods) to fit the pipeline, not to make predictions on unseen data.

Parameters:

verbose: int or None, optional (default=None)
Verbosity level of the transformers in the pipeline. If None, it leaves them to their original verbosity.

Returns: pipeline: Pipeline
Current branch as a sklearn-like Pipeline object.


method full_train() [source]

Get the estimator trained on the complete dataset. In some cases it might be desirable to use all the available data to train a final model after the right hyperparameters are found. Note that the resulting model cannot be evaluated, since no held-out data remains to score it on.

Returns: est: estimator
Model estimator trained on the full dataset.
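
The idea of retraining on all available rows can be sketched with sklearn's clone, which creates an unfitted copy of an estimator with the same hyperparameters. This is an illustration under assumed data and names, not ATOM's own code.

```python
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=3, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Estimator fitted on the training set only, as during a normal run
estimator = LinearRegression().fit(X_train, y_train)

# Unfitted copy with the same hyperparameters, refitted on all rows
full_estimator = clone(estimator).fit(X, y)
print(full_estimator.coef_)
```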


method rename(name=None) [source]

Change the model's tag. The acronym always stays at the beginning of the model's name. If the model is being tracked by mlflow, the name of the corresponding run is also changed.

Parameters: name: str or None, optional (default=None)
New tag for the model. If None, the tag is removed.


method reset_predictions() [source]

Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer.


method evaluate(metric=None, dataset="test") [source]

Get the model's score for the provided metrics.

Parameters:

metric: str, func, scorer, sequence or None, optional (default=None)
Metrics to calculate. If None, a selection of the most common metrics per task is used.

dataset: str, optional (default="test")
Data set on which to calculate the metric. Options are "train" or "test".

Returns: score: pd.Series
Scores of the model.
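
To give an impression of the returned pd.Series, the sketch below computes a few regression metrics on a test set with sklearn's get_scorer and collects them in a Series. The chosen metrics, the dataset and the Series name are assumptions; the selection ATOM uses when metric=None is not reproduced here.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import get_scorer
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
estimator = LinearRegression().fit(X_train, y_train)

# Hypothetical selection of metrics, referenced by sklearn scorer name
metrics = ["r2", "neg_mean_absolute_error", "neg_mean_squared_error"]

# A scorer is called as scorer(estimator, X, y) and returns a float
score = pd.Series(
    {name: get_scorer(name)(estimator, X_test, y_test) for name in metrics},
    name="evaluate",
)
print(score)
```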


method save_estimator(filename="auto") [source]

Save the estimator to a pickle file.

Parameters: filename: str, optional (default="auto")
Name of the file. Use "auto" for automatic naming.
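
A pickle file written this way can be read back with Python's pickle module. The sketch below performs the round trip by hand on a fitted LinearRegression. The file name and temporary directory are hypothetical, and the "auto" naming scheme is not reproduced.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.5, -0.5]) + 2.0
estimator = LinearRegression().fit(X, y)

# Hypothetical file name inside a temporary directory
path = Path(tempfile.mkdtemp()) / "LinearRegression.pkl"
with open(path, "wb") as f:
    pickle.dump(estimator, f)

# Load the estimator back; only unpickle files from trusted sources
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(np.allclose(loaded.predict(X), estimator.predict(X)))
```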


Example

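from atom import ATOMRegressor
from sklearn.datasets import load_diabetes

# Load an example regression dataset as pandas objects
X, y = load_diabetes(return_X_y=True, as_frame=True)

atom = ATOMRegressor(X, y)
atom.run(models="OLS")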