Ordinary Least Squares (OLS)
Ordinary Least Squares is linear regression without regularization. It fits a linear model with coefficients w = (w1, …, wp) that minimizes the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation.
Corresponding estimators are:
- LinearRegression for regression tasks.
Read more in sklearn's documentation.
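The minimization described above can be sketched with NumPy alone. This is an illustrative example on made-up, noise-free data, not the code path used by LinearRegression or by ATOM; the variable names (X_design, w, rss) are assumptions of the sketch.

```python
import numpy as np

# Toy data generated from y = 1 + 2*x1 - 3*x2 without noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Prepend a column of ones so the intercept is estimated with the coefficients.
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution: the w that minimizes ||X_design @ w - y||^2.
w, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Residual sum of squares at the solution.
rss = float(np.sum((X_design @ w - y) ** 2))
```

Because the toy targets are an exact linear function of the features, the recovered w matches the generating intercept and coefficients up to floating-point error, and the residual sum of squares is effectively zero.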
Hyperparameters
- By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them.
- The n_jobs parameter is set equal to that of the trainer.
- OLS has no parameters to tune with the BO.
Attributes
Data attributes
Attributes: |
dataset: pd.DataFrame
train: pd.DataFrame
test: pd.DataFrame
X: pd.DataFrame
y: pd.Series
X_train: pd.DataFrame
y_train: pd.Series
X_test: pd.DataFrame
y_test: pd.Series
shape: tuple
columns: list
n_columns: int
features: list
n_features: int
target: str |
Utility attributes
Attributes: |
estimator: class
time_fit: str
metric_train: float or list
metric_test: float or list
metric_bootstrap: np.array
mean_bootstrap: float or list
std_bootstrap: float or list Training results. Columns include:
|
Prediction attributes
The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory.
Prediction attributes: |
predict_train: np.array
predict_test: np.array
score_train: np.float64
score_test: np.float64 |
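The compute-on-first-access behaviour described for the prediction attributes can be sketched with the standard library's functools.cached_property. The LazyModel class below is hypothetical and only illustrates the mechanism; it is not how ATOM implements these attributes.

```python
from functools import cached_property


class LazyModel:
    """Hypothetical stand-in for a model with lazily computed predictions."""

    def __init__(self, coef, X_test):
        self.coef = coef
        self.X_test = X_test
        self.n_computations = 0  # counts how often predictions are computed

    @cached_property
    def predict_test(self):
        # Runs only on first access; the result is then stored on the instance.
        self.n_computations += 1
        return [sum(c * x for c, x in zip(self.coef, row)) for row in self.X_test]


model = LazyModel(coef=[2.0, -1.0], X_test=[[1.0, 1.0], [3.0, 2.0]])
calls_before = model.n_computations  # nothing has been computed yet
first = model.predict_test           # computed on this access
second = model.predict_test          # served from the cache, no recomputation
```

If the attribute is never read, the computation never runs and no array is kept in memory, which is the saving the paragraph above refers to.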
Methods
The majority of the plots and prediction methods can be called directly from the model, e.g. atom.ols.plot_permutation_importance() or atom.ols.predict(X).
The remaining utility methods can be found hereunder.
clear | Clear attributes from the model. |
cross_validate | Evaluate the model using cross-validation. |
delete | Delete the model from the trainer. |
dashboard | Create an interactive dashboard to analyze the model. |
evaluate | Get the model's scores for the provided metrics. |
export_pipeline | Export the model's pipeline to a sklearn-like Pipeline object. |
full_train | Train the estimator on the complete dataset. |
rename | Change the model's tag. |
save_estimator | Save the estimator to a pickle file. |
transform | Transform new data through the model's branch. |
Reset attributes to their initial state, deleting potentially large data arrays. Use this method to free some memory before saving the class. The cleared attributes per model are:
Evaluate the model using cross-validation. This method cross-validates the whole pipeline on the complete dataset. Use it to assess the robustness of the solution's performance.
Parameters: |
**kwargs Additional keyword arguments for sklearn's cross_validate function. If the scoring method is not specified, it uses the trainer's metric. |
Returns: |
scores: dict Return of sklearn's cross_validate function. |
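Since the return value is that of sklearn's cross_validate function, a minimal sketch of calling that function directly may help when reading the result. The synthetic data and the two-step pipeline (StandardScaler followed by LinearRegression) are assumptions made for illustration; they stand in for whatever pipeline the model's branch actually contains.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the complete dataset.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)

# Hypothetical pipeline: a scaler followed by the OLS estimator.
pipeline = make_pipeline(StandardScaler(), LinearRegression())

# Cross-validate the whole pipeline, so the scaler is refitted on every fold.
scores = cross_validate(pipeline, X, y, cv=5, scoring="r2")

# The result is a dict of arrays with one entry per fold.
mean_r2 = scores["test_score"].mean()
```

The dictionary contains the keys fit_time, score_time and test_score; a large spread across the folds in test_score suggests the performance depends strongly on how the data is split.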
Delete the model from the trainer. If it's the last model in the
trainer, the metric is reset. Use this method to drop unwanted
models from the pipeline or to free some memory before saving.
The model is not removed from any active mlflow experiment.
Create an interactive dashboard to analyze the model. The dashboard allows you to investigate SHAP values, permutation importances, interaction effects, partial dependence plots, all kinds of performance plots, and even individual decision trees. By default, the dashboard opens in an external dash app.
Parameters: |
dataset: str, optional (default="test")
filename: str or None, optional (default=None)
**kwargs |
Returns: |
dashboard: ExplainerDashboard Created dashboard object. |
Get the model's score for the provided metrics.
Parameters: |
metric: str, func, scorer, sequence or None, optional (default=None)
dataset: str, optional (default="test")
sample_weight: sequence or None, optional (default=None) |
Returns: |
score: pd.Series Scores of the model. |
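To make the shape of the returned pd.Series concrete, the sketch below computes two metrics by hand for a LinearRegression fitted on synthetic data. The choice of metrics (r2 and mae), the data and the series name are assumptions of this example; it mirrors the idea of the method, not its implementation.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data split into a training and a test set.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# One entry per metric, indexed by the metric's name.
score = pd.Series(
    {
        "r2": r2_score(y_test, y_pred),
        "mae": mean_absolute_error(y_test, y_pred),
    },
    name="test",
)
```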
Export the model's pipeline to a sklearn-like Pipeline object. If the model used automated feature scaling, the scaler is added to the pipeline. The returned pipeline is already fitted on the training set.
Info
ATOM's Pipeline class behaves the same as a sklearn Pipeline, and additionally:
- Accepts transformers that change the target column.
- Accepts transformers that drop rows.
- Accepts transformers that are only fitted on a subset of the provided dataset.
- Always outputs pandas objects.
- Uses transformers that are only applied on the training set (see the balance or prune methods) to fit the pipeline, not to make predictions on unseen data.
Parameters: |
memory: bool, str, Memory or None, optional (default=None) Used to cache the fitted transformers of the pipeline.
verbose: int or None, optional (default=None) |
Returns: |
Pipeline Current branch as a sklearn-like Pipeline object. |
In some cases it might be desirable to use all available data to train a final model. Note that doing this means that the estimator can no longer be evaluated on the test set. The newly retrained estimator will replace the estimator attribute. If there is an active mlflow experiment, a new run is started with the name [model_name]_full_train. Since the estimator changed, the model is cleared.
Parameters: |
include_holdout: bool, optional (default=False) Whether to include the holdout data set (if available) in the training of the estimator. Note that if True, it means the model can't be evaluated. |
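The trade-off described above can be illustrated with plain scikit-learn: an estimator fitted on the training set can still be scored on the test set, while a copy refitted on all rows has no unseen data left. This is a sketch on synthetic data under that assumption, not the method's actual implementation.

```python
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real dataset (an assumption of this sketch).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# Fitted on the training set only, so it can still be scored on the test set.
estimator = LinearRegression().fit(X_train, y_train)
test_r2 = estimator.score(X_test, y_test)

# Refit an unfitted copy on all rows. No held-out rows remain to evaluate it.
full_estimator = clone(estimator).fit(X, y)
```

The two estimators end up with slightly different coefficients because the second one has seen the extra rows; any score computed for it on X_test would be optimistic, since those rows were part of its training data.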
Change the model's tag. The acronym always stays at the beginning of the model's name. If the model is being tracked by mlflow, the name of the corresponding run is also changed.
Parameters: |
name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. |
Save the estimator to a pickle file.
Parameters: |
filename: str, optional (default="auto") Name of the file. Use "auto" for automatic naming. |
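For readers unfamiliar with pickle files, the sketch below saves a fitted LinearRegression with the standard library's pickle module and loads it back. The file name and temporary directory are hypothetical choices for the example; the method's own automatic naming is not reproduced here.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.linear_model import LinearRegression

# Fit on three points that lie exactly on the line y = 1 + 2x.
estimator = LinearRegression().fit([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0])

# Hypothetical target path inside a temporary directory.
path = Path(tempfile.mkdtemp()) / "LinearRegression.pkl"

# Serialize the fitted estimator to disk.
with open(path, "wb") as f:
    pickle.dump(estimator, f)

# Load it back and use it like the original object.
with open(path, "rb") as f:
    restored = pickle.load(f)

prediction = restored.predict([[3.0]])
```

Pickle files should only be loaded from trusted sources, and loading generally works best with the same scikit-learn version that was used for saving.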
Transform new data through the model's branch. Transformers that are only applied on the training set are skipped. If the model used feature scaling, the data is also scaled.
Parameters: |
X: dataframe-like
verbose: int or None, optional (default=None) |
Returns: |
pd.DataFrame
pd.Series |
Example
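from sklearn.datasets import load_diabetes
from atom import ATOMRegressor

# Load an example regression dataset (any feature set X and target y work)
X, y = load_diabetes(return_X_y=True, as_frame=True)

atom = ATOMRegressor(X, y)
atom.run(models="OLS")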