Logistic regression (LR)
Logistic regression, despite its name, is a linear model for classification rather than regression. Logistic regression is also known in the literature as logit regression, maximum-entropy classification (MaxEnt) or the log-linear classifier. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function.
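In a binary setting, the model computes a linear score w·x + b and passes it through the logistic (sigmoid) function to obtain the probability of the positive class. A minimal sketch of that function:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps a real-valued linear score to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# A linear model produces a score w·x + b; the sigmoid turns it into
# the probability of the positive class. A score of zero means both
# classes are considered equally likely.
print(sigmoid(0.0))  # 0.5
```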
Corresponding estimators are:
- LogisticRegression for classification tasks.
 
Read more in sklearn's documentation.
Hyperparameters
- By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them.
- The penalty parameter is automatically set to "l2" when penalty="none" and solver="liblinear".
- The penalty parameter is automatically set to "l2" when penalty="l1" and the solver is neither "liblinear" nor "saga".
- The penalty parameter is automatically set to "l2" when penalty="elasticnet" and solver!="saga".
- The C parameter is not used when penalty="none".
- The l1_ratio parameter is only used when penalty="elasticnet".
- The n_jobs and random_state parameters are set equal to those of the trainer.
Dimensions:
- penalty: str, default="l2"
- C: float, default=1.0
- solver: str, default="lbfgs"
- max_iter: int, default=100
- l1_ratio: float, default=None
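The corrections listed above exist because scikit-learn only supports certain penalty/solver combinations: for example, an "l1" penalty requires the "liblinear" or "saga" solver, and "elasticnet" requires "saga". A small illustration using scikit-learn directly (dataset and parameter values chosen arbitrarily):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy binary classification data, for illustration only.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# l1 regularization is only valid with the liblinear or saga solvers;
# passing an incompatible pair would raise an error in scikit-learn.
model = LogisticRegression(penalty="l1", C=0.5, solver="saga", max_iter=1000)
model.fit(X, y)
print(model.score(X, y))
```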
Attributes
Data attributes
Attributes:
- dataset: pd.DataFrame
- train: pd.DataFrame
- test: pd.DataFrame
- X: pd.DataFrame
- y: pd.Series
- X_train: pd.DataFrame
- y_train: pd.Series
- X_test: pd.DataFrame
- y_test: pd.Series
- shape: tuple
- columns: list
- n_columns: int
- features: list
- n_features: int
- target: str
Utility attributes
Attributes:
- bo: pd.DataFrame. Information of every step taken by the BO.
- best_call: str
- best_params: dict
- estimator: class
- time_bo: str
- metric_bo: float or list
- time_fit: str
- metric_train: float or list
- metric_test: float or list
- metric_bootstrap: np.array
- mean_bootstrap: float or list
- std_bootstrap: float or list
Prediction attributes
The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory.
Prediction attributes:
- predict_train: np.array
- predict_test: np.array
- predict_proba_train: np.array
- predict_proba_test: np.array
- predict_log_proba_train: np.array
- predict_log_proba_test: np.array
- decision_function_train: np.array
- decision_function_test: np.array
- score_train: np.float64
- score_test: np.float64
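The lazy, compute-on-first-access behavior described above can be sketched with a small hypothetical class (not ATOM's actual implementation) built on functools.cached_property:

```python
from functools import cached_property

class LazyPredictions:
    """Hypothetical sketch of lazy prediction attributes: the value is
    computed on first access and cached for every access afterwards."""

    def __init__(self, estimator, X_test):
        self.estimator = estimator
        self.X_test = X_test

    @cached_property
    def predict_test(self):
        # The (potentially expensive) prediction runs only once;
        # subsequent accesses return the cached result.
        return self.estimator.predict(self.X_test)
```

If the attribute is never accessed, the prediction is never computed, which is exactly the time and memory saving the mechanism is after.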
Methods
The majority of the plots and prediction methods
can be called directly from the model, e.g. atom.lr.plot_permutation_importance()
or atom.lr.predict(X). The remaining utility methods can be found hereunder.
| calibrate | Calibrate the model. | 
| clear | Clear attributes from the model. | 
| cross_validate | Evaluate the model using cross-validation. | 
| delete | Delete the model from the trainer. | 
| dashboard | Create an interactive dashboard to analyze the model. | 
| evaluate | Get the model's scores for the provided metrics. | 
| export_pipeline | Export the model's pipeline to a sklearn-like Pipeline object. | 
| full_train | Train the estimator on the complete dataset. | 
| rename | Change the model's tag. | 
| save_estimator | Save the estimator to a pickle file. | 
| transform | Transform new data through the model's branch. | 
Applies probability calibration on the estimator. The
estimator is trained via cross-validation on a subset of the
training data, using the rest to fit the calibrator. The new
classifier will replace the estimator attribute and is
logged to any active mlflow experiment. Since the estimator
changed, all the model's prediction attributes are reset.
Parameters:
- **kwargs: Additional keyword arguments for sklearn's CalibratedClassifierCV. Using cv="prefit" will use the trained model and fit the calibrator on the test set. Use this only if you have another, independent set for testing.
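For reference, this is roughly what a direct call to sklearn's CalibratedClassifierCV looks like (toy dataset for illustration; the **kwargs above are forwarded to this class):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy binary classification data, for illustration only.
X, y = make_classification(n_samples=300, random_state=0)

# With cv=5, each fold trains the estimator on part of the data and
# fits the calibrator on the held-out part.
calibrated = CalibratedClassifierCV(LogisticRegression(max_iter=1000), cv=5)
calibrated.fit(X, y)

# Calibrated class probabilities.
proba = calibrated.predict_proba(X)
```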
Reset attributes to their initial state, deleting potentially large data arrays. Use this method to free some memory before saving the model.
Evaluate the model using cross-validation. This method cross-validates the whole pipeline on the complete dataset. Use it to assess the robustness of the solution's performance.
Parameters:
- **kwargs: Additional keyword arguments for sklearn's cross_validate function. If the scoring method is not specified, it uses the trainer's metric.

Returns:
- scores: dict. Return of sklearn's cross_validate function.
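For comparison, calling sklearn's cross_validate directly looks like this (toy data; the scoring argument mirrors the **kwargs pass-through described above):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Toy binary classification data, for illustration only.
X, y = make_classification(n_samples=300, random_state=0)

# 5-fold cross-validation; scoring overrides the default metric.
scores = cross_validate(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring="f1"
)
print(scores["test_score"].mean())
```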
Delete the model from the trainer. If it's the last model in the
trainer, the metric is reset. Use this method to drop unwanted
models from the pipeline or to free some memory before saving.
The model is not removed from any active mlflow experiment.
Create an interactive dashboard to analyze the model. The dashboard allows you to investigate SHAP values, permutation importances, interaction effects, partial dependence plots, all kinds of performance plots, and even individual decision trees. By default, the dashboard opens in an external dash app.
Parameters:
- dataset: str, optional (default="test")
- filename: str or None, optional (default=None)
- **kwargs

Returns:
- dashboard: ExplainerDashboard. Created dashboard object.
Get the model's scores for the provided metrics.
Parameters:
- metric: str, func, scorer, sequence or None, optional (default=None)
- dataset: str, optional (default="test")
- threshold: float, optional. Threshold between 0 and 1 to convert predicted probabilities to class labels.

Returns:
- score: pd.Series. Scores of the model.
Export the model's pipeline to a sklearn-like Pipeline object. If the
model used automated feature scaling,
the scaler is added to the pipeline. The returned pipeline is already
fitted on the training set.
Info
ATOM's Pipeline class behaves the same as a sklearn Pipeline, and additionally:
- Accepts transformers that change the target column.
 - Accepts transformers that drop rows.
- Accepts transformers that are only fitted on a subset of the provided dataset.
 - Always outputs pandas objects.
 - Uses transformers that are only applied on the training set (see the balance or prune methods) to fit the pipeline, not to make predictions on unseen data.
 
Parameters:
- memory: bool, str, Memory or None, optional (default=None). Used to cache the fitted transformers of the pipeline.
- verbose: int or None, optional (default=None)

Returns:
- Pipeline: Current branch as a sklearn-like Pipeline object.
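The exported object can be used like any scikit-learn pipeline. A hand-built equivalent, assuming the model used automated feature scaling, might look like this (illustrative only, with a toy dataset):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy binary classification data, for illustration only.
X, y = make_classification(n_samples=200, random_state=0)

# The scaler precedes the estimator, mirroring an exported pipeline
# for a model that used automated feature scaling.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("lr", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)
print(pipe.predict(X[:5]))
```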
In some cases it might be desirable to use all available data
to train a final model. Note that doing this means that the
estimator can no longer be evaluated on the test set. The newly
retrained estimator will replace the estimator attribute. If
there is an active mlflow experiment, a new run is started
with the name [model_name]_full_train. Since the estimator
changed, the model is cleared.
Parameters:
- include_holdout: bool, optional (default=False). Whether to include the holdout data set (if available) in training the estimator. Note that if True, the model can no longer be evaluated.
Change the model's tag. The acronym always stays at the beginning of the model's name. If the model is being tracked by mlflow, the name of the corresponding run is also changed.
Parameters:
- name: str or None, optional (default=None). New tag for the model. If None, the tag is removed.
Get the model's score for the provided metrics.
Parameters:
- metric: str or None, optional (default=None). Name of the metric to calculate. If None, returns the model's final results (ignoring the dataset parameter). Choose from any of sklearn's classification SCORERS or one of the package's custom metrics.
- dataset: str, optional (default="test")

Returns:
- score: float or np.array. Model's score for the selected metric.
Save the estimator to a pickle file.
Parameters:
- filename: str, optional (default="auto"). Name of the file. Use "auto" for automatic naming.
Transform new data through the model's branch. Transformers that are only applied on the training set are skipped. If the model used feature scaling, the data is also scaled.
Parameters:
- X: dataframe-like
- verbose: int or None, optional (default=None)

Returns:
- pd.DataFrame
- pd.Series
Example
from atom import ATOMClassifier
from sklearn.datasets import load_breast_cancer

# Any binary classification dataset works; breast cancer is used here for illustration.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

atom = ATOMClassifier(X, y)
atom.run(models="LR")