LightGBM (LGB)
LightGBM is a gradient boosting model that uses tree-based learning algorithms. It is designed to be distributed and efficient, with the following advantages:
- Faster training speed and higher efficiency.
- Lower memory usage.
- Better accuracy.
- Capable of handling large-scale data.
Corresponding estimators are:
- LGBMClassifier for classification tasks.
- LGBMRegressor for regression tasks.
Read more in LightGBM's documentation.
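For reference, the underlying estimators follow the familiar scikit-learn API and can also be used standalone. A minimal sketch with LGBMClassifier (LGBMRegressor works analogously); the dataset and split are placeholders:

```python
from lightgbm import LGBMClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Placeholder data to show the estimator's sklearn-like API
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

clf = LGBMClassifier(random_state=1)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # mean accuracy on the test set
```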
Info
LightGBM allows early stopping to stop the training of unpromising models prematurely!
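For context, the sketch below illustrates LightGBM's own early-stopping support on a held-out validation set, which is the mechanism this feature builds on; how to activate early stopping through the trainer is described in the user guide. The dataset and stopping_rounds value are placeholders:

```python
import lightgbm as lgb
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Placeholder data and validation split
X, y = load_diabetes(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=1)

reg = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05, random_state=1)
reg.fit(
    X_train,
    y_train,
    eval_set=[(X_val, y_val)],
    # stop training when the validation score hasn't improved for 20 rounds
    callbacks=[lgb.early_stopping(stopping_rounds=20)],
)
print(reg.best_iteration_)  # number of boosting rounds actually used
```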
Hyperparameters
- By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them (a short sketch follows the dimensions list below).
- The n_jobs and random_state parameters are set equal to those of the trainer.
Dimensions:
- n_estimators: int, default=100
- learning_rate: float, default=0.1
- max_depth: int, default=-1
- num_leaves: int, default=31
- min_child_weight: int, default=1
- min_child_samples: int, default=20
- subsample: float, default=1.0
- colsample_bytree: float, default=1.0
- reg_alpha: float, default=0.0
- reg_lambda: float, default=0.0
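As referenced above, the package defaults can be overridden when running the model. A minimal sketch, assuming the trainer's est_params argument (described in the user guide) forwards keyword arguments to the estimator; the dataset and values are placeholders:

```python
from atom import ATOMClassifier
from sklearn.datasets import load_breast_cancer

# Placeholder classification data
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y, random_state=1)

# Override some of LightGBM's defaults (est_params is assumed here;
# see the user guide on customizing an estimator's parameters)
atom.run(
    models="LGB",
    est_params={"n_estimators": 500, "learning_rate": 0.05, "num_leaves": 63},
)
```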
Attributes
Data attributes
Attributes:
- dataset: pd.DataFrame
- train: pd.DataFrame
- test: pd.DataFrame
- X: pd.DataFrame
- y: pd.Series
- X_train: pd.DataFrame
- y_train: pd.Series
- X_test: pd.DataFrame
- y_test: pd.Series
- shape: tuple
- columns: list
- n_columns: int
- features: list
- n_features: int
- target: str
Utility attributes
Attributes:
- bo: pd.DataFrame. Information of every step taken by the BO.
- best_params: dict
- estimator: class
- time_bo: str
- metric_bo: float or list
- time_fit: str
- metric_train: float or list
- metric_test: float or list
- metric_bootstrap: np.ndarray
- mean_bootstrap: float or list
- std_bootstrap: float or list
- results: pd.Series. Training results.
Prediction attributes
The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory.
Prediction attributes:
- predict_train: np.ndarray
- predict_test: np.ndarray
- predict_proba_train: np.ndarray
- predict_proba_test: np.ndarray
- predict_log_proba_train: np.ndarray
- predict_log_proba_test: np.ndarray
- score_train: np.float64
- score_test: np.float64
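A short sketch of how these attributes are used, assuming a fitted atom instance with an LGB model (for instance the classification run sketched above):

```python
# Assumes a fitted atom instance with an LGB model
print(atom.lgb.predict_test[:5])        # predictions on the test set (computed on first access)
print(atom.lgb.predict_proba_test[:5])  # class probabilities on the test set
print(atom.lgb.score_test)              # estimator's default score on the test set
```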
Methods
The majority of the plots and prediction methods can be called directly from the model, e.g. atom.lgb.plot_permutation_importance() or atom.lgb.predict(X). The remaining utility methods can be found hereunder.
calibrate | Calibrate the model. |
cross_validate | Evaluate the model using cross-validation. |
delete | Delete the model from the trainer. |
export_pipeline | Export the model's pipeline to a sklearn-like Pipeline object. |
full_train | Get the estimator trained on the complete dataset. |
rename | Change the model's tag. |
reset_predictions | Clear all the prediction attributes. |
evaluate | Get the score for a specific metric. |
save_estimator | Save the estimator to a pickle file. |
calibrate
Applies probability calibration on the estimator. The estimator is trained via cross-validation on a subset of the training data, using the rest to fit the calibrator. The new classifier replaces the estimator attribute and is logged to any active mlflow experiment. Since the estimator changed, all the model's prediction attributes are reset. Only available for classification tasks.
Parameters:
- **kwargs: Additional keyword arguments for sklearn's CalibratedClassifierCV. Using cv="prefit" will use the trained model and fit the calibrator on the test set. Use this only if you have another, independent set for testing.
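A minimal sketch, continuing from a fitted classification run; the keyword arguments shown are forwarded to sklearn's CalibratedClassifierCV:

```python
# Continues from a fitted classification run; kwargs go to CalibratedClassifierCV
atom.lgb.calibrate(method="isotonic", cv=5)
print(atom.lgb.estimator)  # now a calibrated classifier; prediction attributes were reset
```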
cross_validate
Evaluate the model using cross-validation. This method cross-validates the whole pipeline on the complete dataset. Use it to assess the robustness of the solution's performance.
Parameters:
- **kwargs: Additional keyword arguments for sklearn's cross_validate function. If the scoring method is not specified, it uses the trainer's metric.
Returns:
- scores: dict. Return of sklearn's cross_validate function.
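A minimal sketch, continuing from the same fitted run; the cv argument is forwarded to sklearn's cross_validate:

```python
# Cross-validate the whole pipeline on the complete dataset
scores = atom.lgb.cross_validate(cv=5)  # kwargs go to sklearn's cross_validate
print(scores["test_score"])
```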
delete
Delete the model from the trainer. If it's the winning model, the next best model (through metric_test or mean_bootstrap) is selected as winner. If it's the last model in the trainer, the metric and training approach are reset. Use this method to drop unwanted models from the pipeline or to free some memory before saving. The model is not removed from any active mlflow experiment.
export_pipeline
Export the model's pipeline to a sklearn-like object. If the model used feature scaling, the Scaler is added before the model. The returned pipeline is already fitted on the training set.
Info
ATOM's Pipeline class behaves the same as a sklearn Pipeline, and additionally:
- Accepts transformers that change the target column.
- Accepts transformers that drop rows.
- Accepts transformers that are only fitted on a subset of the provided dataset.
- Always outputs pandas objects.
- Uses transformers that are only applied on the training set (see the balance or prune methods) to fit the pipeline, not to make predictions on unseen data.
Parameters:
- verbose: int or None, optional (default=None)
Returns:
- pipeline: Pipeline. Current branch as a sklearn-like Pipeline object.
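A minimal sketch, continuing from the same fitted run; the rows passed to predict are just the example data from the hyperparameters sketch above:

```python
# Export the fitted pipeline and use it like any sklearn Pipeline
pl = atom.lgb.export_pipeline(verbose=0)
print(pl.predict(X.iloc[:5]))  # X is the example data frame used earlier
```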
full_train
Get the estimator trained on the complete dataset. In some cases it might be desirable to use all the available data to train a final model after the right hyperparameters are found. Note that this means the model cannot be evaluated.
Returns:
- est: estimator. Model estimator trained on the full dataset.
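A minimal sketch, continuing from the same fitted run:

```python
# Train a final estimator on train + test after tuning
final_est = atom.lgb.full_train()
print(final_est)  # fitted estimator; note there is no held-out data left to evaluate it on
```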
rename
Change the model's tag. The acronym always stays at the beginning of the model's name. If the model is being tracked by mlflow, the name of the corresponding run is also changed.
Parameters:
- name: str or None, optional (default=None). New tag for the model. If None, the tag is removed.
reset_predictions
Clear the prediction attributes from all models. Use this method to free some memory before saving the trainer.
evaluate
Get the model's score for the provided metrics.
Parameters:
- metric: str, func, scorer, sequence or None, optional (default=None)
- dataset: str, optional (default="test")
Returns:
- score: pd.Series. Scores of the model.
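A minimal sketch, continuing from the same fitted run; the metrics shown assume a classification task:

```python
# Score the model on one or more metrics
print(atom.lgb.evaluate(metric=["f1", "recall"]))             # on the test set (default)
print(atom.lgb.evaluate(metric="accuracy", dataset="train"))  # on the training set
```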
save_estimator
Save the estimator to a pickle file.
Parameters:
- filename: str, optional (default="auto"). Name of the file. Use "auto" for automatic naming.
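A minimal sketch, continuing from the same fitted run; the filename is a placeholder:

```python
atom.lgb.save_estimator("lgb_estimator")  # saves the fitted estimator as a pickle file
atom.lgb.save_estimator()                 # filename="auto" uses automatic naming
```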
Example
from atom import ATOMRegressor
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True, as_frame=True)  # example data; any regression dataset works

atom = ATOMRegressor(X, y)
atom.run(models="LGB", n_calls=50, bo_params={"base_estimator": "ET"})