Linear-SVM (lSVM)
Similar to Kernel-SVM but with a linear kernel. Implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.
The multiclass support is handled according to a one-vs-rest scheme.
Corresponding estimators are:
- LinearSVC for classification tasks.
- LinearSVR for regression tasks.
Read more in sklearn's documentation.
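As a rough illustration of the one-vs-rest scheme mentioned above, the sketch below fits sklearn's LinearSVC directly (not through ATOM) on the three-class iris dataset. The dataset and the dual and max_iter values are assumptions made only for this example. With one-vs-rest, one linear separator is learned per class, which shows up in the shape of coef_.

```python
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

# Three-class toy dataset with four features, used only for illustration
X, y = load_iris(return_X_y=True)

# dual=False and a generous max_iter are choices made for this example only
model = LinearSVC(dual=False, max_iter=10000, random_state=1).fit(X, y)

# One weight vector per class (rows) over all features (columns)
print(model.coef_.shape)

# One decision score per class for every sample
print(model.decision_function(X[:5]).shape)
```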
Hyperparameters
- By default, the estimator adopts the default parameters provided by its package. See the user guide on how to customize them.
- The penalty parameter is only used with LinearSVC.
- The penalty parameter is always set to "l2" when loss = "hinge".
- The dual parameter is always set to False when penalty = "l1" and loss = "squared_hinge".
- The random_state parameter is set equal to that of the training instance.
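The parameter rules listed above can be sketched as a small helper. The function name apply_lsvm_constraints is hypothetical and not part of ATOM. It only encodes the two rules stated for penalty and dual, and leaves every other key untouched.

```python
def apply_lsvm_constraints(params):
    """Return a copy of params with the documented constraints applied.

    Hypothetical helper for illustration; ATOM's internal logic may differ.
    """
    resolved = dict(params)

    # Rule: the hinge loss is only combined with the l2 penalty
    if resolved.get("loss") == "hinge":
        resolved["penalty"] = "l2"

    # Rule: the l1 penalty with squared hinge loss needs the primal problem
    if resolved.get("penalty") == "l1" and resolved.get("loss") == "squared_hinge":
        resolved["dual"] = False

    return resolved


print(apply_lsvm_constraints({"loss": "hinge", "penalty": "l1", "C": 1.0}))
print(apply_lsvm_constraints({"loss": "squared_hinge", "penalty": "l1", "C": 0.5}))
```

For a classification task, the resolved dictionary could then be unpacked into sklearn's estimator, e.g. LinearSVC(**resolved).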
| Dimensions: | 
loss: str 
C: float, default=1.0 
penalty: str, default="l2" | 
Attributes
Data attributes
| Attributes: | 
dataset: pd.DataFrame 
train: pd.DataFrame 
test: pd.DataFrame 
X: pd.DataFrame 
y: pd.Series 
X_train: pd.DataFrame 
y_train: pd.Series 
X_test: pd.DataFrame 
y_test: pd.Series 
shape: tuple 
columns: list 
n_columns: int 
features: list 
n_features: int 
target: str | 
Utility attributes
| Attributes: | 
bo: pd.DataFrame Dataframe containing the information of every step taken by the BO. Columns include: 
best_params: dict Dictionary of the best combination of hyperparameters found by the BO. 
estimator: class Estimator instance with the best combination of hyperparameters fitted on the complete training set. 
time_bo: str Time it took to run the bayesian optimization algorithm. 
metric_bo: float or list Best metric score(s) on the BO. 
time_fit: str Time it took to train the model on the complete training set and calculate the metric(s) on the test set. 
metric_train: float or list Metric score(s) on the training set. 
metric_test: float or list Metric score(s) on the test set. 
metric_bootstrap: list Bootstrap results with shape=(n_bootstrap,) for single-metric runs and shape=(metric, n_bootstrap) for multi-metric runs. 
mean_bootstrap: float or list Mean of the bootstrap results. List of values for multi-metric runs. 
std_bootstrap: float or list Standard deviation of the bootstrap results. List of values for multi-metric runs. 
results: pd.Series Series of the training results. Columns include: | 
Prediction attributes
The prediction attributes are not calculated until the attribute is called for the first time. This mechanism avoids having to calculate attributes that are never used, saving time and memory.
| Prediction attributes: | 
predict_train: np.ndarray 
 predict_test: np.ndarray 
decision_function_train: np.ndarray 
decision_function_test: np.ndarray 
score_train: np.float64 
score_test: np.float64 | 
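The lazy evaluation described for the prediction attributes can be sketched in plain Python. The LazyModel class below is hypothetical and much simpler than ATOM's models. It only shows the idea: compute a value on first access, cache it, and let a reset method clear the cache.

```python
class LazyModel:
    """Hypothetical sketch of a model with a lazily computed attribute."""

    def __init__(self, predict_fn, X_test):
        self._predict_fn = predict_fn
        self._X_test = X_test
        self._cache = {}
        self.n_computations = 0  # counts how often predictions are computed

    @property
    def predict_test(self):
        # Compute only on first access, then reuse the cached result
        if "predict_test" not in self._cache:
            self.n_computations += 1
            self._cache["predict_test"] = self._predict_fn(self._X_test)
        return self._cache["predict_test"]

    def reset_predictions(self):
        # Drop cached values so they are recomputed on the next access
        self._cache.clear()


model = LazyModel(lambda X: [2 * x for x in X], [1, 2, 3])
model.predict_test  # computed here
model.predict_test  # served from the cache
print(model.n_computations)
```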
Methods
The majority of the plots and prediction methods can be called directly from the models, e.g. atom.lsvm.plot_permutation_importance() or atom.lsvm.predict(X).
The remaining utility methods are listed below.
| calibrate | Calibrate the model. | 
| cross_validate | Evaluate the model using cross-validation. | 
| delete | Delete the model from the trainer. | 
| export_pipeline | Export the model's pipeline to a sklearn-like Pipeline object. | 
| rename | Change the model's tag. | 
| reset_predictions | Clear all the prediction attributes. | 
| scoring | Get the score for a specific metric. | 
| save_estimator | Save the estimator to a pickle file. | 
Applies probability calibration on the estimator. The
estimator is trained via cross-validation on a subset of the
training data, using the rest to fit the calibrator. The new
classifier replaces the estimator attribute and is
logged to any active mlflow experiment. Since the estimator
has changed, all the model's prediction attributes are reset.
Only available for classifiers.
| Parameters: | **kwargs Additional keyword arguments for sklearn's CalibratedClassifierCV. Using cv="prefit" will use the trained model and fit the calibrator on the test set. Note that doing this will result in data leakage in the test set. Use this only if you have another, independent set for testing. | 
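To make the calibration step more concrete, here is a minimal sketch that uses sklearn's CalibratedClassifierCV directly rather than ATOM's calibrate method. The synthetic dataset, the cv=3 setting and the dual=False choice are assumptions for the example. LinearSVC exposes only decision_function, so the calibrated wrapper is what provides predict_proba.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary classification data, used only for illustration
X, y = make_classification(n_samples=300, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Each fold trains the SVM on one part of the training data and fits
# the calibrator on the remaining part
calibrated = CalibratedClassifierCV(LinearSVC(dual=False), cv=3)
calibrated.fit(X_train, y_train)

# Calibrated class probabilities, one column per class
proba = calibrated.predict_proba(X_test)
print(proba[:3])
```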
Evaluate the model using cross-validation. This method cross-validates the whole pipeline on the complete dataset. Use it to assess the robustness of the solution's performance.
| Parameters: | **kwargs Additional keyword arguments for sklearn's cross_validate function. If the scoring method is not specified, it uses the trainer's metric. | 
| Returns: | scores: dict Return of sklearn's cross_validate function. | 
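The returned dictionary follows sklearn's cross_validate function. The sketch below calls that function directly on a scaler plus LinearSVC pipeline, which stands in for the model's pipeline. The dataset, the pipeline contents and cv=5 are assumptions for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data standing in for the complete dataset
X, y = make_classification(n_samples=300, n_features=10, random_state=1)

# Cross-validate the whole pipeline, so scaling is refitted in every fold
pipeline = make_pipeline(StandardScaler(), LinearSVC(dual=False))
scores = cross_validate(pipeline, X, y, cv=5, scoring="accuracy")

# The dict holds timing arrays and one test score per fold
print(sorted(scores))
print(scores["test_score"].mean())
```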
Delete the model from the trainer. If it's the winning model, the next
best model (through metric_test or mean_bootstrap) is selected as
winner. If it's the last model in the trainer, the metric and training
approach are reset. Use this method to drop unwanted models from
the pipeline or to free some memory before saving. The model is not
removed from any active mlflow experiment.
Export the model's pipeline to a sklearn-like object. If the model used feature scaling, the Scaler is added before the model. The returned pipeline is already fitted on the training set.
Note
ATOM's Pipeline class behaves exactly the same as a sklearn Pipeline, and additionally, it's compatible with transformers that drop samples and transformers that change the target column.
Warning
Due to incompatibilities with sklearn's API, the exported pipeline always fits/transforms on the entire dataset provided. Beware that this can cause errors if the transformers were fitted on a subset of the data.
| Parameters: | pipeline: bool, sequence or None, optional (default=None) Transformers to use on the data before predicting. 
 
verbose: int or None, optional (default=None) | 
| Returns: | pipeline: Pipeline Current branch as a sklearn-like Pipeline object. | 
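For orientation, the sketch below builds by hand the kind of object such an export resembles when feature scaling was used: a fitted sklearn Pipeline with a scaler placed before the model. It does not call export_pipeline. The step names and the synthetic data are assumptions for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# The scaler comes first, the model is the final step (step names are made up)
pipeline = Pipeline(
    [("scaler", StandardScaler()), ("lsvm", LinearSVC(dual=False))]
)
pipeline.fit(X_train, y_train)

# A fitted pipeline applies scaling and prediction in one call
predictions = pipeline.predict(X_test)
step_names = [name for name, _ in pipeline.steps]
print(step_names)
```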
Change the model's tag. The acronym always stays at the beginning of the model's name. If the model is being tracked by mlflow, the name of the corresponding run is also changed.
| Parameters: | name: str or None, optional (default=None) New tag for the model. If None, the tag is removed. | 
Clear the prediction attributes from all models.
Use this method to free some memory before saving the trainer.
Get the model's score for the provided metrics.
| Parameters: | 
metric: str, func, scorer, sequence or None, optional (default=None) 
dataset: str, optional (default="test") | 
| Returns: | score: pd.Series Model's scoring. | 
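A result of this shape can be approximated with sklearn's scorers, as in the sketch below. It does not use the scoring method itself. The metric names, the synthetic data and the way the pd.Series is assembled are assumptions for the example.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import get_scorer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

model = LinearSVC(dual=False).fit(X_train, y_train)

# Evaluate several metrics on the test set and collect them in a Series
metrics = ["accuracy", "f1"]
score = pd.Series(
    {name: get_scorer(name)(model, X_test, y_test) for name in metrics}
)
print(score)
```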
Save the estimator to a pickle file.
| Parameters: | filename: str, optional (default="auto") Name of the file. Use "auto" for automatic naming. | 
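Saving an estimator to a pickle file and loading it again can be sketched with the standard library, as below. This does not call save_estimator. The temporary directory and the file name are hypothetical and do not reflect the "auto" naming.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=10, random_state=1)
estimator = LinearSVC(dual=False).fit(X, y)

# Hypothetical location and file name, chosen only for this example
filename = os.path.join(tempfile.mkdtemp(), "LinearSVC.pkl")

# Serialize the fitted estimator to disk
with open(filename, "wb") as f:
    pickle.dump(estimator, f)

# Load it back; only unpickle files from sources you trust
with open(filename, "rb") as f:
    restored = pickle.load(f)

print((restored.predict(X) == estimator.predict(X)).all())
```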
Example
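from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier
# Example dataset; any feature set X and target y can be used instead
X, y = load_breast_cancer(return_X_y=True)
atom = ATOMClassifier(X, y)
atom.run(models="lSVM", metric="accuracy", n_calls=10)  # 10 BO iterations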