Release history

Version 5.1.2

API changes

The default strategy for the encode method has changed from "LeaveOneOut" to "Target" encoding. "LeaveOneOut" is no longer a supported strategy.
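
As a rough illustration of how the two strategies differ, here is a plain-Python sketch. It is not the package's implementation: the function names and toy data are made up for this example, and the smoothing that real encoders usually apply is left out.

```python
from collections import defaultdict


def _category_stats(categories, target):
    """Collect the target sum and row count for every category."""
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in zip(categories, target):
        sums[cat] += y
        counts[cat] += 1
    return sums, counts


def target_encode(categories, target):
    """Replace each category with the mean target of that category."""
    sums, counts = _category_stats(categories, target)
    return [sums[cat] / counts[cat] for cat in categories]


def leave_one_out_encode(categories, target):
    """Like target encoding, but each row excludes its own target value."""
    sums, counts = _category_stats(categories, target)
    global_mean = sum(target) / len(target)
    encoded = []
    for cat, y in zip(categories, target):
        if counts[cat] > 1:
            encoded.append((sums[cat] - y) / (counts[cat] - 1))
        else:
            # A category seen only once has no other rows to average over
            # (assumption: fall back to the global mean).
            encoded.append(global_mean)
    return encoded


# Hypothetical toy data: one categorical column and a binary target.
colors = ["red", "red", "blue", "blue", "blue"]
bought = [1, 0, 1, 1, 0]

target_encoded = target_encode(colors, bought)
loo_encoded = leave_one_out_encode(colors, bought)
```

With target encoding, every row of a category gets the same value. With leave-one-out, rows of the same category can differ because each one leaves its own target out of the mean.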

Bug fixes

Fixed a bug where stratification failed for datasets where the target column was not placed last.
Fixed a bug where transformers with no get_feature_names_out method could fail.
Fixed a bug where the FeatureSelector class could fail when transforming a dataset with different column order than seen at fit time.
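
The column-order issue above can be pictured with a small sketch: before transforming, incoming data is realigned to the column order recorded at fit time. This only illustrates the general idea and does not show how the fix was actually made; `align_columns` and the column names are hypothetical.

```python
def align_columns(row, fitted_columns):
    """Return the row's values in the column order seen at fit time."""
    missing = [col for col in fitted_columns if col not in row]
    if missing:
        raise ValueError(f"Columns missing at transform time: {missing}")
    return [row[col] for col in fitted_columns]


# Column order recorded when the (hypothetical) transformer was fitted.
fitted_columns = ["age", "height", "weight"]

# The same columns arrive in a different order at transform time.
new_row = {"weight": 70.5, "age": 31, "height": 1.8}

aligned = align_columns(new_row, fitted_columns)
```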

API changes

The rare_to_value parameter in the Encoder class is replaced with infrequent_to_value to be consistent with sklearn's naming convention.

Bug fixes

Fixed an installation issue for systems without an x86 architecture.
Fixed a bug where Voting would fail for certain metrics.
Fixed a bug where the automl method would fail for some transformers.
Fixed a bug where the time metric in mlflow was always zero.
Fixed a bug where shap plots wouldn't display the full column names.
Fixed a bug where column names were not properly propagated during transformation.

New features

Support for multilabel classification, multiclass-multilabel classification and multioutput regression tasks. Read more in the user guide.
New backend parameter to choose a parallel execution backend.
New parallel parameter to train multiple models simultaneously.
Integration with DAGsHub to store your mlflow experiments. Read more in the user guide.
New serve method to deploy models to a REST API endpoint.
New get_best_threshold method to calculate the optimal threshold for binary and multilabel tasks.
New get_sample_weight method to calculate the sample weights for a balanced data set.
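
To give a feel for what the last two methods compute, here is a minimal plain-Python sketch of both ideas. It is not the package's code. The function names are made up, and two things are assumptions: the threshold search optimizes F1 over the observed scores, and the "balanced" weights follow the common n_samples / (n_classes * class_count) heuristic.

```python
from collections import Counter


def balanced_sample_weight(y):
    """Weight each sample by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return [n_samples / (n_classes * counts[label]) for label in y]


def f1(y_true, y_pred):
    """F1 score for binary labels encoded as 0 and 1."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(y_true, y_score, metric=f1):
    """Return the observed score that maximizes the metric when used as cutoff."""
    candidates = sorted(set(y_score))
    return max(
        candidates,
        key=lambda t: metric(y_true, [int(s >= t) for s in y_score]),
    )


# Hypothetical predicted probabilities for a binary task.
y_true = [0, 0, 1, 0, 1, 1]
y_score = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
threshold = best_threshold(y_true, y_score)

# Hypothetical imbalanced labels: three negatives, one positive.
labels = [0, 0, 0, 1]
weights = balanced_sample_weight(labels)
```

In the weighted example, both classes end up with the same total weight, which is what makes the data set "balanced" from the estimator's point of view.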

Enhancements

Added three new notebook examples.
Added the drop_chars parameter to the Cleaner class.
Added the errors parameter to the trainers.
Rework of the dependencies, making the base package more lightweight.
The logging entries for external libraries are redirected to atom's file handler.

Bug fixes

Fixed multiple errors that appeared after sklearn's 1.2 update.
Fixed a bug where hyperparameter tuning could fail for multi-metric runs.
Fixed a bug where trials would try to report the same step multiple times.
Fixed a bug where custom models could skip in-training validation.
Fixed an issue where the bootstrapping estimators were trained using partial_fit.

API changes

The gpu parameter is deprecated in favor of device and engine.
Refactor of the Cleaner, Discretizer, Encoder and FeatureSelector classes.
Refactor of all shap plots.
Refactor of the apply method.
The plot_scatter_matrix method is renamed to plot_relationships.
The kSVM model is renamed to SVM.
Multidimensional datasets are no longer supported. Check the deep learning section of the user guide for guidance with such datasets.
The greater_is_better, needs_proba and needs_threshold parameters are deprecated. Metric functions are now created using make_scorer's default parameters.
The drop method is removed from atom. Use the reworked apply method instead.
The prediction methods can no longer be called from atom.
The dashboard method for models is now called create_dashboard.

Enhancements

New examples for plotting, automated feature scaling, pruning and advanced hyperparameter tuning.
The Normalizer class can now be accelerated with GPU.
The Scaler class now ignores binary columns (only 0s and 1s).
The models parameter in plot and utility methods now accepts model indices.
The transform method can now transform only y, when X is left at its default value.
The prediction methods now return pandas objects.
After unpickling, the installed dependency versions are checked against the versions used when the object was pickled.
Automatic generation of documentation from docstrings.
Improvements in documentation display for mobile phones.
New feature_importance attribute for models.
Added a visualization for automated feature scaling to plot_pipeline.
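
The Scaler change listed above (skipping columns that hold only 0s and 1s) can be sketched as follows. This is a plain-Python illustration of the idea, not the Scaler's implementation: the helper names and data are hypothetical, and standardization is assumed as the scaling strategy.

```python
from statistics import mean, pstdev


def is_binary(values):
    """True when the column contains only 0s and 1s."""
    return set(values) <= {0, 1}


def standardize(columns):
    """Standardize every column except the binary ones."""
    scaled = {}
    for name, values in columns.items():
        if is_binary(values):
            scaled[name] = list(values)  # binary columns are left untouched
            continue
        mu, sigma = mean(values), pstdev(values)
        # Constant columns have zero spread; map them to 0.0 to avoid dividing by zero.
        scaled[name] = [(v - mu) / sigma if sigma else 0.0 for v in values]
    return scaled


# Hypothetical data: one numeric column and one binary indicator.
data = {"age": [20, 30, 40], "is_member": [0, 1, 1]}
scaled = standardize(data)
```

Leaving indicator columns alone keeps their 0/1 meaning intact, while the numeric columns are still centered and scaled.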

Bug fixes

The FeatureExtractor class no longer raises a warning for highly fragmented dataframes.
Fixed a bug where models could not call the score function.
The Encoder class no longer fails when the user provides ordinal values that are not present during fitting.
Fixed a bug with the max_nan_rows parameter in the Imputer class.
Fixed a bug where Tokenizer could fail when no ngrams were found.