Release history
Version 4.14.1
- Fixed an installation issue with conda.
Version 4.14.0
- Refactor of the Cleaner and Vectorizer classes.
- Refactor of the cross_validate method.
- The plot_pipeline method now supports drawing multiple pipelines.
- Renamed the Normalizer class to TextNormalizer.
- Renamed the Gauss class to Normalizer.
- Added the inverse_transform method to the Scaler, Normalizer and Cleaner classes (see the sketch below).
- Added the winners property to the trainers (note the extra s).
- Added the feature_names_in_ and n_features_in_ attributes to transformers.
- The default value of the warnings parameter is set to False.
- Improvements for multicollinearity removal in FeatureSelector.
- Renamed default feature names to x0, x1, etc. for consistency with sklearn's API.
- Renamed component names in FeatureSelector to pca0, pca1, etc. for consistency with sklearn's API.
- Significant speed-up in pipeline transformations.
- Fixed a bug where mlflow runs could be ended unexpectedly.
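A minimal sketch of the new inverse_transform method on the stand-alone Scaler. The atom.data_cleaning import path and the toy data are assumptions for illustration, not part of this release note.

```python
import pandas as pd
from atom.data_cleaning import Scaler  # assumed import path

X = pd.DataFrame({"x0": [1.0, 2.0, 3.0], "x1": [10.0, 20.0, 30.0]})

scaler = Scaler()
X_scaled = scaler.fit_transform(X)               # scale the features
X_restored = scaler.inverse_transform(X_scaled)  # new: undo the scaling
```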
Version 4.13.1
- Fixed an installation issue.
Version 4.13.0
- Added GPU support. Read more in the user guide.
- Added advanced feature selection strategies.
- Added the return_sparse parameter to the Vectorizer class.
- Added the quantile hyperparameter to the Dummy model.
- The data attributes now return pandas objects where possible.
- Fixed a bug where the BO could crash after balancing the data.
- Fixed a bug where saving the FeatureGenerator class could fail for certain operators.
- Fixed a bug where the FeatureSelector class displayed the wrong output.
- Fixed a bug where the mapping attribute was not reordered.
Version 4.12.0
- Support for Python 3.10.
- New Discretizer class to bin numerical features.
- Refactor of the FeatureGenerator class.
- The mapping attribute now shows all encoded features.
- Added the sample_weight parameter to the evaluate method.
- ATOMClassifier now has a stratify parameter to split the data sets in a stratified fashion (see the sketch below).
- Possibility to exclude hyperparameters from the BO by adding ! before the name.
- Added memory usage to the stats method.
- Fixed a bug where decision_plot could fail when only one row was plotted.
- Added versioning to the documentation.
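A short, hedged sketch of two of the additions above. The dataset, model choice and uniform weights are placeholders, and the exact types accepted by stratify and sample_weight are assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# stratify (new in this release) requests stratified data set splits.
atom = ATOMClassifier(X, y, stratify=True, random_state=1)
atom.run(models="RF")

# sample_weight (new in this release) weighs the rows used to score the models;
# uniform weights over the test set are used purely as a placeholder.
atom.evaluate(sample_weight=np.ones(len(atom.test)))
```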
Version 4.11.0
- Full support for sparse matrices. Read more in the user guide.
- The shrink method now also handles sparse features.
- Refactor of the distribution method.
- Added three new linear models: Lars, Huber and Perc.
- Dimensions can be shared across models using the key 'all' in ht_params["dimensions"] (see the sketch below).
- Assign hyperparameters to tune using the predefined dimensions.
- It's now possible to tune a custom number of layers for the MLP model.
- If multiple BO calls share the best score, the one with the shortest training time is selected as winner (instead of the first).
- Fixed a bug where the BO could fail when custom dimensions were defined.
- Fixed a bug where FeatureSelector could fail after repeated calls to fit.
- Fixed a bug where FeatureGenerator didn't pass the correct data indices to its output.
- Performance improvements for the custom pipeline.
- Minor documentation fixes.
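A hypothetical sketch of sharing dimensions across models through the 'all' key. Passing predefined hyperparameter names as a list follows the entry above, but the exact value format is an assumption; the user guide has the accepted options.

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y, random_state=1)

# The "all" key applies these dimensions to both RF and ET during the BO.
atom.run(
    models=["RF", "ET"],
    n_calls=15,
    ht_params={"dimensions": {"all": ["n_estimators", "max_depth"]}},
)
```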
Version 4.10.0
- Added the holdout data set as an extra way of assessing a model's performance on a completely independent dataset (see the sketch below). Read more in the user guide.
- Complete rework of the ensemble models.
- Support for dataframe indexing. Read more in the user guide.
- New plot_parshap plot to detect overfitting features.
- The new dashboard method makes analyzing the models even easier using a dashboard app.
- The plot_feature_importance plot now also accepts estimators with coefficients.
- Added the transform method for models.
- Added the threshold parameter to the evaluate method.
- The reset_predictions method is deprecated in favour of the new clear method.
- Refactor of the model's full_train method.
- The merge method is available for all trainers.
- Improvements in the trainer's pipeline.
- Training scores are now also saved to the mlflow run.
- Trying to change the data in a branch after fitting a model with it now raises an exception.
- Fixed a bug where the columns of array inputs were not ordered correctly.
- Fixed a bug where branches were not treated case-insensitively.
- Fixed a bug where the export_pipeline method for models would not export the transformers in the correct branch.
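A minimal sketch of reserving the new holdout set. The holdout_size parameter name and the atom.holdout attribute are assumptions based on this entry; the user guide documents the exact API.

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# holdout_size (assumed name) keeps a fraction of the data apart from both
# the train and test sets until the very end of the experiment.
atom = ATOMClassifier(X, y, holdout_size=0.1, random_state=1)
print(atom.holdout.shape)  # the reserved rows (assumed attribute name)
```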
Version 4.9.1
- Changed the default cross-validation for hyperparameter tuning from 5 to 1 to avoid errors with deep learning models.
- Added clearer exception messages when a model's run failed.
- Fixed a bug where custom dimensions didn't show during hyperparameter tuning.
- Documentation improvements.
Version 4.9.0
- Drop support of Python 3.6.
- Added the HistGBM model.
- Improved print layout for hyperparameter tuning.
- The new available_models method returns an overview of the available predefined models.
- The calibrate and cross_validate methods can no longer be accessed from the trainers.
- The pipeline parameter for the prediction methods is deprecated.
- Improved visualization of the plot_rfecv, plot_successive_halving and plot_learning_curve methods.
- Sparse matrices are now accepted as input.
- Duplicate BO calls are no longer calculated.
- Improvement in performance of the RNN model.
- Refactor of the model's bo attribute.
- Predefined hyperparameters have been updated to be consistent with sklearn's API.
- Fixed a bug where custom scalers were ignored by the models.
- Fixed a bug where the BO of certain models would crash with custom hyperparameters.
- Fixed a bug where duplicate column names could be generated from a custom transformer.
- Documentation improvements.
Version 4.8.0
- The Encoder class now directly handles unknown categories encountered during fitting.
- The Balancer and Encoder classes now accept custom estimators for the strategy parameter.
- The new merge method enables the user to merge multiple atom instances into one.
- The dtype shrinking is moved from atom's initializers to the shrink method.
- ATOM's custom pipeline now handles transformers fitted on a subset of the dataset.
- The column parameter in the distribution method is renamed to columns for continuity of the API.
- The mae criterion for the GBM model hyperparameter tuning is deprecated to be consistent with sklearn's API.
- Branches are now case-insensitive.
- Renaming a branch using an existing name now raises an exception.
- Fixed a bug where columns of type category broke the Imputer class.
- Fixed a bug where predictions of the Stacking ensemble crashed for branches with multiple transformers.
- The tables in the documentation now adapt to dark mode.
Version 4.7.3
- Fixed a bug where the conda-forge recipe couldn't install properly.
Version 4.7.2
- Fixed a bug where the pipeline failed for custom transformers that returned sparse matrices.
- Package requirements files are added to the installer.
Version 4.7.1
- Fixed a bug where the pip installer failed.
- Fixed a bug where categorical columns also selected datetime columns.
Version 4.7.0
- Launched our new slack channel!
- The new FeatureExtractor class extracts useful features from datetime columns.
- The new plot_det method plots a binary classifier's detection error tradeoff curve.
- The partial dependence plot is able to draw Individual Conditional Expectation (ICE) lines.
- The full traceback of exceptions encountered during training is now saved to the logger.
- ATOMClassifier and ATOMRegressor now convert the dtypes of the input data to the minimal allowed type for memory efficiency.
- The scoring method is renamed to evaluate to clarify its purpose.
- The column parameter in the apply method is renamed to columns for continuity of the API.
- Minor documentation improvements.
Version 4.6.0
- Added the full_train method to retrieve an estimator trained on the complete dataset.
- The score method is now also able to calculate custom metrics on new data.
- Refactor of the Imputer class.
- Refactor of the Encoder class to avoid errors for unknown classes and allow the input of missing values.
- The clean method no longer automatically encodes the target column for regression tasks.
- Creating a branch using a model's acronym as name now raises an exception.
- Fixed a bug where CatBoost failed when early_stopping < 1.
- Fixed a bug where created pipelines had duplicated names.
Version 4.5.0
- Support of NLP pipelines. Read more in the user guide.
- Integration of mlflow to track all models in the pipeline. Read more in the user guide.
- The new Normalizer class transforms features to a more Gaussian-like distribution.
- New cross_validate method to evaluate the robustness of a pipeline using cross-validation.
- New reset method to go back to atom's initial state.
- Added the Dummy model to compare other models with a simple baseline.
- New plot_wordcloud and plot_ngrams methods for text visualization.
- Plots can now return the figure object when display=None.
- The Pruner class is now able to drop outliers based on a selection of multiple strategies.
- The new shuffle parameter in atom's initializer determines whether to shuffle the dataset.
- The trainers no longer require you to specify a model using the models parameter. If left to default, all predefined models for that task are used.
- The apply method now accepts args and kwargs for the function.
- Refactor of the evaluate method.
- Refactor of the export_pipeline method.
- The parameters in the Cleaner class have been refactored to better describe their function.
- The train_sizes parameter in train_sizing now accepts integer values to automatically create equally distributed splits in the training set.
- Refactor of plot_pipeline to show models in the diagram as well.
- Refactor of the bagging parameter to the (more appropriate) name n_bootstrap.
- New option to exclude columns from a transformer by adding ! before their name (see the sketch below).
- Fixed a bug where the Pruner class failed if there were categorical columns in the dataset.
- Completely reworked documentation website.
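A hedged sketch of the new '!' exclusion syntax. Using atom.add with a columns argument and the column name "mean radius" are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y, random_state=1)

# Apply the transformer to every column except the one prefixed with "!".
atom.add(StandardScaler(), columns="!mean radius")
```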
Version 4.4.0
- The drop method now allows the user to drop columns as part of the pipeline.
- New apply method to add data transformations as a function to the pipeline.
- Added the status method to save an overview of atom's branches and models to the logger.
- Improved the output messages for the Imputer class.
- The dataset's columns can now be called directly from atom.
- The distribution and plot_distribution methods now ignore missing values.
- Fixed a bug where transformations could fail when columns were added to the dataset after initializing the pipeline.
- Fixed a bug where the Cleaner class didn't drop columns consisting entirely of missing values when drop_min_cardinality=True.
- Fixed a bug where the winning model wasn't displayed correctly.
- Refactored the way transformers are added or removed from predicting methods.
- Improved documentation.
Version 4.3.0
- Possibility to add custom transformers to the pipeline (see the sketch after this list).
- The export_pipeline utility method exports atom's current pipeline to a sklearn object.
- Use AutoML to automate the search for an optimized pipeline.
- New magic methods make atom behave similarly to sklearn's Pipeline.
- All training approaches can now be combined in the same atom instance.
- New plot_relationships, plot_distribution and plot_qq plots for data inspection.
- Complete rework of all the shap plots to be consistent with their new API.
- Improvements for the Scaler and Pruner classes.
- The acronym for custom models now defaults to the capital letters in the class' __name__.
- Possibility to apply transformations on only a subset of the columns.
- Plots and methods now accept winner as model name.
- Fixed a bug where custom metrics didn't show the correct name.
- Fixed a bug where timers were not displayed correctly.
- Further compatibility with deep learning datasets.
- Large refactoring for performance optimization.
- Cleaner output of messages to the logger.
- Plots no longer show a default title.
- Added the AutoML example notebook.
- Minor bug fixes.
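A minimal sketch of the custom-transformer and export_pipeline additions. atom.add as the method that attaches a custom transformer is an assumption; export_pipeline is named in the entry above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y, random_state=1)

atom.add(PCA(n_components=5))      # custom (sklearn-compatible) transformer
pipeline = atom.export_pipeline()  # a sklearn Pipeline with the fitted steps
```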
Version 4.2.1
- Bug fix where there was memory leakage in successive halving and train sizing pipelines.
- The XGBoost, LightGBM and CatBoost packages can now be installed through the installer's extras_require under the name models, e.g. pip install -U atom-ml[models].
- Improved documentation.
Version 4.2.0
- Possibility to add custom models to the pipeline using ATOMModel (see the sketch after this list).
- Compatibility with deep learning models.
- New branch system for different data pipelines. Read more in the user guide.
- Use the canvas contextmanager to draw multiple plots in one figure.
- New voting and stacking ensemble techniques.
- New get_class_weight utility method.
- New Sequential Feature Selection strategy for the FeatureSelector.
- Added the sample_weight parameter to the score method.
- New ways to initialize the data in the training instances.
- The n_rows parameter in ATOMLoader is deprecated in favour of the new input formats.
- The test_size parameter now also allows integer values.
- Renamed categories to classes to be consistent with sklearn's API.
- The class property now returns a pd.DataFrame of the number of rows per target class in the train, test and complete dataset.
- Possibility to add custom parameters to an estimator's fit method through est_params.
- The successive halving and train sizing approaches now both allow subsequent runs from atom without losing the information from previous runs.
- Bug fix where ATOMLoader wouldn't encode the target column during transformation.
- Added the Deep learning, Ensembles and Utilities example notebooks.
- Support for Python 3.9.
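A hedged sketch of wrapping a custom model with ATOMModel. The acronym and fullname keyword names are assumptions for illustration; ATOMModel itself is the wrapper named in the entry above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import RidgeClassifier
from atom import ATOMClassifier, ATOMModel

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Wrap any sklearn-compatible estimator so atom can train and evaluate it.
ridge = ATOMModel(RidgeClassifier(), acronym="RC", fullname="Ridge classifier")

atom = ATOMClassifier(X, y, random_state=1)
atom.run(models=ridge)  # the custom model runs like any predefined one
```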
Version 4.1.0
- New est_params parameter to customize the parameters in every model's estimator (see the sketch below).
- Following skopt's API, the n_random_starts parameter to specify the number of random trials is deprecated in favour of n_initial_points.
- The Balancer class now allows you to use any of the strategies from imblearn.
- New utility attributes to inspect the dataset.
- Four new models: CatNB, CNB, ARD and RNN.
- Added the models section to the documentation.
- Small changes in log outputs.
- Bug fixes and performance improvements.
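A minimal sketch of est_params, assuming an already-initialized atom instance like the ones in the sketches above; the fixed value is illustrative only.

```python
# Parameters passed through est_params are set directly on the estimator
# and are therefore not searched during the BO.
atom.run(models="RF", n_calls=10, est_params={"n_estimators": 300})
```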
Version 4.0.1
- Bug fix where the FeatureGenerator was not deterministic for a fixed random state.
- Bug fix where subsequent runs with the same metric failed.
- Added the license file to the package's installer.
- Typo fixes in documentation.
Version 4.0.0
- Bayesian optimization package changed from GPyOpt to skopt.
- Complete revision of the model's hyperparameters.
- Four SHAP plots can now be called directly from an ATOM pipeline.
- Two new plots for regression tasks.
- New plot_pipeline and pipeline attribute to access all transformers.
- Possibility to determine transformer parameters per method.
- New calibration method and plot.
- Metrics can now be added as scorers or functions with signature metric(y, y_pred, **kwargs) (see the sketch after this list).
- Implementation of multi-metric runs.
- Possibility to choose which metric to plot.
- Early stopping for models that allow in-training validation.
- Added the ATOMLoader function to load any saved pickle instance.
- The "remove" strategy in the data cleaning parameters is deprecated in favour of "drop".
- Implemented the dfs strategy in FeatureGenerator.
- All training classes now inherit from BaseEstimator.
- Added multiple new example notebooks.
- Test coverage up to 100%.
- Completely new documentation page.
- Bug fixes and performance improvements.
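A hedged sketch of a custom metric function with the signature above, assuming an already-initialized atom instance; passing the function through run's metric argument is the assumed entry point.

```python
from sklearn.metrics import fbeta_score

def f2(y, y_pred, **kwargs):
    # custom metric following the metric(y, y_pred, **kwargs) signature
    return fbeta_score(y, y_pred, beta=2)

atom.run(models="LR", metric=f2)
```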