Release history

Version 4.7.3

Fixed a bug where the conda-forge recipe couldn't install properly.

Version 4.7.2

Fixed a bug where the pipeline failed for custom transformers that returned sparse matrices.
The package's requirements files are now included in the installer.

Version 4.7.1

Fixed a bug where the pip installer failed.
Fixed a bug where categorical columns also selected datetime columns.

Version 4.7.0

Launched our new Slack channel!
The new FeatureExtractor class extracts useful features from datetime columns.
The new plot_det method plots a binary classifier's detection error tradeoff curve.
The partial dependence plot is able to draw Individual Conditional Expectation (ICE) lines.
The full traceback of exceptions encountered during training is now saved to the logger.
ATOMClassifier and ATOMRegressor now convert the dtypes of the input data to the minimal allowed type for memory efficiency.
The scoring method is renamed to evaluate to clarify its purpose.
The column parameter in the apply method is renamed to columns for continuity of the API.
Minor documentation improvements.
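
As a rough illustration of what a detection error tradeoff curve contains (this is not the implementation behind plot_det), the sketch below computes the false positive rate and false negative rate at every score threshold. The function name det_points and the sample labels and scores are hypothetical.

```python
def det_points(y_true, scores):
    """Return (threshold, fpr, fnr) tuples, one per distinct score.

    A sample is predicted positive when its score >= threshold.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("y_true must contain both classes")

    points = []
    for threshold in sorted(set(scores), reverse=True):
        # Negatives scored at or above the threshold are false positives.
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
        # Positives scored below the threshold are false negatives.
        fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
        points.append((threshold, fp / n_neg, fn / n_pos))
    return points


# Hypothetical labels and classifier scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
for threshold, fpr, fnr in det_points(y_true, scores):
    print(f"threshold={threshold:.2f}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```

Plotting FNR against FPR for these points gives the tradeoff curve; lowering the threshold trades missed detections for false alarms.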

Version 4.6.0

Added the full_train method to retrieve an estimator trained on the complete dataset.
The score method is now also able to calculate custom metrics on new data.
Refactor of the Imputer class.
Refactor of the Encoder class to avoid errors for unknown classes and allow the input of missing values.
The clean method no longer automatically encodes the target column for regression tasks.
Creating a branch using a model's acronym as its name now raises an exception.
Fixed a bug where CatBoost failed when early_stopping < 1.
Fixed a bug where created pipelines had duplicated names.

Version 4.5.0

Support of NLP pipelines. Read more in the user guide.
Integration of mlflow to track all models in the pipeline. Read more in the user guide.
The new Gauss class transforms features to a more Gaussian-like distribution.
New cross_validate method to evaluate the robustness of a pipeline using cross-validation.
New reset method to go back to atom's initial state.
Added the Dummy model to compare other models with a simple baseline.
New plot_wordcloud and plot_ngrams methods for text visualization.
Plots can now return the figure object when display=None.
The Pruner class is now able to drop outliers based on a selection of multiple strategies.
The new shuffle parameter in atom's initializer determines whether to shuffle the dataset.
The trainers no longer require you to specify a model using the models parameter. If left to default, all predefined models for that task are used.
The apply method now accepts args and kwargs for the function.
Refactor of the evaluate method.
Refactor of the export_pipeline method.
The parameters in the Cleaner class have been refactored to better describe their function.
The train_sizes parameter in train_sizing now accepts integer values to automatically create equally distributed splits in the training set.
Refactor of plot_pipeline to show models in the diagram as well.
Refactor of the bagging parameter to the (more appropriate) name n_bootstrap.
New option to exclude columns from a transformer by adding ! before their name.
Fixed a bug where the Pruner class failed if there were categorical columns in the dataset.
Completely reworked documentation website.
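
The ! exclusion syntax mentioned above can be pictured with a small standalone sketch. It is not the library's code: the helper select_columns, the column names, and the rule that a selection made only of exclusions starts from all columns are assumptions made for illustration.

```python
def select_columns(all_columns, selection):
    """Resolve a selection where names prefixed with ! are excluded."""
    include = [name for name in selection if not name.startswith("!")]
    exclude = {name[1:] for name in selection if name.startswith("!")}

    unknown = (set(include) | exclude) - set(all_columns)
    if unknown:
        raise ValueError(f"Unknown columns: {sorted(unknown)}")

    # Assumption: with only exclusions given, start from every column.
    candidates = include if include else list(all_columns)
    return [name for name in candidates if name not in exclude]


# Hypothetical dataset columns.
columns = ["age", "income", "city", "signup_date"]
print(select_columns(columns, ["!city"]))          # every column except city
print(select_columns(columns, ["age", "income"]))  # only the named columns
```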

Version 4.4.0

The drop method now allows the user to drop columns as part of the pipeline.
New apply method to perform data transformations as a function in the pipeline.
Added the status method to save an overview of atom's branches and models to the logger.
Improved the output messages for the Imputer class.
The dataset's columns can now be called directly from atom.
The distribution and plot_distribution methods now ignore missing values.
Fixed a bug where transformations could fail when columns were added to the dataset after initializing the pipeline.
Fixed a bug where the Cleaner class didn't drop columns consisting entirely of missing values when drop_min_cardinality=True.
Fixed a bug where the winning model wasn't displayed correctly.
Refactored the way transformers are added or removed from predicting methods.
Improved documentation.

Version 4.3.0

Possibility to add custom transformers to the pipeline.
The export_pipeline utility method exports atom's current pipeline to a sklearn object.
Use AutoML to automate the search for an optimized pipeline.
New magic methods make atom behave similarly to sklearn's Pipeline.
All training approaches can now be combined in the same atom instance.
New plot_scatter_matrix, plot_distribution and plot_qq plots for data inspection.
Complete rework of all the shap plots to be consistent with their new API.
Improvements for the Scaler and Pruner classes.
The acronym for custom models now defaults to the capital letters in the class' __name__.
Possibility to apply transformations on only a subset of the columns.
Plots and methods now accept winner as model name.
Fixed a bug where custom metrics didn't show the correct name.
Fixed a bug where timers were not displayed correctly.
Further compatibility with deep learning datasets.
Large refactoring for performance optimization.
Cleaner output of messages to the logger.
Plots no longer show a default title.
Added the AutoML example notebook.
Minor bug fixes.
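
The acronym rule for custom models described above (capital letters of the class' __name__) can be sketched in a few lines. The helper default_acronym and the example classes are hypothetical, and the fallback to the full name when a class name has no capitals is an assumption, since the entry does not say what happens in that case.

```python
def default_acronym(cls):
    """Join the capital letters found in the class name."""
    capitals = "".join(char for char in cls.__name__ if char.isupper())
    # Assumption: fall back to the full name if there are no capitals.
    return capitals or cls.__name__


# Hypothetical stand-ins for custom estimator classes.
class RandomForestClassifier:
    pass


class mymodel:
    pass


print(default_acronym(RandomForestClassifier))  # RFC
print(default_acronym(mymodel))                 # mymodel
```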

Version 4.2.1

Fixed a memory leak in successive halving and train sizing pipelines.
The XGBoost, LightGBM and CatBoost packages can now be installed through the installer's extras_require under the name models, e.g. pip install -U atom-ml[models].
Improved documentation.

Version 4.2.0

Possibility to add custom models to the pipeline using ATOMModel.
Compatibility with deep learning models.
New branch system for different data pipelines. Read more in the user guide.
Use the canvas contextmanager to draw multiple plots in one figure.
New voting and stacking ensemble techniques.
New get_class_weight utility method.
New Sequential Feature Selection strategy for the FeatureSelector.
Added the sample_weight parameter to the score method.
New ways to initialize the data in the training instances.
The n_rows parameter in ATOMLoader is deprecated in favour of the new input formats.
The test_size parameter now also allows integer values.
Renamed categories to classes to be consistent with sklearn's API.
The class property now returns a pd.DataFrame of the number of rows per target class in the train, test and complete dataset.
Possibility to add custom parameters to an estimator's fit method through est_params.
The successive halving and train sizing approaches now both allow subsequent runs from atom without losing the information from previous runs.
Bug fix where ATOMLoader wouldn't encode the target column during transformation.
Added the Deep learning, Ensembles and Utilities example notebooks.
Compatibility with Python 3.9.
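
To make the difference between a fractional and an integer test_size concrete, here is a minimal sketch of one plausible interpretation: a float between 0 and 1 is a fraction of the dataset, an integer is an absolute number of rows. The helper resolve_test_rows and its validation rules are assumptions, not the library's code.

```python
def resolve_test_rows(n_rows, test_size):
    """Translate test_size into a number of test rows."""
    if isinstance(test_size, bool) or not isinstance(test_size, (int, float)):
        raise TypeError("test_size must be an int or a float")

    if 0 < test_size < 1:
        # Fraction of the dataset (assumed to be rounded down).
        return int(n_rows * test_size)
    if isinstance(test_size, int) and 1 <= test_size < n_rows:
        # Absolute number of rows.
        return test_size
    raise ValueError("test_size must be in (0, 1) or an int in [1, n_rows)")


print(resolve_test_rows(1000, 0.2))  # fraction of the rows
print(resolve_test_rows(1000, 150))  # explicit row count
```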

Version 4.1.0

New est_params parameter to customize the parameters in every model's estimator.
Following skopt's API, the n_random_starts parameter to specify the number of random trials is deprecated in favour of n_initial_points.
The Balancer class now allows you to use any of the strategies from imblearn.
New utility attributes to inspect the dataset.
Four new models: CatNB, CNB, ARD and RNN.
Added the models section to the documentation.
Small changes in log outputs.
Bug fixes and performance improvements.
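
A parameter rename such as n_random_starts to n_initial_points is commonly handled with a deprecation warning that still honours the old name. The sketch below shows that general pattern only; the helper resolve_initial_points and the default value of 5 are hypothetical and not taken from the library.

```python
import warnings


def resolve_initial_points(n_initial_points=None, n_random_starts=None, default=5):
    """Return the number of initial points, accepting the old name too."""
    if n_random_starts is not None:
        warnings.warn(
            "n_random_starts is deprecated; use n_initial_points instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # The new parameter wins when both are given.
        if n_initial_points is None:
            n_initial_points = n_random_starts
    return default if n_initial_points is None else n_initial_points


print(resolve_initial_points(n_initial_points=10))
print(resolve_initial_points())  # falls back to the hypothetical default
```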

Version 4.0.1

Bug fix where the FeatureGenerator was not deterministic for a fixed random state.
Bug fix where subsequent runs with the same metric failed.
Added the license file to the package's installer.
Typo fixes in documentation.

Version 4.0.0

The Bayesian optimization package changed from GPyOpt to skopt.
Complete revision of the models' hyperparameters.
Four SHAP plots can now be called directly from an ATOM pipeline.
Two new plots for regression tasks.
New plot_pipeline and pipeline attribute to access all transformers.
Possibility to determine transformer parameters per method.
New calibration method and plot.
Metrics can now be added as scorers or functions with signature metric(y, y_pred, **kwargs).
Implementation of multi-metric runs.
Possibility to choose which metric to plot.
Early stopping for models that allow in-training evaluation.
Added the ATOMLoader function to load any saved pickle instance.
The "remove" strategy in the data cleaning parameters is deprecated in favour of "drop".
Implemented the DFS strategy in FeatureGenerator.
All training classes now inherit from BaseEstimator.
Added multiple new example notebooks.
Test coverage up to 100%.
Completely new documentation page.
Bug fixes and performance improvements.
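
A function matching the metric(y, y_pred, **kwargs) signature listed above might look like the sketch below. The metric itself (weighted_accuracy) and the sample_weight keyword are hypothetical; how extra keyword arguments reach the function is not described in these notes.

```python
def weighted_accuracy(y, y_pred, **kwargs):
    """Fraction of correct predictions, optionally weighted per sample."""
    # Hypothetical keyword: per-sample weights, defaulting to equal weights.
    weights = kwargs.get("sample_weight")
    if weights is None:
        weights = [1.0] * len(y)

    correct = sum(w for t, p, w in zip(y, y_pred, weights) if t == p)
    return correct / sum(weights)


# Hypothetical true labels and predictions.
y = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
print(weighted_accuracy(y, y_pred))
print(weighted_accuracy(y, y_pred, sample_weight=[1, 1, 2, 1]))
```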