Release history


Version 4.14.1

  • Fixed an installation issue with conda.

Version 4.14.0

  • Refactor of the Cleaner and Vectorizer classes.
  • Refactor of the cross_validate method.
  • The plot_pipeline method now supports drawing multiple pipelines.
  • Renamed the Normalizer class to TextNormalizer.
  • Renamed the Gauss class to Normalizer.
  • Added the inverse_transform method to the Scaler, Normalizer and Cleaner classes (see the sketch after this list).
  • Added the winners property to the trainers (note the extra s).
  • Added the feature_names_in_ and n_features_in_ attributes to transformers.
  • The default value of the warnings parameter is set to False.
  • Improvements for multicollinearity removal in FeatureSelector.
  • Renamed default feature names to x0, x1, etc. for consistency with sklearn's API.
  • Renamed component names in FeatureSelector to pca0, pca1, etc. for consistency with sklearn's API.
  • Significant speed up in pipeline transformations.
  • Fixed a bug where mlflow runs could be ended unexpectedly.
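
A minimal sketch of the new inverse_transform method on the standalone data cleaning classes (the data is made up for illustration; Scaler stands in for Normalizer or Cleaner as well):

    import pandas as pd
    from atom.data_cleaning import Scaler

    X = pd.DataFrame({"age": [23, 45, 31], "salary": [40_000, 82_000, 56_000]})

    scaler = Scaler()
    X_scaled = scaler.fit_transform(X)           # scale the features
    X_back = scaler.inverse_transform(X_scaled)  # revert to the original scale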

Version 4.13.1

  • Fixed an installation issue.

Version 4.13.0

  • Added GPU support. Read more in the user guide.
  • Added advanced feature selection strategies.
  • Added the return_sparse parameter to the Vectorizer class.
  • Added the quantile hyperparameter to the Dummy model.
  • The data attributes now return pandas objects where possible.
  • Fixed a bug where the BO could crash after balancing the data.
  • Fixed a bug where saving the FeatureGenerator class could fail for certain operators.
  • Fixed a bug where the FeatureSelector class displayed the wrong output.
  • Fixed a bug where the mapping attribute was not reordered.

Version 4.12.0

  • Support for Python 3.10.
  • New Discretizer class to bin numerical features.
  • Refactor of the FeatureGenerator class.
  • The mapping attribute now shows all encoded features.
  • Added the sample_weight parameter to the evaluate method.
  • ATOMClassifier now has a stratify parameter to split the data sets in a stratified fashion.
  • Possibility to exclude hyperparameters from the BO by adding ! before the name (see the sketch after this list).
  • Added memory usage to the stats method.
  • Fixed a bug where decision_plot could fail when only one row was plotted.
  • Added versioning to the documentation.
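
A minimal sketch of the new stratify parameter and of excluding a hyperparameter from the BO with the ! prefix (the dataset, the stratify=True value and the assumption that the exclusion is passed through ht_params["dimensions"] are illustrative only):

    from atom import ATOMClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)

    # split the data sets in a stratified fashion (new stratify parameter)
    atom = ATOMClassifier(X, y, stratify=True, random_state=1)

    # hypothetical call: exclude max_depth from the BO by prefixing it with !
    atom.run(
        models="RF",
        n_calls=10,
        ht_params={"dimensions": {"RF": ["!max_depth"]}},
    )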

Version 4.11.0

  • Full support for sparse matrices. Read more in the user guide.
  • The shrink method now also handles sparse features.
  • Refactor of the distribution method.
  • Added three new linear models: Lars, Huber and Perc.
  • Dimensions can be shared across models using the key 'all' in ht_params["dimensions"] (see the sketch after this list).
  • Assign hyperparameters to tune using the predefined dimensions.
  • It's now possible to tune a custom number of layers for the MLP model.
  • If multiple BO calls share the best score, the one with the shortest training time is selected as the winner (instead of the first).
  • Fixed a bug where the BO could fail when custom dimensions were defined.
  • Fixed a bug where FeatureSelector could fail after repeated calls to fit.
  • Fixed a bug where FeatureGenerator didn't pass the correct data indices to its output.
  • Performance improvements for the custom pipeline.
  • Minor documentation fixes.
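
A minimal sketch of sharing dimensions across models through the 'all' key (the models, dataset and hyperparameter names are illustrative; the predefined dimensions are referenced by name):

    from atom import ATOMClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    atom = ATOMClassifier(X, y, random_state=1)

    # tune only n_estimators and max_depth for every model in the run
    atom.run(
        models=["ET", "RF"],
        n_calls=15,
        ht_params={"dimensions": {"all": ["n_estimators", "max_depth"]}},
    )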

Version 4.10.0

  • Added the holdout data set as an extra way of assessing a model's performance on completely independent data (see the sketch after this list). Read more in the user guide.
  • Complete rework of the ensemble models.
  • Support for dataframe indexing. Read more in the user guide.
  • New plot_parshap plot to detect overfitting features.
  • The new dashboard method makes analyzing the models even easier using a dashboard app.
  • The plot_feature_importance plot now also accepts estimators with coefficients.
  • Added the transform method for models.
  • Added the threshold parameter to the evaluate method.
  • The reset_predictions method is deprecated in favour of the new clear method.
  • Refactor of the model's full_train method.
  • The merge method is available for all trainers.
  • Improvements in the trainer's pipeline.
  • Training scores are now also saved to the mlflow run.
  • Trying to change the data in a branch after fitting a model with it now raises an exception.
  • Fixed a bug where the columns of array inputs were not ordered correctly.
  • Fixed a bug where branches did not correctly act case-insensitive.
  • Fixed a bug where the export_pipeline method for models would not export the transformers in the correct branch.
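
A minimal sketch of reserving a holdout data set (the holdout_size argument and the holdout attribute are assumed names; check the user guide for the exact API):

    from atom import ATOMClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)

    # keep 10% of the rows apart as a completely independent holdout set
    atom = ATOMClassifier(X, y, holdout_size=0.1, random_state=1)
    atom.run(models="LR")

    print(atom.holdout)  # the untouched holdout rows (assumed attribute name)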

Version 4.9.1

Version 4.9.0

  • Dropped support for Python 3.6.
  • Added the HistGBM model.
  • Improved print layout for hyperparameter tuning.
  • The new available_models method returns an overview of the available predefined models.
  • The calibrate and cross_validate methods can no longer be accessed from the trainers.
  • The pipeline parameter for the prediction methods is deprecated.
  • Improved visualization of the plot_rfecv, plot_successive_halving and plot_learning_curve methods.
  • Sparse matrices are now accepted as input.
  • Duplicate BO calls are no longer calculated.
  • Improvement in performance of the RNN model.
  • Refactor of the model's bo attribute.
  • Predefined hyperparameters have been updated to be consistent with sklearn's API.
  • Fixed a bug where custom scalers were ignored by the models.
  • Fixed a bug where the BO of certain models would crash with custom hyperparameters.
  • Fixed a bug where duplicate column names could be generated from a custom transformer.
  • Documentation improvements.

Version 4.8.0

  • The Encoder class now directly handles unknown categories encountered during fitting.
  • The Balancer and Encoder classes now accept custom estimators for the strategy parameter.
  • The new merge method enables the user to merge multiple atom instances into one.
  • The dtype shrinking is moved from atom's initializers to the shrink method.
  • ATOM's custom pipeline now handles transformers fitted on a subset of the dataset.
  • The column parameter in the distribution method is renamed to columns for continuity of the API.
  • The mae criterion for the GBM model hyperparameter tuning is deprecated to be consistent with sklearn's API.
  • Branches are now case-insensitive.
  • Renaming a branch using an existing name now raises an exception.
  • Fixed a bug where columns of type category broke the Imputer class.
  • Fixed a bug where predictions of the Stacking ensemble crashed for branches with multiple transformers.
  • The tables in the documentation now adapt to dark mode.

Version 4.7.3

  • Fixed a bug where the conda-forge recipe couldn't install properly.

Version 4.7.2

  • Fixed a bug where the pipeline failed for custom transformers that returned sparse matrices.
  • Package requirements files are added to the installer.

Version 4.7.1

  • Fixed a bug where the pip installer failed.
  • Fixed a bug where categorical columns also selected datetime columns.

Version 4.7.0

  • Launched our new slack channel!
  • The new FeatureExtractor class extracts useful features from datetime columns.
  • The new plot_det method plots a binary classifier's detection error tradeoff curve.
  • The partial dependence plot is able to draw Individual Conditional Expectation (ICE) lines.
  • The full traceback of exceptions encountered during training is now saved to the logger.
  • ATOMClassifier and ATOMRegressor now convert the dtypes of the input data to the minimal allowed type for memory efficiency.
  • The scoring method is renamed to evaluate to clarify its purpose.
  • The column parameter in the apply method is renamed to columns for continuity of the API.
  • Minor documentation improvements.

Version 4.6.0

  • Added the full_train method to retrieve an estimator trained on the complete dataset.
  • The score method is now also able to calculate custom metrics on new data.
  • Refactor of the Imputer class.
  • Refactor of the Encoder class to avoid errors for unknown classes and allow the input of missing values.
  • The clean method no longer automatically encodes the target column for regression tasks.
  • Creating a branch using a model's acronym as its name now raises an exception.
  • Fixed a bug where CatBoost failed when early_stopping < 1.
  • Fixed a bug where created pipelines had duplicated names.

Version 4.5.0

  • Support of NLP pipelines. Read more in the user guide.
  • Integration of mlflow to track all models in the pipeline. Read more in the user guide.
  • The new Normalizer class transforms features to a more Gaussian-like distribution.
  • New cross_validate method to evaluate the robustness of a pipeline using cross-validation.
  • New reset method to go back to atom's initial state.
  • Added the Dummy model to compare other models with a simple baseline.
  • New plot_wordcloud and plot_ngrams methods for text visualization.
  • Plots can now return the figure object when display=None.
  • The Pruner class is now able to drop outliers based on a selection of multiple strategies.
  • The new shuffle parameter in atom's initializer determines whether to shuffle the dataset.
  • The trainers no longer require you to specify a model using the models parameter. If left to default, all predefined models for that task are used.
  • The apply method now accepts args and kwargs for the function.
  • Refactor of the evaluate method.
  • Refactor of the export_pipeline method.
  • The parameters in the Cleaner class have been refactored to better describe their function.
  • The train_sizes parameter in train_sizing now accepts integer values to automatically create equally distributed splits in the training set.
  • Refactor of plot_pipeline to show models in the diagram as well.
  • Renamed the bagging parameter to the more appropriate name n_bootstrap.
  • New option to exclude columns from a transformer by adding ! before their name (see the sketch after this list).
  • Fixed a bug where the Pruner class failed if there were categorical columns in the dataset.
  • Completely reworked documentation website.
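
A minimal sketch of excluding a column from a transformer with the ! prefix (StandardScaler stands in for any transformer, and routing the exclusion through the columns parameter of atom.add is an assumption):

    from sklearn.preprocessing import StandardScaler
    from sklearn.datasets import load_breast_cancer
    from atom import ATOMClassifier

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    atom = ATOMClassifier(X, y, random_state=1)

    # scale every feature except "mean radius"
    atom.add(StandardScaler(), columns="!mean radius")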

Version 4.4.0

  • The drop method now allows the user to drop columns as part of the pipeline.
  • New apply method to perform data transformations as part of the pipeline (see the sketch after this list).
  • Added the status method to save an overview of atom's branches and models to the logger.
  • Improved the output messages for the Imputer class.
  • The dataset's columns can now be called directly from atom.
  • The distribution and plot_distribution methods now ignore missing values.
  • Fixed a bug where transformations could fail when columns were added to the dataset after initializing the pipeline.
  • Fixed a bug where the Cleaner class didn't drop columns consisting entirely of missing values when drop_min_cardinality=True.
  • Fixed a bug where the winning model wasn't displayed correctly.
  • Refactored the way transformers are added or removed from predicting methods.
  • Improved documentation.
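
A minimal sketch of the new drop and apply methods as pipeline steps (the exact signatures are assumptions; note that apply's column parameter was later renamed to columns in version 4.7.0):

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from atom import ATOMClassifier

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    atom = ATOMClassifier(X, y, random_state=1)

    # drop a column and log-transform another as part of the pipeline
    atom.drop(columns="worst fractal dimension")
    atom.apply(np.log1p, column="mean area")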

Version 4.3.0

  • Possibility to add custom transformers to the pipeline (see the sketch after this list).
  • The export_pipeline utility method exports atom's current pipeline to a sklearn object.
  • Use AutoML to automate the search for an optimized pipeline.
  • New magic methods make atom behave similarly to sklearn's Pipeline.
  • All training approaches can now be combined in the same atom instance.
  • New plot_relationships, plot_distribution and plot_qq plots for data inspection.
  • Complete rework of all the shap plots to be consistent with their new API.
  • Improvements for the Scaler and Pruner classes.
  • The acronym for custom models now defaults to the capital letters in the class' __name__.
  • Possibility to apply transformations on only a subset of the columns.
  • Plots and methods now accept winner as model name.
  • Fixed a bug where custom metrics didn't show the correct name.
  • Fixed a bug where timers were not displayed correctly.
  • Further compatibility with deep learning datasets.
  • Large refactoring for performance optimization.
  • Cleaner output of messages to the logger.
  • Plots no longer show a default title.
  • Added the AutoML example notebook.
  • Minor bug fixes.
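
A minimal sketch of adding a custom transformer and exporting the pipeline (PCA stands in for any sklearn-compatible transformer; atom.add as the entry point for custom transformers is an assumption):

    from sklearn.decomposition import PCA
    from sklearn.datasets import load_breast_cancer
    from atom import ATOMClassifier

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    atom = ATOMClassifier(X, y, random_state=1)

    atom.add(PCA(n_components=5))      # custom transformer in the pipeline
    pipeline = atom.export_pipeline()  # export the current pipeline as a sklearn object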

Version 4.2.1

  • Bug fix for a memory leak in successive halving and train sizing pipelines.
  • The XGBoost, LightGBM and CatBoost packages can now be installed through the installer's extras_require under the name models, e.g. pip install -U atom-ml[models].
  • Improved documentation.

Version 4.2.0

  • Possibility to add custom models to the pipeline using ATOMModel.
  • Compatibility with deep learning models.
  • New branch system for different data pipelines. Read more in the user guide.
  • Use the canvas contextmanager to draw multiple plots in one figure (see the sketch after this list).
  • New voting and stacking ensemble techniques.
  • New get_class_weight utility method.
  • New Sequential Feature Selection strategy for the FeatureSelector.
  • Added the sample_weight parameter to the score method.
  • New ways to initialize the data in the training instances.
  • The n_rows parameter in ATOMLoader is deprecated in favour of the new input formats.
  • The test_size parameter now also allows integer values.
  • Renamed categories to classes to be consistent with sklearn's API.
  • The class property now returns a pd.DataFrame of the number of rows per target class in the train, test and complete dataset.
  • Possibility to add custom parameters to an estimator's fit method through est_params.
  • The successive halving and train sizing approaches now both allow subsequent runs from atom without losing the information from previous runs.
  • Bug fix where ATOMLoader wouldn't encode the target column during transformation.
  • Added the Deep learning, Ensembles and Utilities example notebooks.
  • Support for Python 3.9.
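
A minimal sketch of the canvas contextmanager (the (1, 2) grid arguments and the plot choice are illustrative assumptions):

    from sklearn.datasets import load_breast_cancer
    from atom import ATOMClassifier

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    atom = ATOMClassifier(X, y, random_state=1)
    atom.run(models=["LR", "RF"])

    # draw two plots side by side in a single figure
    with atom.canvas(1, 2):
        atom.plot_roc()
        atom.plot_prc()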

Version 4.1.0

  • New est_params parameter to customize the parameters in every model's estimator (see the sketch after this list).
  • Following skopt's API, the n_random_starts parameter to specify the number of random trials is deprecated in favour of n_initial_points.
  • The Balancer class now allows you to use any of the strategies from imblearn.
  • New utility attributes to inspect the dataset.
  • Four new models: CatNB, CNB, ARD and RNN.
  • Added the models section to the documentation.
  • Small changes in log outputs.
  • Bug fixes and performance improvements.
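
A minimal sketch of fixing an estimator parameter through est_params (the flat dict form shown here is an assumption; whether it is flat or keyed per model acronym may depend on the release):

    from sklearn.datasets import load_breast_cancer
    from atom import ATOMClassifier

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    atom = ATOMClassifier(X, y, random_state=1)

    # train a random forest with a fixed number of estimators
    atom.run(models="RF", est_params={"n_estimators": 200})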

Version 4.0.1

  • Bug fix where the FeatureGenerator was not deterministic for a fixed random state.
  • Bug fix where subsequent runs with the same metric failed.
  • Added the license file to the package's installer.
  • Typo fixes in documentation.

Version 4.0.0

  • Bayesian optimization package changed from GpyOpt to skopt.
  • Complete revision of the model's hyperparameters.
  • Four SHAP plots can now be called directly from an ATOM pipeline.
  • Two new plots for regression tasks.
  • New plot_pipeline and pipeline attribute to access all transformers.
  • Possibility to determine transformer parameters per method.
  • New calibration method and plot.
  • Metrics can now be added as scorers or functions with signature metric(y, y_pred, **kwargs) (see the sketch after this list).
  • Implementation of multi-metric runs.
  • Possibility to choose which metric to plot.
  • Early stopping for models that allow in-training validation.
  • Added the ATOMLoader function to load any saved pickle instance.
  • The "remove" strategy in the data cleaning parameters is deprecated in favour of "drop".
  • Implemented the dfs strategy in FeatureGenerator.
  • All training classes now inherit from BaseEstimator.
  • Added multiple new example notebooks.
  • Test coverage up to 100%.
  • Completely new documentation page.
  • Bug fixes and performance improvements.
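
A minimal sketch of a custom metric function with the metric(y, y_pred, **kwargs) signature (the toy metric and the direct pass to run's metric parameter are illustrative; flags such as greater_is_better may also need to be set):

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from atom import ATOMClassifier

    def custom_accuracy(y, y_pred, **kwargs):
        """Toy metric following the metric(y, y_pred, **kwargs) signature."""
        y, y_pred = np.asarray(y), np.asarray(y_pred)
        return float((y == y_pred).mean())

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    atom = ATOMClassifier(X, y, random_state=1)
    atom.run(models="LR", metric=custom_accuracy)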