Release history
Version 4.11.0
- Full support for sparse matrices. Read more in the user guide.
 - The shrink method now also handles sparse features.
 - Refactor of the distribution method.
 - Added three new linear models: Lars, Huber and Perc.
 - Dimensions can be shared across models using the key 'all' in bo_params["dimensions"].
 - Assign hyperparameters to tune using the predefined dimensions.
 - It's now possible to tune a custom number of layers for the MLP model.
 - If multiple BO calls share the best score, the one with the shortest training time is selected as winner (instead of the first).
 - Fixed a bug where the BO could fail when custom dimensions were defined.
 - Fixed a bug where FeatureSelector could fail after repeated calls to fit.
 - Fixed a bug where FeatureGenerator didn't pass the correct data indices to its output.
 - Performance improvements for the custom pipeline.
 - Minor documentation fixes.
 
Version 4.10.0
- Added the holdout data set to have an extra way of assessing a model's performance on a completely independent dataset. Read more in the user guide.
 - Complete rework of the ensemble models.
 - Support for dataframe indexing. Read more in the user guide.
 - New plot_parshap plot to detect overfitting features.
 - The new dashboard method makes analyzing the models even easier using a dashboard app.
 - The plot_feature_importance plot now also accepts estimators with coefficients.
 - Added the transform method for models.
 - Added the threshold parameter to the evaluate method.
 - The reset_predictions method is deprecated in favour of the new clear method.
 - Refactor of the model's full_train method.
 - The merge method is available for all trainers.
 - Improvements in the trainer's pipeline.
 - Training scores are now also saved to the mlflow run.
 - Trying to change the data in a branch after fitting a model with it now raises an exception.
 - Fixed a bug where the columns of array inputs were not ordered correctly.
 - Fixed a bug where branches did not correctly act case-insensitive.
 - Fixed a bug where the export_pipeline method for models would not export the transformers in the correct branch.
 
Version 4.9.1
- Changed the default cross-validation for hyperparameter tuning from 5 to 1 to avoid errors with deep learning models.
 - Added clearer exception messages when a model's run failed.
 - Fixed a bug where custom dimensions didn't show during hyperparameter tuning.
 - Documentation improvements.
 
Version 4.9.0
- Drop support of Python 3.6.
 - Added the HistGBM model.
 - Improved print layout for hyperparameter tuning.
 - The new available_models method returns an overview of the available predefined models.
 - The calibrate and cross_validate methods can no longer be accessed from the trainers.
 - The pipeline parameter for the prediction methods is deprecated.
 - Improved visualization of the plot_rfecv, plot_successive_halving and plot_learning_curve methods.
 - Sparse matrices are now accepted as input.
 - Duplicate BO calls are no longer calculated.
 - Improvement in performance of the RNN model.
 - Refactor of the model's bo attribute.
 - Predefined hyperparameters have been updated to be consistent with sklearn's API.
 - Fixed a bug where custom scalers were ignored by the models.
 - Fixed a bug where the BO of certain models would crash with custom hyperparameters.
 - Fixed a bug where duplicate column names could be generated from a custom transformer.
 - Documentation improvements.
 
Version 4.8.0
- The Encoder class now directly handles unknown categories encountered during fitting.
 - The Balancer and Encoder classes now accept custom estimators for the strategy parameter.
 - The new merge method enables the user to merge multiple atom instances into one.
 - The dtype shrinking is moved from atom's initializers to the shrink method.
 - ATOM's custom pipeline now handles transformers fitted on a subset of the dataset.
 - The column parameter in the distribution method is renamed to columns for continuity of the API.
 - The mae criterion for the GBM model hyperparameter tuning is deprecated to be consistent with sklearn's API.
 - Branches are now case-insensitive.
 - Renaming a branch using an existing name now raises an exception.
 - Fixed a bug where columns of type category broke the Imputer class.
 - Fixed a bug where predictions of the Stacking ensemble crashed for branches with multiple transformers.
 - The tables in the documentation now adapt to dark mode.
 
Version 4.7.3
- Fixed a bug where the conda-forge recipe couldn't install properly.
 
Version 4.7.2
- Fixed a bug where the pipeline failed for custom transformers that returned sparse matrices.
 - Package requirements files are added to the installer.
 
Version 4.7.1
- Fixed a bug where the pip installer failed.
 - Fixed a bug where categorical columns also selected datetime columns.
 
Version 4.7.0
- Launched our new slack channel!
 - The new FeatureExtractor class extracts useful features from datetime columns.
 - The new plot_det method plots a binary classifier's detection error tradeoff curve.
 - The partial dependence plot is able to draw Individual Conditional Expectation (ICE) lines.
 - The full traceback of exceptions encountered during training is now saved to the logger.
 - ATOMClassifier and ATOMRegressor now convert the dtypes of the input data to the minimal allowed type for memory efficiency.
 - The scoring method is renamed to evaluate to clarify its purpose.
 - The column parameter in the apply method is renamed to columns for continuity of the API.
 - Minor documentation improvements.
 
Version 4.6.0
- Added the full_train method to retrieve an estimator trained on the complete dataset.
 - The score method is now also able to calculate custom metrics on new data.
 - Refactor of the Imputer class.
 - Refactor of the Encoder class to avoid errors for unknown classes and allow the input of missing values.
 - The clean method no longer automatically encodes the target column for regression tasks.
 - Creating a branch using a model's acronym as name now raises an exception.
 - Fixed a bug where CatBoost failed when early_stopping < 1.
 - Fixed a bug where created pipelines had duplicated names.
 
Version 4.5.0
- Support of NLP pipelines. Read more in the user guide.
 - Integration of mlflow to track all models in the pipeline. Read more in the user guide.
 - The new Gauss class transforms features to a more Gaussian-like distribution.
 - New cross_validate method to evaluate the robustness of a pipeline using cross-validation.
 - New reset method to go back to atom's initial state.
 - Added the Dummy model to compare other models with a simple baseline.
 - New plot_wordcloud and plot_ngrams methods for text visualization.
 - Plots can now return the figure object when display=None.
 - The Pruner class can now drop outliers based on the selection of multiple strategies.
 - The new shuffle parameter in atom's initializer determines whether to shuffle the dataset.
 - The trainers no longer require you to specify a model using the models parameter. If left to default, all predefined models for that task are used.
 - The apply method now accepts args and kwargs for the function.
 - Refactor of the evaluate method.
 - Refactor of the export_pipeline method.
 - The parameters in the Cleaner class have been refactored to better describe their function.
 - The train_sizes parameter in train_sizing now accepts integer values to automatically create equally distributed splits in the training set.
 - Refactor of plot_pipeline to show models in the diagram as well.
 - Refactor of the bagging parameter to the (more appropriate) name n_bootstrap.
 - New option to exclude columns from a transformer by adding ! before their name.
 - Fixed a bug where the Pruner class failed if there were categorical columns in the dataset.
 - Completely reworked documentation website.
 
Version 4.4.0
- The drop method now allows the user to drop columns as part of the pipeline.
 - New apply method to add data transformations as a function to the pipeline.
 - Added the status method to save an overview of atom's branches and models to the logger.
 - Improved the output messages for the Imputer class.
 - The dataset's columns can now be called directly from atom.
 - The distribution and plot_distribution methods now ignore missing values.
 - Fixed a bug where transformations could fail when columns were added to the dataset after initializing the pipeline.
 - Fixed a bug where the Cleaner class didn't drop columns consisting entirely of missing values when drop_min_cardinality=True.
 - Fixed a bug where the winning model wasn't displayed correctly.
 - Refactored the way transformers are added or removed from predicting methods.
 - Improved documentation.
 
Version 4.3.0
- Possibility to add custom transformers to the pipeline.
 - The export_pipeline utility method exports atom's current pipeline to a sklearn object.
 - Use AutoML to automate the search for an optimized pipeline.
 - New magic methods make atom behave similarly to sklearn's Pipeline.
 - All training approaches can now be combined in the same atom instance.
 - New plot_scatter_matrix, plot_distribution and plot_qq plots for data inspection.
 - Complete rework of all the shap plots to be consistent with their new API.
 - Improvements for the Scaler and Pruner classes.
 - The acronym for custom models now defaults to the capital letters in the class' __name__.
 - Possibility to apply transformations on only a subset of the columns.
 - Plots and methods now accept winner as model name.
 - Fixed a bug where custom metrics didn't show the correct name.
 - Fixed a bug where timers were not displayed correctly.
 - Further compatibility with deep learning datasets.
 - Large refactoring for performance optimization.
 - Cleaner output of messages to the logger.
 - Plots no longer show a default title.
 - Added the AutoML example notebook.
 - Minor bug fixes.
 
Version 4.2.1
- Fixed a memory leak in successive halving and train sizing pipelines.
 - The XGBoost, LightGBM and CatBoost packages can now be installed through the installer's extras_require under the name models, e.g. pip install -U atom-ml[models].
 - Improved documentation.
 
Version 4.2.0
- Possibility to add custom models to the pipeline using ATOMModel.
 - Compatibility with deep learning models.
 - New branch system for different data pipelines. Read more in the user guide.
 - Use the canvas contextmanager to draw multiple plots in one figure.
 - New voting and stacking ensemble techniques.
 - New get_class_weight utility method.
 - New Sequential Feature Selection strategy for the FeatureSelector.
 - Added the sample_weight parameter to the score method.
 - New ways to initialize the data in the training instances.
 - The n_rows parameter in ATOMLoader is deprecated in favour of the new input formats.
 - The test_size parameter now also allows integer values.
 - Renamed categories to classes to be consistent with sklearn's API.
 - The class property now returns a pd.DataFrame of the number of rows per target class in the train, test and complete dataset.
 - Possibility to add custom parameters to an estimator's fit method through est_params.
 - The successive halving and train sizing approaches now both allow subsequent runs from atom without losing the information from previous runs.
 - Bug fix where ATOMLoader wouldn't encode the target column during transformation.
 - Added the Deep learning, Ensembles and Utilities example notebooks.
 - Compatibility with python 3.9.
 
Version 4.1.0
- New est_params parameter to customize the parameters in every model's estimator.
 - Following skopt's API, the n_random_starts parameter to specify the number of random trials is deprecated in favour of n_initial_points.
 - The Balancer class now allows you to use any of the strategies from imblearn.
 - New utility attributes to inspect the dataset.
 - Four new models: CatNB, CNB, ARD and RNN.
 - Added the models section to the documentation.
 - Small changes in log outputs.
 - Bug fixes and performance improvements.
 
Version 4.0.1
- Bug fix where the FeatureGenerator was not deterministic for a fixed random state.
 - Bug fix where subsequent runs with the same metric failed.
 - Added the license file to the package's installer.
 - Typo fixes in documentation.
 
Version 4.0.0
- Bayesian optimization package changed from GpyOpt to skopt.
 - Complete revision of the model's hyperparameters.
 - Four SHAP plots can now be called directly from an ATOM pipeline.
 - Two new plots for regression tasks.
 - New plot_pipeline method and pipeline attribute to access all transformers.
 - Possibility to determine transformer parameters per method.
 - New calibration method and plot.
 - Metrics can now be added as scorers or functions with signature metric(y, y_pred, **kwargs).
 - Implementation of multi-metric runs.
 - Possibility to choose which metric to plot.
 - Early stopping for models that allow in-training evaluation.
 - Added the ATOMLoader function to load any saved pickle instance.
 - The "remove" strategy in the data cleaning parameters is deprecated in favour of "drop".
 - Implemented the DFS strategy in FeatureGenerator.
 - All training classes now inherit from BaseEstimator.
 - Added multiple new example notebooks.
 - Test coverage up to 100%.
 - Completely new documentation page.
 - Bug fixes and performance improvements.