Release history
Version 4.8.0
- The Encoder class now directly handles unknown categories encountered during fitting.
- The Balancer and Encoder classes now accept custom estimators for the `strategy` parameter.
- The new merge method enables the user to merge multiple atom instances into one.
- The dtype shrinking is moved from atom's initializers to the shrink method.
- ATOM's custom pipeline now handles transformers fitted on a subset of the dataset.
- The `column` parameter in the distribution method is renamed to `columns` for continuity of the API.
- The `mae` criterion for the GBM model hyperparameter tuning is deprecated to be consistent with sklearn's API.
- Branches are now case-insensitive.
- Renaming a branch using an existing name now raises an exception.
- Fixed a bug where columns of type `category` broke the Imputer class.
- Fixed a bug where predictions of the Stacking ensemble crashed for branches with multiple transformers.
- The tables in the documentation now adapt to dark mode.
Version 4.7.3
- Fixed a bug where the conda-forge recipe couldn't install properly.
Version 4.7.2
- Fixed a bug where the pipeline failed for custom transformers that returned sparse matrices.
- Package requirements files are added to the installer.
Version 4.7.1
- Fixed a bug where the pip installer failed.
- Fixed a bug where selecting categorical columns also selected datetime columns.
Version 4.7.0
- Launched our new slack channel!
- The new FeatureExtractor class extracts useful features from datetime columns.
- The new plot_det method plots a binary classifier's detection error tradeoff curve.
- The partial dependence plot can now draw Individual Conditional Expectation (ICE) lines.
- The full traceback of exceptions encountered during training is now saved to the logger.
- ATOMClassifier and ATOMRegressor now convert the dtypes of the input data to the minimal allowed type for memory efficiency.
- The scoring method is renamed to evaluate to clarify its purpose.
- The `column` parameter in the apply method is renamed to `columns` for continuity of the API.
- Minor documentation improvements.
Version 4.6.0
- Added the full_train method to retrieve an estimator trained on the complete dataset.
- The score method is now also able to calculate custom metrics on new data.
- Refactor of the Imputer class.
- Refactor of the Encoder class to avoid errors for unknown classes and allow the input of missing values.
- The clean method no longer automatically encodes the target column for regression tasks.
- Creating a branch using a model's acronym as its name now raises an exception.
- Fixed a bug where CatBoost failed when `early_stopping` < 1.
- Fixed a bug where created pipelines had duplicated names.
Version 4.5.0
- Support of NLP pipelines. Read more in the user guide.
- Integration of mlflow to track all models in the pipeline. Read more in the user guide.
- The new Gauss class transforms features to a more Gaussian-like distribution.
- New cross_validate method to evaluate the robustness of a pipeline using cross-validation.
- New reset method to go back to atom's initial state.
- Added the Dummy model to compare other models with a simple baseline.
- New plot_wordcloud and plot_ngrams methods for text visualization.
- Plots can now return the figure object when `display=None`.
- The Pruner class can now drop outliers based on the selection of multiple strategies.
- The new `shuffle` parameter in atom's initializer determines whether to shuffle the dataset.
- The trainers no longer require you to specify a model using the `models` parameter. If left to default, all predefined models for that task are used.
- The apply method now accepts args and kwargs for the function.
- Refactor of the evaluate method.
- Refactor of the export_pipeline method.
- The parameters in the Cleaner class have been refactored to better describe their function.
- The `train_sizes` parameter in train_sizing now accepts integer values to automatically create equally distributed splits in the training set.
- Refactor of plot_pipeline to show models in the diagram as well.
- Refactor of the `bagging` parameter to the (more appropriate) name `n_bootstrap`.
- New option to exclude columns from a transformer by adding `!` before their name (a usage sketch follows this list).
- Fixed a bug where the Pruner class failed if there were categorical columns in the dataset.
- Completely reworked documentation website.
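A minimal sketch of the column-exclusion option mentioned above, assuming atom's add method exposes a `columns` parameter that accepts names prefixed with `!` (the dataset and column name are only illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

from atom import ATOMClassifier

# Illustrative data; any DataFrame with named columns would do.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

atom = ATOMClassifier(X, y, random_state=1)

# Scale every feature except "mean radius"; the "!" prefix marks the
# column to exclude (assumed syntax of the columns parameter).
atom.add(StandardScaler(), columns="!mean radius")
```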
Version 4.4.0
- The drop method now allows the user to drop columns as part of the pipeline.
- New apply method to perform data transformations as a function in the pipeline.
- Added the status method to save an overview of atom's branches and models to the logger.
- Improved the output messages for the Imputer class.
- The dataset's columns can now be called directly from atom.
- The distribution and plot_distribution methods now ignore missing values.
- Fixed a bug where transformations could fail when columns were added to the dataset after initializing the pipeline.
- Fixed a bug where the Cleaner class didn't drop columns consisting entirely of missing values when `drop_min_cardinality=True`.
- Fixed a bug where the winning model wasn't displayed correctly.
- Refactored the way transformers are added or removed from predicting methods.
- Improved documentation.
Version 4.3.0
- Possibility to add custom transformers to the pipeline.
- The export_pipeline utility method exports atom's current pipeline to a sklearn object.
- Use AutoML to automate the search for an optimized pipeline.
- New magic methods make atom behave similarly to sklearn's Pipeline.
- All training approaches can now be combined in the same atom instance.
- New plot_scatter_matrix, plot_distribution and plot_qq plots for data inspection.
- Complete rework of all the shap plots to be consistent with their new API.
- Improvements for the Scaler and Pruner classes.
- The acronym for custom models now defaults to the capital letters in the class' __name__.
- Possibility to apply transformations on only a subset of the columns.
- Plots and methods now accept `winner` as model name.
- Fixed a bug where custom metrics didn't show the correct name.
- Fixed a bug where timers were not displayed correctly.
- Further compatibility with deep learning datasets.
- Large refactoring for performance optimization.
- Cleaner output of messages to the logger.
- Plots no longer show a default title.
- Added the AutoML example notebook.
- Minor bug fixes.
Version 4.2.1
- Fixed a memory leak in successive halving and train sizing pipelines.
- The XGBoost, LightGBM and CatBoost packages can now be installed through the installer's extras_require under the name `models`, e.g. `pip install -U atom-ml[models]`.
- Improved documentation.
Version 4.2.0
- Possibility to add custom models to the pipeline using ATOMModel.
- Compatibility with deep learning models.
- New branch system for different data pipelines. Read more in the user guide.
- Use the canvas contextmanager to draw multiple plots in one figure.
- New voting and stacking ensemble techniques.
- New get_class_weight utility method.
- New Sequential Feature Selection strategy for the FeatureSelector.
- Added the `sample_weight` parameter to the score method.
- New ways to initialize the data in the training instances.
- The `n_rows` parameter in ATOMLoader is deprecated in favour of the new input formats.
- The `test_size` parameter now also allows integer values.
- Renamed categories to classes to be consistent with sklearn's API.
- The class property now returns a pd.DataFrame of the number of rows per target class in the train, test and complete dataset.
- Possibility to add custom parameters to an estimator's fit method through `est_params`.
- The successive halving and train sizing approaches now both allow subsequent runs from atom without losing the information from previous runs.
- Bug fix where ATOMLoader wouldn't encode the target column during transformation.
- Added the Deep learning, Ensembles and Utilities example notebooks.
- Compatibility with python 3.9.
Version 4.1.0
- New `est_params` parameter to customize the parameters in every model's estimator.
- Following skopt's API, the `n_random_starts` parameter to specify the number of random trials is deprecated in favour of `n_initial_points`.
- The Balancer class now allows you to use any of the strategies from imblearn.
- New utility attributes to inspect the dataset.
- Four new models: CatNB, CNB, ARD and RNN.
- Added the models section to the documentation.
- Small changes in log outputs.
- Bug fixes and performance improvements.
Version 4.0.1
- Bug fix where the FeatureGenerator was not deterministic for a fixed random state.
- Bug fix where subsequent runs with the same metric failed.
- Added the license file to the package's installer.
- Typo fixes in documentation.
Version 4.0.0
- Bayesian optimization package changed from GpyOpt to skopt.
- Complete revision of the model's hyperparameters.
- Four SHAP plots can now be called directly from an ATOM pipeline.
- Two new plots for regression tasks.
- New plot_pipeline and `pipeline` attribute to access all transformers.
- Possibility to determine transformer parameters per method.
- New calibration method and plot.
- Metrics can now be added as scorers or functions with signature `metric(y, y_pred, **kwargs)` (a usage sketch follows this list).
- Implementation of multi-metric runs.
- Possibility to choose which metric to plot.
- Early stopping for models that allow in-training evaluation.
- Added the ATOMLoader function to load any saved pickle instance.
- The "remove" strategy in the data cleaning parameters is deprecated in favour of "drop".
- Implemented the DFS strategy in FeatureGenerator.
- All training classes now inherit from BaseEstimator.
- Added multiple new example notebooks.
- Test coverage up to 100%.
- Completely new documentation page.
- Bug fixes and performance improvements.
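A minimal sketch of a custom metric following the `metric(y, y_pred, **kwargs)` signature described above, assuming the `metric` parameter of atom's run method accepts a plain function (the model acronym and data are only illustrative):

```python
import numpy as np

from atom import ATOMRegressor


def within_tolerance(y, y_pred, **kwargs):
    """Share of predictions within 10% of the true value (higher is better)."""
    y, y_pred = np.asarray(y), np.asarray(y_pred)
    return float(np.mean(np.abs(y - y_pred) <= 0.1 * np.abs(y)))


# Illustrative random regression data.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = rng.normal(loc=10, size=100)

atom = ATOMRegressor(X, y, random_state=1)
atom.run(models="Tree", metric=within_tolerance)  # assumed run signature
```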