Release history
Version 4.14.1
- Fixed an installation issue with conda.
Version 4.14.0
- Refactor of the Cleaner and Vectorizer classes.
- Refactor of the cross_validate method.
- The plot_pipeline method now supports drawing multiple pipelines.
- Renamed the Normalizer class to TextNormalizer.
- Renamed the Gauss class to Normalizer.
- Added the inverse_transform method to the Scaler, Normalizer and Cleaner classes (see the sketch below).
- Added the winners property to the trainers (note the extra s).
- Added the feature_names_in_ and n_features_in_ attributes to transformers.
- The default value of the warnings parameter is set to False.
- Improvements for multicollinearity removal in FeatureSelector.
- Renamed default feature names to x0, x1, etc. for consistency with sklearn's API.
- Renamed component names in FeatureSelector to pca0, pca1, etc. for consistency with sklearn's API.
- Significant speed-up in pipeline transformations.
- Fixed a bug where mlflow runs could be ended unexpectedly.
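A minimal sketch of the new inverse_transform method on the stand-alone Scaler. The atom.data_cleaning import path and the toy data are assumptions for illustration, not part of this release note.

```python
import pandas as pd
from atom.data_cleaning import Scaler  # assumed import path

X = pd.DataFrame({"x0": [1.0, 2.0, 3.0], "x1": [10.0, 20.0, 30.0]})

scaler = Scaler()
X_scaled = scaler.fit_transform(X)               # scale the features
X_restored = scaler.inverse_transform(X_scaled)  # new: undo the scaling
```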
Version 4.13.1
- Fixed an installation issue.
Version 4.13.0
- Added GPU support. Read more in the user guide.
- Added advanced feature selection strategies.
- Added the return_sparse parameter to the Vectorizer class.
- Added the quantile hyperparameter to the Dummy model.
- The data attributes now return pandas objects where possible.
- Fixed a bug where the BO could crash after balancing the data.
- Fixed a bug where saving the FeatureGenerator class could fail for certain operators.
- Fixed a bug where the FeatureSelector class displayed the wrong output.
- Fixed a bug where the mapping attribute was not reordered.
Version 4.12.0
- Support for Python 3.10.
- New Discretizer class to bin numerical features.
- Refactor of the FeatureGenerator class.
- The mapping attribute now shows all encoded features.
- Added the sample_weight parameter to the evaluate method.
- ATOMClassifier now has a stratify parameter to split the data sets in a stratified fashion (see the sketch below).
- Possibility to exclude hyperparameters from the BO by adding ! before the name.
- Added memory usage to the stats method.
- Fixed a bug where decision_plot could fail when only one row was plotted.
- Added versioning to the documentation.
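A short, hedged sketch of two of the additions above. The dataset, model choice and uniform weights are placeholders, and the exact types accepted by stratify and sample_weight are assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# stratify (new in this release) requests stratified data set splits.
atom = ATOMClassifier(X, y, stratify=True, random_state=1)
atom.run(models="RF")

# sample_weight (new in this release) weighs the rows used to score the models;
# uniform weights over the test set are used purely as a placeholder.
atom.evaluate(sample_weight=np.ones(len(atom.test)))
```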
Version 4.11.0
- Full support for sparse matrices. Read more in the user guide.
- The shrink method now also handles sparse features.
- Refactor of the distribution method.
- Added three new linear models: Lars, Huber and Perc.
- Dimensions can be shared across models using the key 'all' in ht_params["dimensions"] (see the sketch below).
- Assign hyperparameters to tune using the predefined dimensions.
- It's now possible to tune a custom number of layers for the MLP model.
- If multiple BO calls share the best score, the one with the shortest training time is selected as winner (instead of the first).
- Fixed a bug where the BO could fail when custom dimensions were defined.
- Fixed a bug where FeatureSelector could fail after repeated calls to fit.
- Fixed a bug where FeatureGenerator didn't pass the correct data indices to its output.
- Performance improvements for the custom pipeline.
- Minor documentation fixes.
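A hypothetical sketch of sharing dimensions across models through the 'all' key. Passing predefined hyperparameter names as a list follows the entry above, but the exact value format is an assumption; the user guide has the accepted options.

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y, random_state=1)

# The "all" key applies these dimensions to both RF and ET during the BO.
atom.run(
    models=["RF", "ET"],
    n_calls=15,
    ht_params={"dimensions": {"all": ["n_estimators", "max_depth"]}},
)
```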
Version 4.10.0
- Added the holdout data set as an extra way of assessing a model's performance on a completely independent dataset (see the sketch below). Read more in the user guide.
- Complete rework of the ensemble models.
- Support for dataframe indexing. Read more in the user guide.
- New plot_parshap plot to detect overfitting features.
- The new dashboard method makes analyzing the models even easier using a dashboard app.
- The plot_feature_importance plot now also accepts estimators with coefficients.
- Added the transform method for models.
- Added the threshold parameter to the evaluate method.
- The reset_predictions method is deprecated in favour of the new clear method.
- Refactor of the model's full_train method.
- The merge method is available for all trainers.
- Improvements in the trainer's pipeline.
- Training scores are now also saved to the mlflow run.
- Trying to change the data in a branch after fitting a model with it now raises an exception.
- Fixed a bug where the columns of array inputs were not ordered correctly.
- Fixed a bug where branches were not treated case-insensitively.
- Fixed a bug where the export_pipeline method for models would not export the transformers in the correct branch.
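A minimal sketch of reserving the new holdout set. The holdout_size parameter name and the atom.holdout attribute are assumptions based on this entry; the user guide documents the exact API.

```python
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# holdout_size (assumed name) keeps a fraction of the data apart from both
# the train and test sets until the very end of the experiment.
atom = ATOMClassifier(X, y, holdout_size=0.1, random_state=1)
print(atom.holdout.shape)  # the reserved rows (assumed attribute name)
```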
Version 4.9.1
- Changed the default cross-validation for hyperparameter tuning from 5 to 1 to avoid errors with deep learning models.
- Added clearer exception messages when a model's run failed.
- Fixed a bug where custom dimensions didn't show during hyperparameter tuning.
- Documentation improvements.
Version 4.9.0
- Drop support of Python 3.6.
- Added the HistGBM model.
- Improved print layout for hyperparameter tuning.
- The new available_models method returns an overview of the available predefined models.
- The calibrate and cross_validate methods can no longer be accessed from the trainers.
- The pipeline parameter for the prediction methods is deprecated.
- Improved visualization of the plot_rfecv, plot_successive_halving and plot_learning_curve methods.
- Sparse matrices are now accepted as input.
- Duplicate BO calls are no longer calculated.
- Improvement in performance of the RNN model.
- Refactor of the model's bo attribute.
- Predefined hyperparameters have been updated to be consistent with sklearn's API.
- Fixed a bug where custom scalers were ignored by the models.
- Fixed a bug where the BO of certain models would crash with custom hyperparameters.
- Fixed a bug where duplicate column names could be generated from a custom transformer.
- Documentation improvements.
Version 4.8.0
- The Encoder class now directly handles unknown categories encountered during fitting.
- The Balancer and Encoder classes now accept custom estimators for the strategy parameter.
- The new merge method enables the user to merge multiple atom instances into one.
- The dtype shrinking is moved from atom's initializers to the shrink method.
- ATOM's custom pipeline now handles transformers fitted on a subset of the dataset.
- The column parameter in the distribution method is renamed to columns for continuity of the API.
- The mae criterion for the GBM model hyperparameter tuning is deprecated to be consistent with sklearn's API.
- Branches are now case-insensitive.
- Renaming a branch using an existing name now raises an exception.
- Fixed a bug where columns of type category broke the Imputer class.
- Fixed a bug where predictions of the Stacking ensemble crashed for branches with multiple transformers.
- The tables in the documentation now adapt to dark mode.
Version 4.7.3
- Fixed a bug where the conda-forge recipe couldn't install properly.
Version 4.7.2
- Fixed a bug where the pipeline failed for custom transformers that returned sparse matrices.
- Package requirements files are added to the installer.
Version 4.7.1
- Fixed a bug where the pip installer failed.
- Fixed a bug where categorical columns also selected datetime columns.
Version 4.7.0
- Launched our new slack channel!
- The new FeatureExtractor class extracts useful features from datetime columns.
- The new plot_det method plots a binary classifier's detection error tradeoff curve.
- The partial dependence plot is able to draw Individual Conditional Expectation (ICE) lines.
- The full traceback of exceptions encountered during training is now saved to the logger.
- ATOMClassifier and ATOMRegressor now convert the dtypes of the input data to the minimal allowed type for memory efficiency.
- The scoring method is renamed to evaluate to clarify its purpose.
- The column parameter in the apply method is renamed to columns for continuity of the API.
- Minor documentation improvements.
Version 4.6.0
- Added the full_train method to retrieve an estimator trained on the complete dataset.
- The score method is now also able to calculate custom metrics on new data.
- Refactor of the Imputer class.
- Refactor of the Encoder class to avoid errors for unknown classes and allow the input of missing values.
- The clean method no longer automatically encodes the target column for regression tasks.
- Creating a branch using a model's acronym as name now raises an exception.
- Fixed a bug where CatBoost failed when early_stopping < 1.
- Fixed a bug where created pipelines had duplicated names.
Version 4.5.0
- Support of NLP pipelines. Read more in the user guide.
- Integration of mlflow to track all models in the pipeline. Read more in the user guide.
- The new Normalizer class transforms features to a more Gaussian-like distribution.
- New cross_validate method to evaluate the robustness of a pipeline using cross-validation.
- New reset method to go back to atom's initial state.
- Added the Dummy model to compare other models with a simple baseline.
- New plot_wordcloud and plot_ngrams methods for text visualization.
- Plots can now return the figure object when display=None.
- The Pruner class is now able to drop outliers based on a selection of multiple strategies.
- The new shuffle parameter in atom's initializer determines whether to shuffle the dataset.
- The trainers no longer require you to specify a model using the models parameter. If left to default, all predefined models for that task are used.
- The apply method now accepts args and kwargs for the function.
- Refactor of the evaluate method.
- Refactor of the export_pipeline method.
- The parameters in the Cleaner class have been refactored to better describe their function.
- The train_sizes parameter in train_sizing now accepts integer values to automatically create equally distributed splits in the training set.
- Refactor of plot_pipeline to show models in the diagram as well.
- Refactor of the bagging parameter to the (more appropriate) name n_bootstrap.
- New option to exclude columns from a transformer by adding ! before their name (see the sketch below).
- Fixed a bug where the Pruner class failed if there were categorical columns in the dataset.
- Completely reworked documentation website.
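A hedged sketch of the new '!' exclusion syntax. Using atom.add with a columns argument and the column name "mean radius" are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y, random_state=1)

# Apply the transformer to every column except the one prefixed with "!".
atom.add(StandardScaler(), columns="!mean radius")
```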
Version 4.4.0
- The drop method now allows the user to drop columns as part of the pipeline.
- New apply method to add data transformations as a function to the pipeline.
- Added the status method to save an overview of atom's branches and models to the logger.
- Improved the output messages for the Imputer class.
- The dataset's columns can now be called directly from atom.
- The distribution and plot_distribution methods now ignore missing values.
- Fixed a bug where transformations could fail when columns were added to the dataset after initializing the pipeline.
- Fixed a bug where the Cleaner class didn't drop columns consisting entirely of missing values when drop_min_cardinality=True.
- Fixed a bug where the winning model wasn't displayed correctly.
- Refactored the way transformers are added or removed from predicting methods.
- Improved documentation.
Version 4.3.0
- Possibility to add custom transformers to the pipeline (see the sketch after this list).
- The export_pipeline utility method exports atom's current pipeline to a sklearn object.
- Use AutoML to automate the search for an optimized pipeline.
- New magic methods make atom behave similarly to sklearn's Pipeline.
- All training approaches can now be combined in the same atom instance.
- New plot_relationships, plot_distribution and plot_qq plots for data inspection.
- Complete rework of all the shap plots to be consistent with their new API.
- Improvements for the Scaler and Pruner classes.
- The acronym for custom models now defaults to the capital letters in the class' __name__.
- Possibility to apply transformations on only a subset of the columns.
- Plots and methods now accept winner as model name.
- Fixed a bug where custom metrics didn't show the correct name.
- Fixed a bug where timers were not displayed correctly.
- Further compatibility with deep learning datasets.
- Large refactoring for performance optimization.
- Cleaner output of messages to the logger.
- Plots no longer show a default title.
- Added the AutoML example notebook.
- Minor bug fixes.
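A minimal sketch of the custom-transformer and export_pipeline additions. atom.add as the method that attaches a custom transformer is an assumption; export_pipeline is named in the entry above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from atom import ATOMClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y, random_state=1)

atom.add(PCA(n_components=5))      # custom (sklearn-compatible) transformer
pipeline = atom.export_pipeline()  # a sklearn Pipeline with the fitted steps
```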
Version 4.2.1
- Bug fix where there was memory leakage in successive halving and train sizing pipelines.
- The XGBoost, LightGBM and CatBoost packages can now be installed through the installer's extras_require under the name models, e.g. pip install -U atom-ml[models].
- Improved documentation.
Version 4.2.0
- Possibility to add custom models to the pipeline using ATOMModel (see the sketch after this list).
- Compatibility with deep learning models.
- New branch system for different data pipelines. Read more in the user guide.
- Use the canvas contextmanager to draw multiple plots in one figure.
- New voting and stacking ensemble techniques.
- New get_class_weight utility method.
- New Sequential Feature Selection strategy for the FeatureSelector.
- Added the sample_weight parameter to the score method.
- New ways to initialize the data in the training instances.
- The n_rows parameter in ATOMLoader is deprecated in favour of the new input formats.
- The test_size parameter now also allows integer values.
- Renamed categories to classes to be consistent with sklearn's API.
- The class property now returns a pd.DataFrame of the number of rows per target class in the train, test and complete dataset.
- Possibility to add custom parameters to an estimator's fit method through est_params.
- The successive halving and train sizing approaches now both allow subsequent runs from atom without losing the information from previous runs.
- Bug fix where ATOMLoader wouldn't encode the target column during transformation.
- Added the Deep learning, Ensembles and Utilities example notebooks.
- Support for Python 3.9.
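A hedged sketch of wrapping a custom model with ATOMModel. The acronym and fullname keyword names are assumptions for illustration; ATOMModel itself is the wrapper named in the entry above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import RidgeClassifier
from atom import ATOMClassifier, ATOMModel

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Wrap any sklearn-compatible estimator so atom can train and evaluate it.
ridge = ATOMModel(RidgeClassifier(), acronym="RC", fullname="Ridge classifier")

atom = ATOMClassifier(X, y, random_state=1)
atom.run(models=ridge)  # the custom model runs like any predefined one
```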
Version 4.1.0
- New est_params parameter to customize the parameters in every model's estimator (see the sketch below).
- Following skopt's API, the n_random_starts parameter to specify the number of random trials is deprecated in favour of n_initial_points.
- The Balancer class now allows you to use any of the strategies from imblearn.
- New utility attributes to inspect the dataset.
- Four new models: CatNB, CNB, ARD and RNN.
- Added the models section to the documentation.
- Small changes in log outputs.
- Bug fixes and performance improvements.
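A minimal sketch of est_params, assuming an already-initialized atom instance like the ones in the sketches above; the fixed value is illustrative only.

```python
# Parameters passed through est_params are set directly on the estimator
# and are therefore not searched during the BO.
atom.run(models="RF", n_calls=10, est_params={"n_estimators": 300})
```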
Version 4.0.1
- Bug fix where the FeatureGenerator was not deterministic for a fixed random state.
- Bug fix where subsequent runs with the same metric failed.
- Added the license file to the package's installer.
- Typo fixes in documentation.
Version 4.0.0
- Bayesian optimization package changed from GPyOpt to skopt.
- Complete revision of the model's hyperparameters.
- Four SHAP plots can now be called directly from an ATOM pipeline.
- Two new plots for regression tasks.
- New plot_pipeline and pipeline attribute to access all transformers.
- Possibility to determine transformer parameters per method.
- New calibration method and plot.
- Metrics can now be added as scorers or functions with signature metric(y, y_pred, **kwargs) (see the sketch after this list).
- Implementation of multi-metric runs.
- Possibility to choose which metric to plot.
- Early stopping for models that allow in-training validation.
- Added the ATOMLoader function to load any saved pickle instance.
- The "remove" strategy in the data cleaning parameters is deprecated in favour of "drop".
- Implemented the dfs strategy in FeatureGenerator.
- All training classes now inherit from BaseEstimator.
- Added multiple new example notebooks.
- Test coverage up to 100%.
- Completely new documentation page.
- Bug fixes and performance improvements.
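A hedged sketch of a custom metric function with the signature above, assuming an already-initialized atom instance; passing the function through run's metric argument is the assumed entry point.

```python
from sklearn.metrics import fbeta_score

def f2(y, y_pred, **kwargs):
    # custom metric following the metric(y, y_pred, **kwargs) signature
    return fbeta_score(y, y_pred, beta=2)

atom.run(models="LR", metric=f2)
```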