A Python package for fast exploration of machine learning pipelines

ATOM is an open-source, easy-to-use machine learning package for Python. It runs experiments quickly and efficiently, taking the user from raw data to insights in just a few lines of code.

Why you should use ATOM

  • Multiple data cleaning and feature engineering classes
  • 55+ classification, regression and forecasting models to choose from
  • Train multiple models with one line of code
  • Fast implementation of hyperparameter tuning
  • Easy way to compare the results from different models
  • 50+ plots to analyze the data and model performance
  • Avoid refactoring to test new pipelines
  • Native support for GPU training
  • Integration with polars, pyspark and pyarrow
  • 30+ example notebooks to get you started
  • Full integration with multilabel and multioutput datasets
  • Native support for sparse datasets
  • Built-in transformers for NLP pipelines
  • Avoid endless imports and documentation lookups

Read our stories

ATOM: A Python package for fast exploration of machine learning pipelines

During the exploration phase of a project, a data scientist tries to find the optimal pipeline for their specific use case. This usually involves applying standard data cleaning steps, creating or selecting useful features, trying out different models, etc...

How to test multiple machine learning pipelines with just a few lines of Python

Since it's nearly impossible to know beforehand which transformations will benefit the model's outcome the most, this process usually involves trying out different approaches. For example, if we are dealing with an imbalanced dataset...

From raw data to web app deployment with ATOM and Streamlit

In this article, we will show you how to create a simple web app that helps a data scientist quickly perform a basic analysis of the performance of predictive models on a provided dataset. The user will be able to upload their own dataset and tweak the machine learning pipeline...

Exploration of Deep Learning pipelines made easy

During the exploration phase of a project, a data scientist tries to find the optimal pipeline for their specific use case. In this story, I’ll explain how to use the ATOM package to help you quickly train and evaluate a deep learning model on any given dataset...

Deep Feature Synthesis vs Genetic Feature Generation

Feature engineering is the process of creating new features from the existing ones, in order to capture relationships with the target column that the original features didn't capture on their own. This process is very important to improve the performance of machine...

From raw text to model prediction in under 50 lines of Python

Natural Language Processing (NLP) is the subfield of machine learning that works with human language data. Working with human text usually involves standard preprocessing steps such as data cleaning and converting the text to vectors of numbers...

How to make 40+ interactive plots to analyze your machine learning pipeline

Plots have become the de facto tool to help data scientists and stakeholders understand the process and results of machine learning projects. In this story, we’ll show you how to use the ATOM library to easily make clean-looking, interactive plots...

Machine learning on multioutput datasets: a quick guide

The standard machine learning tasks everyone is familiar with are classification (binary and multiclass) and regression. In these cases, there is one target column that we are trying to predict. In the multioutput case, there is more than one target column...

Using MLflow with ATOM to track all your machine learning experiments

The MLflow Tracking component is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning experiments and for later visualizing the results. In this story, we’ll explain how to use...

Make your sklearn models up to 100 times faster

With the Intel® Extension for Scikit-learn package (or sklearnex, for brevity) you can accelerate sklearn models and transformers, keeping full conformance with sklearn’s API. Sklearnex is a free software AI accelerator that offers you a way to make sklearn code 10–100 times faster...

Train your ML models on GPU changing just one line of code

Graphics Processing Units (GPUs) can significantly accelerate calculations for preprocessing steps or training machine learning models. Training models typically involves compute-intensive matrix multiplications and other operations that can take advantage of a GPU’s massively parallel architecture...
