ATOM: A Python package for fast exploration of machine learning pipelines
During the exploration phase of a project, a data scientist tries
to find the optimal pipeline for their specific use case. This usually
involves applying standard data cleaning steps, creating or selecting
useful features, trying out different models, etc...
How to test multiple machine learning pipelines with just a few lines of Python
Since it's nearly impossible to know beforehand which transformations
will benefit the model's outcome the most, this process usually involves
trying out different approaches. For example, if we are dealing with
an imbalanced dataset...
From raw data to web app deployment with ATOM and Streamlit
In this article we will show you how to create a simple web app
that helps a data scientist quickly perform a basic analysis of
the performance of predictive models on a provided dataset. The
user will be able to upload their own dataset and tweak
the machine learning pipeline...
Exploration of Deep Learning pipelines made easy
During the exploration phase of a project, a data scientist tries
to find the optimal pipeline for their specific use case. In this
story, I’ll explain how to use the ATOM package to quickly
train and evaluate a deep learning model on any given dataset...
Deep Feature Synthesis vs Genetic Feature Generation
Feature engineering is the process of creating new features from the
existing ones, in order to capture relationships with the target
column that the original features did not capture on their own. This
process is very important for improving the performance of machine...
From raw text to model prediction in under 50 lines of Python
Natural Language Processing (NLP) is the subfield of machine
learning that works with human language data. Working with human
text usually involves standard preprocessing steps such as data
cleaning and converting the text to vectors of numbers...
How to make 40+ interactive plots to analyze your machine learning pipeline
Plots have become the de facto tool to help data scientists and
stakeholders understand the process and results of machine
learning projects. In this story, we’ll show you how to use the
ATOM library to easily make clean-looking, interactive plots...
Machine learning on multioutput datasets: a quick guide
The standard machine learning tasks everyone is familiar with
are classification (binary and multiclass) and regression. In
these cases, there is one target column that we are trying to
predict. In the multioutput case, there is more than one target
column...
Using MLflow with ATOM to track all your machine learning experiments
The MLflow Tracking component is an API and UI for logging
parameters, code versions, metrics, and output files when
running your machine learning experiments and for later
visualizing the results. In this story, we’ll explain how
to use...
Make your sklearn models up to 100 times faster
With the Intel® Extension for Scikit-learn package (or sklearnex,
for brevity) you can accelerate sklearn models and transformers,
keeping full conformance with sklearn’s API. Sklearnex is a free
software AI accelerator that offers you a way to make sklearn
code 10–100 times faster...
Train your ML models on GPU changing just one line of code
Graphics Processing Units (GPUs) can significantly accelerate
calculations for preprocessing steps or training machine learning
models. Training models typically involves compute-intensive matrix
multiplications and other operations that can take advantage of a
GPU’s massively parallel architecture...