Frequently asked questions
Here we try to give answers to some questions that have popped up regularly. If you have any other questions, don't hesitate to create a new discussion or post them on the Slack channel!
Is this package related to the Atom text editor?
There is, indeed, a text editor with the same name and a similar logo as this package. Is this a shameless copy? No. When I started the project, I didn't know about the text editor, and it doesn't take much thought to come up with the idea of replacing the letter O in the word atom with the image of an atom.
How does ATOM relate to AutoML?
ATOM is not an AutoML tool, since it does not automate the search for an optimal pipeline the way well-known AutoML tools such as auto-sklearn or EvalML do. Instead, ATOM helps users find the optimal pipeline themselves. One of the goals of this package is to help data scientists produce explainable pipelines, and using an AutoML black-box function would impede that.
Is it possible to run deep learning models?
Yes. Deep learning models can be added as custom models to the pipeline as long as they follow sklearn's API. For more information, see the deep learning section of the user guide.
Can I run atom's methods on just a subset of the columns?
Yes, all data cleaning and feature engineering methods accept a columns parameter to transform only the selected features. For example, to impute only the numerical columns in the dataset, type atom.impute(strat_num="mean", columns=atom.numerical). The columns parameter accepts column names, column indices, dtypes or a slice object.
How can I compare the same model on different datasets?
On many occasions, you might want to test how a model performs on datasets processed with different pipelines. For this, atom has the branch system. Create a new branch for every pipeline you want to test, and use the plot methods to compare all models, regardless of the branch they were trained on.
Can I train models through atom using a GPU?
Yes. Refer to the user guide to see which algorithms and models have a GPU implementation. Be aware that this could require additional software and hardware dependencies.
How are numerical and categorical columns differentiated?
The columns are separated using a dataframe's select_dtypes method. Numerical columns are selected using include="number", whereas categorical columns are selected using exclude="number".
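Here is a small pandas sketch of this split on a hypothetical dataframe. Integer and float columns count as numerical. Every other column (here a string column) falls on the categorical side.

```python
import pandas as pd

# Hypothetical dataset with two numerical columns and one text column
df = pd.DataFrame({
    "age": [25, 32, 47],                  # int64 -> numerical
    "income": [3000.0, 4200.5, 5100.0],   # float64 -> numerical
    "city": ["Paris", "Rome", "Oslo"],    # text -> categorical
})

numerical = df.select_dtypes(include="number").columns.tolist()
categorical = df.select_dtypes(exclude="number").columns.tolist()

print(numerical)    # ['age', 'income']
print(categorical)  # ['city']
```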
Can I run unsupervised learning pipelines?
No. As of now, ATOM only supports supervised machine learning pipelines. However, various unsupervised algorithms can be chosen as strategy in the Pruner class to detect and remove outliers from the dataset.
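The Pruner API is not shown here. The sketch below uses scikit-learn's IsolationForest directly, on hypothetical random data with two injected extreme points, to show how an unsupervised model can flag outliers so that they can be dropped.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical data: 100 points from a standard normal plus two obvious outliers
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(100, 2)), [[8.0, 8.0], [-9.0, 7.5]]])

# fit_predict returns 1 for inliers and -1 for outliers
iso = IsolationForest(random_state=0)
mask = iso.fit_predict(X) == 1

# Keep only the rows the unsupervised model considers inliers
X_clean = X[mask]
print(X.shape, "->", X_clean.shape)
```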
Is there a way to plot multiple models in the same shap plot?
No. Unfortunately, there is no way to plot multiple models in the same shap plot, since the plots are made by the shap package and passed to atom as matplotlib.axes objects. This means that it's not within the reach of this package to implement such a utility.
Can I merge a sklearn pipeline with atom?
Yes. Like any other transformer, a sklearn pipeline can be added to atom using the add method. Every transformer in the pipeline is merged independently. The pipeline is not allowed to end with a model, since atom manages its own models. If it does end with a model, add the pipeline without its final step using atom.add(pipeline[:-1]).
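Slicing a sklearn Pipeline returns a new Pipeline, so pipeline[:-1] drops only the final estimator. The sketch below uses a hypothetical three-step pipeline. The atom call is left as a comment because it needs an initialized atom instance.

```python
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical pipeline that ends with a model
pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
    ("model", LogisticRegression()),
])

# Slicing keeps every step except the last one (the model)
transformers_only = pipeline[:-1]
print([name for name, _ in transformers_only.steps])  # ['imputer', 'scaler']

# atom.add(transformers_only)  # requires an existing atom instance
```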
Is it possible to initialize atom with an existing train and test set?
Yes. If you already have a separated train and test set you can initialize atom in two ways:
atom = ATOMClassifier(train, test)
atom = ATOMClassifier((X_train, y_train), (X_test, y_test))
Make sure the train and test sets have the same number of columns! If atom is initialized in either of these two ways, the test_size parameter is ignored.
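A quick pre-check can catch mismatched sets before atom is initialized. The helper same_columns below is hypothetical and not part of atom. It compares column names and their order, which is stricter than only comparing the number of columns. The dataframes are made-up examples.

```python
import pandas as pd


def same_columns(train: pd.DataFrame, test: pd.DataFrame) -> bool:
    """Hypothetical helper: True if both sets share the same column names in the same order."""
    return list(train.columns) == list(test.columns)


# Hypothetical train and test sets, each including the target column
train = pd.DataFrame({"x1": [1, 2], "x2": [3, 4], "target": [0, 1]})
test = pd.DataFrame({"x1": [5], "x2": [6], "target": [1]})
bad_test = test.drop(columns="x2")

print(same_columns(train, test))      # True
print(same_columns(train, bad_test))  # False

# With matching columns, initialization would look like:
# atom = ATOMClassifier(train, test)
```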
Can I train the models using cross-validation?
Applying cross-validation means fitting and transforming every step of the pipeline multiple times, each time with different results. Doing this would prevent ATOM from showing the transformation results after every pre-processing step, which means losing the ability to inspect how a transformer changed the dataset. For this reason, cross-validation can only be applied after a model has been trained. At that point the pipeline is defined, and cross-validation can be applied using the cross_validate method. See here for an example using cross-validation.
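The reason becomes visible in plain scikit-learn, which is used here instead of atom. When a pipeline is cross-validated, every fold refits every step on a different training split. The sketch uses the bundled breast cancer dataset and a hypothetical scaler plus logistic regression pipeline. It shows that the fitted scaler means differ from fold to fold, so there is no single transformed dataset to inspect.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# return_estimator=True keeps the pipeline fitted in each of the 5 folds
results = cross_validate(pipe, X, y, cv=5, return_estimator=True)

# The scaler is refitted per fold, so its learned mean of the first feature changes
fold_means = [est.named_steps["standardscaler"].mean_[0] for est in results["estimator"]]
print(results["test_score"])
print(fold_means)
```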
Is there a way to process datetime features?
Yes, the FeatureExtractor class can automatically extract useful features (day, month, year, etc.) from datetime columns. The extracted features are always encoded to numerical values, so they can be fed directly to a model.
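To show what such an extraction produces, here is a minimal pandas sketch. It is not the FeatureExtractor implementation. The column names date_day, date_month and date_year are hypothetical. It derives numerical day, month and year columns from a datetime column and then drops the original column.

```python
import pandas as pd

# Hypothetical dataset with one datetime column
df = pd.DataFrame({"date": pd.to_datetime(["2021-03-15", "2022-11-01"])})

# The .dt accessor exposes the datetime components as integer series
df["date_day"] = df["date"].dt.day
df["date_month"] = df["date"].dt.month
df["date_year"] = df["date"].dt.year

# Drop the raw datetime column so only numerical features remain
df = df.drop(columns="date")
print(df)
```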