GPU
Graphics Processing Units (GPUs) can significantly accelerate calculations for preprocessing steps and training machine learning models. Training involves compute-intensive matrix multiplications and other operations that take advantage of a GPU's massively parallel architecture. Training on large datasets can take hours on a single processor, but offloading those tasks to a GPU can reduce training time to minutes.
Training transformers and models in atom using a GPU is as easy as initializing the instance with the parameter `gpu=True`. The `gpu` parameter accepts three options:
- `False`: Always use the CPU implementation.
- `True`: Use the GPU implementation. If this results in an error, use the CPU instead and write a message to the logger.
- `"force"`: Use the GPU implementation. If this results in an error, raise it.
ATOM uses cuML for all estimators except XGBoost (XGB), LightGBM (LGB) and CatBoost (CatB), which come with their own GPU implementation. Check the prerequisites your machine needs for it to work, and which transformers and models are supported, in the sections below.
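A minimal sketch of what this looks like, assuming a classification task with the `ATOMClassifier` entry point; the toy dataset from sklearn is only for illustration:

```python
from atom import ATOMClassifier
from sklearn.datasets import load_breast_cancer

# Toy dataset, only for illustration; use your own X and y.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# gpu=True uses the GPU implementation and falls back to CPU on failure.
atom = ATOMClassifier(X, y, gpu=True, verbose=2)

# gpu="force" raises the error instead of falling back:
# atom = ATOMClassifier(X, y, gpu="force", verbose=2)
```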
Be aware of the following:
- cuML estimators do not support sparse dataframes.
- cuML models sometimes use slightly different hyperparameters than their sklearn counterparts.
- cuML does not support multi-GPU training. If there is more than one GPU on the machine, the first one is used by default. Set the `CUDA_VISIBLE_DEVICES` environment variable to use any of the other GPUs (see the sketch after this list).
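A minimal sketch of selecting a specific GPU this way (the device index is an assumption; set the variable before any CUDA context is created):

```python
import os

# Make only the second GPU (index 1) visible to CUDA.
# This must run before anything initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

from atom import ATOMClassifier  # import afterwards, then proceed as usual
```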
Example
Train a model on a GPU yourself using Google Colab. Just click on the badge above and follow the notebook. Note two things:
- Make sure you've been allocated a Tesla T4, P4, or P100. If this is not the case (check it using `!nvidia-smi`), reset the runtime (Runtime -> Factory reset runtime) until you get one.
- Setting up the environment and installing the necessary libraries may take quite some time (usually up to 15 minutes).
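If you prefer checking the allocated GPU from plain Python instead of the `!nvidia-smi` cell magic, a small sketch (assumes the nvidia-smi binary is on the PATH):

```python
import subprocess

# Print the GPU model, driver and CUDA version of the current runtime.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
```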
Prerequisites
- Operating System:
  - Ubuntu 18.04/20.04 or CentOS 7/8 with gcc/g++ 9.0+
  - Windows 8.1+ with WSL2 (see here for a tutorial)
- GPU: NVIDIA Pascal™ or better with compute capability 6.0+
- CUDA & NVIDIA Drivers: One of versions 11.0, 11.2, 11.4 or 11.5
- cuML>=22.06
Transformers
- Scaler
- Cleaner (only for encode_target=True)
- Imputer (not for strat_num="knn")
- Discretizer (not for strategy="custom")
- Vectorizer
- FeatureSelector (only for strategy="pca")
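A sketch of chaining some of these GPU-enabled transformers, assuming an `atom` instance created with `gpu=True` as above (the parameter values are illustrative):

```python
# Scaling, imputation and PCA-based feature selection can run through cuML.
atom.scale()
atom.impute(strat_num="mean")  # strat_num="knn" is not GPU-accelerated
atom.feature_selection(strategy="pca", n_features=10)
```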
Models
- Gaussian Naive Bayes
- Multinomial Naive Bayes
- Bernoulli Naive Bayes
- Categorical Naive Bayes
- Ordinary Least Squares
- Ridge (only for regression tasks)
- Lasso
- ElasticNet
- Lars
- Logistic Regression
- K-Nearest Neighbors
- Random Forest
- XGBoost
- LightGBM (requires extra installations)
- CatBoost
- Linear SVM
- Kernel SVM
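And a sketch of training a few of the models above on the GPU, again assuming an `atom` instance initialized with `gpu=True` (the model acronyms and metric are illustrative):

```python
# Logistic Regression and Random Forest go through cuML;
# XGBoost uses its own GPU implementation.
atom.run(models=["LR", "RF", "XGB"], metric="auc")
```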