
Accelerating pipelines


CPU acceleration

ATOM uses sklearnex to accelerate sklearn applications while remaining fully conformant with the sklearn API. This tool can provide a 10-100x speedup for a variety of transformers and models. See here for an example.
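ATOM applies sklearnex for you. For reference, sklearnex also exposes a global patching function, `patch_sklearn`, which swaps supported scikit-learn estimators for their accelerated versions. The sketch below wraps that call with a fallback for machines where sklearnex is not installed. The wrapper name `enable_sklearnex` is hypothetical, and this is not necessarily how ATOM calls sklearnex internally.

```python
import warnings


def enable_sklearnex() -> bool:
    """Patch scikit-learn with sklearnex if available (hypothetical helper).

    Returns True if patching succeeded, False if sklearnex is not installed.
    """
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        warnings.warn("sklearnex not found; falling back to stock scikit-learn.")
        return False
    # Replace supported sklearn estimators with their accelerated versions.
    # This must run *before* the estimators are imported from sklearn.
    patch_sklearn()
    return True
```

Call `enable_sklearnex()` at the top of a script, before any `from sklearn... import ...` lines, so that the patched implementations are the ones that get imported.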

Prerequisites

  • Operating System:
    • Linux (Ubuntu, Fedora, etc.)
    • Windows 8.1+
    • macOS
  • CPU:
    • Processor must have x86 architecture.
    • Processor must support at least one of SSE2, AVX, AVX2, AVX512 instruction sets.
    • ARM* architecture is not supported.
  • Libraries:
    • sklearnex>=2021.6.3 (automatically installed with ATOM)
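The CPU requirements above can be checked with a small script. The sketch below is a hypothetical helper, not part of ATOM or sklearnex. It assumes the machine names reported by `platform.machine()` for x86 systems and the lowercase flag names found in the `flags` line of `/proc/cpuinfo` (AVX512 is represented by its foundation flag, `avx512f`). Flag detection only works on Linux; on other systems the flag set comes back empty, so the check is inconclusive there.

```python
import platform

# At least one of these instruction sets is required (see the list above).
REQUIRED_ANY = {"sse2", "avx", "avx2", "avx512f"}
# Values platform.machine() commonly reports for x86 processors.
X86_MACHINES = {"x86_64", "amd64", "i386", "i686", "x86"}


def cpu_meets_prerequisites(machine: str, cpu_flags: set) -> bool:
    """Return True if the CPU is x86 and supports a required instruction set."""
    if machine.lower() not in X86_MACHINES:
        return False  # e.g. ARM ("aarch64", "arm64") is not supported
    flags = {f.lower() for f in cpu_flags}
    return bool(flags & REQUIRED_ANY)


def linux_cpu_flags(path: str = "/proc/cpuinfo") -> set:
    """Read CPU feature flags on Linux; return an empty set if unavailable."""
    try:
        with open(path) as fh:
            for line in fh:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


if __name__ == "__main__":
    print(cpu_meets_prerequisites(platform.machine(), linux_cpu_flags()))
```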

Note

Intel® processors provide better performance than other CPUs.


Supported estimators

Transformers

Models



GPU acceleration

Graphics Processing Units (GPUs) can significantly accelerate calculations for preprocessing steps and for training machine learning models. Training models involves compute-intensive matrix multiplications and other operations that can take advantage of a GPU's massively parallel architecture. Training on large datasets can take hours on a single processor; offloading those tasks to a GPU can reduce the training time to minutes.

Training transformers and models in ATOM using a GPU is as easy as initializing the instance with the parameter device="gpu". The device parameter accepts any string that follows the SYCL_DEVICE_FILTER filter selector syntax. Examples are:

  • device="cpu" (use CPU)
  • device="gpu" (use default GPU)
  • device="gpu:1" (use second GPU)
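To make the meaning of these strings concrete, the sketch below parses only the simple `<type>` and `<type>:<index>` forms listed above into a device type and a zero-based index. It is a hypothetical illustration, not ATOM's parser, and it does not cover the full SYCL_DEVICE_FILTER grammar (for example, backend prefixes). Treating a missing index as the first device is an assumption that mirrors the default described in the warning on multi-GPU machines.

```python
from typing import NamedTuple


class DeviceSpec(NamedTuple):
    kind: str   # "cpu" or "gpu"
    index: int  # zero-based: 0 is the first device, 1 the second, ...


def parse_device(device: str) -> DeviceSpec:
    """Parse simple "<type>" or "<type>:<index>" device strings (hypothetical)."""
    kind, _, index = device.strip().lower().partition(":")
    if kind not in {"cpu", "gpu"}:
        raise ValueError(f"Unsupported device type: {device!r}")
    if not index:
        return DeviceSpec(kind, 0)  # no index given: use the first device
    if not index.isdigit():
        raise ValueError(f"Invalid device index in {device!r}")
    return DeviceSpec(kind, int(index))
```

For example, `parse_device("gpu:1")` yields `DeviceSpec(kind="gpu", index=1)`, the second GPU.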

Use the engine parameter to choose between the cuML and sklearnex execution engines. The XGBoost, LightGBM and CatBoost models come with their own GPU engine. Setting device="gpu" is sufficient to accelerate them with GPU, regardless of the engine parameter.
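The rule described above (the engine parameter selects the implementation, except for models that ship their own GPU code) can be summarized as a small decision function. This is a hypothetical sketch of the documented behavior, not ATOM's internal logic. The model names and the engine values `"sklearn"`, `"sklearnex"` and `"cuml"` are assumptions used for illustration.

```python
from typing import Tuple

# Models that come with their own GPU engine (per the text above).
NATIVE_GPU_MODELS = {"XGBoost", "LightGBM", "CatBoost"}
# Assumed engine names, for illustration only.
ENGINES = {"sklearn", "sklearnex", "cuml"}


def resolve_backend(
    model: str, device: str = "cpu", engine: str = "sklearn"
) -> Tuple[str, str]:
    """Return (implementation, device type) for a model (hypothetical)."""
    if engine not in ENGINES:
        raise ValueError(f"Unknown engine: {engine!r}")
    kind = device.strip().lower().partition(":")[0]  # "gpu:1" -> "gpu"
    if model in NATIVE_GPU_MODELS:
        # The engine parameter is ignored: the library's own code runs on `kind`.
        return model, kind
    return engine, kind
```

With this rule, `resolve_backend("XGBoost", device="gpu", engine="cuml")` still runs XGBoost's own GPU implementation, while any other model follows the chosen engine.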

Warning

  • GPU-accelerated estimators almost never support sparse datasets. Refer to their respective documentation to check which ones do.
  • GPU-accelerated estimators often use slightly different hyperparameters than their CPU counterparts.
  • ATOM does not support multi-GPU training. If there is more than one GPU on the machine and the device parameter does not specify which one to use, the first one is used by default.

Example

SageMaker Studio Lab

Train a model on a GPU yourself using SageMaker Studio Lab. Just click on the badge above and run the notebook! Make sure to choose the GPU compute type.

Prerequisites

  • Operating System:
    • Ubuntu 18.04/20.04 or CentOS 7/8 with gcc/g++ 9.0+
    • Windows 10+ with WSL2 (see here for a tutorial)
  • GPU:
    • For sklearnex: All Intel® integrated and discrete GPUs.
    • For cuML: NVIDIA Pascal™ or better with compute capability 6.0+
  • Drivers:
    • For sklearnex: Intel® GPU drivers.
    • For cuML: CUDA 11.0, 11.2, 11.4 or 11.5 with the matching NVIDIA drivers.
  • Libraries:
    • sklearnex>=2021.6.3 (automatically installed with ATOM)
    • cuML>=22.10
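The cuML requirements above can be expressed as a simple check. The sketch below is a hypothetical helper, not part of ATOM or cuML. It assumes you obtain the GPU's compute capability, the CUDA version and the installed cuML version yourself (for example from the NVIDIA driver tools and the package metadata) and pass them in as plain values.

```python
SUPPORTED_CUDA = {"11.0", "11.2", "11.4", "11.5"}
MIN_COMPUTE_CAPABILITY = (6, 0)  # NVIDIA Pascal
MIN_CUML_VERSION = (22, 10)


def major_minor(version: str) -> tuple:
    """Turn "22.10.01" into (22, 10); ignore anything past the minor part."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def cuml_prerequisites_met(
    compute_capability: tuple, cuda_version: str, cuml_version: str
) -> bool:
    """Check the cuML GPU prerequisites listed above (hypothetical helper)."""
    cuda = ".".join(str(p) for p in major_minor(cuda_version))  # "11.4.1" -> "11.4"
    return (
        tuple(compute_capability) >= MIN_COMPUTE_CAPABILITY
        and cuda in SUPPORTED_CUDA
        and major_minor(cuml_version) >= MIN_CUML_VERSION
    )
```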

Supported estimators

Transformers

Models