Example: Ray backend¶
This example shows how to use the Ray backend to train models in parallel.
The data used is a synthetic dataset created using sklearn's make_classification function.
Load the data¶
In [1]:
# Import packages
import ray
from atom import ATOMClassifier
from sklearn.datasets import make_classification
In [2]:
# Use a small dataset for illustration purposes
X, y = make_classification(n_samples=10000, n_features=10, random_state=1)
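atom performs the train/test split internally, but the sizes reported later (8000 train, 2000 test) are easy to preview without atom. A minimal sketch using only scikit-learn; the explicit `train_test_split` call is an illustration of an 80/20 split, not part of the pipeline above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Recreate the same synthetic data and mimic an 80/20 split
X, y = make_classification(n_samples=10000, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

print(X_train.shape, X_test.shape)  # (8000, 10) (2000, 10)
```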
Run the pipeline¶
In [3]:
# Note that we specify the number of cores for parallel execution here
atom = ATOMClassifier(X, y, n_jobs=2, backend="ray", verbose=2, random_state=1)
2024-03-04 10:03:31,922 INFO worker.py:1715 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Binary classification.
Parallel processing with 2 cores.
Parallelization backend: ray

Dataset stats ==================== >>
Shape: (10000, 11)
Train set size: 8000
Test set size: 2000
-------------------------------------
Memory: 880.13 kB
Scaled: True
Outlier values: 211 (0.2%)
In [4]:
# Using `parallel=True`, we train one model in each node
# Note that when training in parallel, the verbosity of the models is zero
atom.run(models=["PA", "SGD"], est_params={"max_iter": 150}, parallel=True, errors="raise")
Training ========================= >>
Models: PA, SGD
Metric: f1

Final results ==================== >>
Total time: 13.971s
-------------------------------------
PassiveAggressive --> f1: 0.8198
StochasticGradientDescent --> f1: 0.8766 !
Analyze the results¶
In [5]:
# Notice how the summed time to train the models is less than the total time
atom.plot_results(metric="time_fit")
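The per-model fit times in the plot don't add up to the wall-clock total because, under parallel execution, the total is governed by the slowest model plus backend overhead rather than by the sum of fit times. A stdlib-only sketch of that effect (threads stand in for Ray workers here; `fake_fit` is a hypothetical placeholder, not an atom or Ray API):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fit(seconds):
    # Stand-in for one model's fit call
    time.sleep(seconds)
    return seconds

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    fit_times = list(pool.map(fake_fit, [0.2, 0.3]))
wall_time = time.perf_counter() - start

# The two tasks overlap, so the wall time tracks the slowest task,
# not the sum of both
print(wall_time < sum(fit_times))  # True
```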
In [6]:
# Create a REST API endpoint and do inference on the test set
atom.pa.serve(port=8001)
Serving model PassiveAggressive on 127.0.0.1:8001...
In [7]:
import requests
X_predict = atom.X_test.iloc[:10, :]
response = requests.get("http://127.0.0.1:8001/", json=X_predict.to_json())
In [8]:
response.json()
Out[8]:
[1, 1, 0, 0, 1, 1, 0, 1, 0, 0]
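The endpoint returns plain JSON, so the predictions can be post-processed with the standard library alone. A small sketch; the `body` string below simply mirrors the output above and is hard-coded for illustration, not fetched from the server:

```python
import json

# Same payload as the response above, hard-coded for illustration
body = "[1, 1, 0, 0, 1, 1, 0, 1, 0, 0]"
predictions = json.loads(body)

# Count the predicted positives
positives = sum(predictions)
print(positives, len(predictions))  # 5 10
```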
In [9]:
# Don't forget to shut down the Ray instance
ray.shutdown()