Example: Ray backend¶
This example shows how to use the Ray backend to train models in parallel.
The data used is a synthetic dataset created using sklearn's make_classification function.
Load the data¶
In [1]:
# Import packages
import ray
from atom import ATOMClassifier
from sklearn.datasets import make_classification
In [2]:
# Use a small dataset for illustration purposes
X, y = make_classification(n_samples=10000, n_features=10, random_state=1)
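atom performs the train/test split internally, but the sizes reported later (8000 train, 2000 test) are easy to preview without atom. A minimal sketch using only scikit-learn; the explicit `train_test_split` call is an illustration of an 80/20 split, not part of the pipeline above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Recreate the same synthetic data and mimic an 80/20 split
X, y = make_classification(n_samples=10000, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

print(X_train.shape, X_test.shape)  # (8000, 10) (2000, 10)
```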
Run the pipeline¶
In [3]:
# Note that we specify the number of cores for parallel execution here
atom = ATOMClassifier(X, y, n_jobs=2, backend="ray", verbose=2, random_state=1)
2024-03-04 10:03:31,922 INFO worker.py:1715 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Binary classification.
Parallel processing with 2 cores.
Parallelization backend: ray

Dataset stats ==================== >>
Shape: (10000, 11)
Train set size: 8000
Test set size: 2000
-------------------------------------
Memory: 880.13 kB
Scaled: True
Outlier values: 211 (0.2%)
In [4]:
# Using `parallel=True`, we train one model in each node
# Note that when training in parallel, the verbosity of the models is zero
atom.run(models=["PA", "SGD"], est_params={"max_iter": 150}, parallel=True, errors="raise")
Training ========================= >>
Models: PA, SGD
Metric: f1

Final results ==================== >>
Total time: 13.971s
-------------------------------------
PassiveAggressive --> f1: 0.8198
StochasticGradientDescent --> f1: 0.8766 !
Analyze the results¶
In [5]:
# Notice how the summed time to train the models is less than the total time
atom.plot_results(metric="time_fit")
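The per-model fit times in the plot don't add up to the wall-clock total because, under parallel execution, the total is governed by the slowest model plus backend overhead rather than by the sum of fit times. A stdlib-only sketch of that effect (threads stand in for Ray workers here; `fake_fit` is a hypothetical placeholder, not an atom or Ray API):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fit(seconds):
    # Stand-in for one model's fit call
    time.sleep(seconds)
    return seconds

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    fit_times = list(pool.map(fake_fit, [0.2, 0.3]))
wall_time = time.perf_counter() - start

# The two tasks overlap, so the wall time tracks the slowest task,
# not the sum of both
print(wall_time < sum(fit_times))  # True
```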
In [6]:
# Create a REST API endpoint and do inference on the test set
atom.pa.serve(port=8001)
Serving model PassiveAggressive on 127.0.0.1:8001...
In [7]:
import requests
X_predict = atom.X_test.iloc[:10, :]
response = requests.get("http://127.0.0.1:8001/", json=X_predict.to_json())
In [8]:
response.json()
Out[8]:
[1, 1, 0, 0, 1, 1, 0, 1, 0, 0]
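The endpoint returns plain JSON, so the predictions can be post-processed with the standard library alone. A small sketch; the `body` string below simply mirrors the output above and is hard-coded for illustration, not fetched from the server:

```python
import json

# Same payload as the response above, hard-coded for illustration
body = "[1, 1, 0, 0, 1, 1, 0, 1, 0, 0]"
predictions = json.loads(body)

# Count the predicted positives
positives = sum(predictions)
print(positives, len(predictions))  # 5 10
```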
In [9]:
# Don't forget to shut down the Ray instance
ray.shutdown()