Example: Data engines
This example shows how ATOM interacts with data engines other than pandas, such as polars.
Import the breast cancer dataset from sklearn.datasets. This is a small, easy-to-train dataset whose goal is to predict whether a patient has breast cancer.
Load the data
In [1]:
# Import packages
import polars as pl
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier
In [2]:
# Load the data and convert to polars for demonstration purposes
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X = pl.from_pandas(X)
y = pl.from_pandas(y)
X.head()
Out[2]:
shape: (5, 30)
mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | radius error | texture error | perimeter error | area error | smoothness error | compactness error | concavity error | concave points error | symmetry error | fractal dimension error | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | 1.095 | 0.9053 | 8.589 | 153.4 | 0.006399 | 0.04904 | 0.05373 | 0.01587 | 0.03003 | 0.006193 | 25.38 | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 |
20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | 0.5435 | 0.7339 | 3.398 | 74.08 | 0.005225 | 0.01308 | 0.0186 | 0.0134 | 0.01389 | 0.003532 | 24.99 | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 |
19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | 0.7456 | 0.7869 | 4.585 | 94.03 | 0.00615 | 0.04006 | 0.03832 | 0.02058 | 0.0225 | 0.004571 | 23.57 | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 |
11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | 0.4956 | 1.156 | 3.445 | 27.23 | 0.00911 | 0.07458 | 0.05661 | 0.01867 | 0.05963 | 0.009208 | 14.91 | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 |
20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | 0.1809 | 0.05883 | 0.7572 | 0.7813 | 5.438 | 94.44 | 0.01149 | 0.02461 | 0.05688 | 0.01885 | 0.01756 | 0.005115 | 22.54 | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 |
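As a quick sanity check on the dataset loaded above, the sketch below inspects its shape and class balance using only scikit-learn's pandas output (no polars or ATOM involved). The variable names are illustrative; the combined frame has 30 feature columns plus the target column.

```python
from sklearn.datasets import load_breast_cancer

# as_frame=True returns pandas objects; `frame` holds the features plus the target
data = load_breast_cancer(as_frame=True)

# 30 feature columns + 1 target column
n_rows, n_cols = data.frame.shape

# Label 0 maps to the first class name, label 1 to the second
class_names = [str(name) for name in data.target_names]

# Number of samples per label, as plain Python ints
counts = data.target.value_counts().sort_index()
class_counts = {int(label): int(n) for label, n in counts.items()}

print(n_rows, n_cols)
print(class_names)
print(class_counts)
```

Note that in this dataset the label 1 stands for benign and 0 for malignant, which matters when interpreting the predictions further down.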
Run the pipeline
In [3]:
# Specify the data engine in the constructor
# Note that atom accepts any dataframe-like object to create the dataset
atom = ATOMClassifier(X, y, engine="polars", verbose=2, random_state=1)
<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Binary classification.
Data engine: polars

Dataset stats ==================== >>
Shape: (569, 31)
Train set size: 456
Test set size: 113
-------------------------------------
Memory: 138.97 kB
Scaled: False
Outlier values: 167 (1.2%)
In [4]:
# The data attributes now return polars types
atom.X.head(5)
Out[4]:
shape: (5, 30)
mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | radius error | texture error | perimeter error | area error | smoothness error | compactness error | concavity error | concave points error | symmetry error | fractal dimension error | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
13.48 | 20.82 | 88.4 | 559.2 | 0.1016 | 0.1255 | 0.1063 | 0.05439 | 0.172 | 0.06419 | 0.213 | 0.5914 | 1.545 | 18.52 | 0.005367 | 0.02239 | 0.03049 | 0.01262 | 0.01377 | 0.003187 | 15.53 | 26.02 | 107.3 | 740.4 | 0.161 | 0.4225 | 0.503 | 0.2258 | 0.2807 | 0.1071 |
18.31 | 20.58 | 120.8 | 1052.0 | 0.1068 | 0.1248 | 0.1569 | 0.09451 | 0.186 | 0.05941 | 0.5449 | 0.9225 | 3.218 | 67.36 | 0.006176 | 0.01877 | 0.02913 | 0.01046 | 0.01559 | 0.002725 | 21.86 | 26.2 | 142.2 | 1493.0 | 0.1492 | 0.2536 | 0.3759 | 0.151 | 0.3074 | 0.07863 |
17.93 | 24.48 | 115.2 | 998.9 | 0.08855 | 0.07027 | 0.05699 | 0.04744 | 0.1538 | 0.0551 | 0.4212 | 1.433 | 2.765 | 45.81 | 0.005444 | 0.01169 | 0.01622 | 0.008522 | 0.01419 | 0.002751 | 20.92 | 34.69 | 135.1 | 1320.0 | 0.1315 | 0.1806 | 0.208 | 0.1136 | 0.2504 | 0.07948 |
15.13 | 29.81 | 96.71 | 719.5 | 0.0832 | 0.04605 | 0.04686 | 0.02739 | 0.1852 | 0.05294 | 0.4681 | 1.627 | 3.043 | 45.38 | 0.006831 | 0.01427 | 0.02489 | 0.009087 | 0.03151 | 0.00175 | 17.26 | 36.91 | 110.1 | 931.4 | 0.1148 | 0.09866 | 0.1547 | 0.06575 | 0.3233 | 0.06165 |
8.95 | 15.76 | 58.74 | 245.2 | 0.09462 | 0.1243 | 0.09263 | 0.02308 | 0.1305 | 0.07163 | 0.3132 | 0.9789 | 3.28 | 16.94 | 0.01835 | 0.0676 | 0.09263 | 0.02308 | 0.02384 | 0.005601 | 9.414 | 17.07 | 63.34 | 270.0 | 0.1179 | 0.1879 | 0.1544 | 0.03846 | 0.1652 | 0.07722 |
In [5]:
atom.y.head(5)
Out[5]:
shape: (5,)
target |
---|
i32 |
0 |
0 |
0 |
0 |
1 |
In [6]:
atom.run("LR")
Training ========================= >>
Models: LR
Metric: f1

Results for LogisticRegression:
Fit ---------------------------------------------
Train evaluation --> f1: 0.9913
Test evaluation --> f1: 0.9861
Time elapsed: 0.129s
-------------------------------------------------
Time: 0.129s

Final results ==================== >>
Total time: 0.132s
-------------------------------------
LogisticRegression --> f1: 0.9861
Analyze the results
In [7]:
# The prediction methods also return types of the requested data engine
atom.lr.predict(X)
Out[7]:
shape: (569,)
target |
---|
i64 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
… |
1 |
1 |
1 |
1 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
In [8]:
atom.lr.engine = "pandas-pyarrow"
atom.lr.predict(X.head(5))
Out[8]:
0    0
1    0
2    0
3    0
4    0
Name: target, dtype: int64[pyarrow]
In [9]:
atom.lr.engine = "dask"
atom.lr.predict(X.head(5))
Out[9]:
Dask Series Structure:
npartitions=1
0    int64
4      ...
Name: target, dtype: int64
Dask Name: from_pandas, 1 graph layer
In [10]:
atom.lr.engine = "pyarrow"
atom.lr.predict(X.head(5))
Out[10]:
<pyarrow.lib.Int64Array object at 0x0000016E06BCD1E0>
[
  0,
  0,
  0,
  0,
  0
]
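The cells above change only the `engine` attribute, and the same `predict` call then returns a different container type. One way to picture this pattern is a registry of converters keyed by engine name. The sketch below is a toy illustration under that assumption, not ATOM's actual implementation: `CONVERTERS`, `Model`, and the decision rule are all hypothetical, and plain Python containers stand in for the polars, pandas, and pyarrow types.

```python
from array import array
from typing import Callable, Sequence

# Hypothetical registry: engine name -> function that converts raw predictions.
# Standard-library containers stand in for real dataframe library types.
CONVERTERS: dict[str, Callable[[Sequence[int]], object]] = {
    "list": list,
    "tuple": tuple,
    "array": lambda values: array("q", values),
}


class Model:
    """Toy model whose predict() output type follows the `engine` attribute."""

    def __init__(self, engine: str = "list") -> None:
        self.engine = engine

    @property
    def engine(self) -> str:
        return self._engine

    @engine.setter
    def engine(self, name: str) -> None:
        # Validate on assignment so a typo fails early, not at predict time
        if name not in CONVERTERS:
            raise ValueError(f"Unknown engine {name!r}; choose from {sorted(CONVERTERS)}")
        self._engine = name

    def predict(self, rows: Sequence[Sequence[float]]) -> object:
        # Stand-in decision rule: class 1 when the first feature is below 15
        raw = [int(row[0] < 15) for row in rows]
        # Same raw predictions, wrapped in the container the engine asks for
        return CONVERTERS[self.engine](raw)


model = Model()
rows = [[17.99, 10.38], [11.42, 20.38]]

as_list = model.predict(rows)

# Switching the engine changes the return type, not the predictions
model.engine = "tuple"
as_tuple = model.predict(rows)

print(type(as_list).__name__, as_list)
print(type(as_tuple).__name__, as_tuple)
```

The point of the design is that the fitted model and its predictions stay the same; only the final conversion step depends on the engine setting.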