Example: Automated feature scaling
This example shows how ATOM automatically scales the features for models that require scaling.
Import the breast cancer dataset from sklearn.datasets. This is a small, easy-to-train dataset whose goal is to predict whether or not a patient has breast cancer.
Load the data
In [1]:
# Import packages
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier
In [2]:
# Load the data
X, y = load_breast_cancer(return_X_y=True)
Run the pipeline
In [3]:
atom = ATOMClassifier(X, y, verbose=2, random_state=1)
<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Binary classification.

Dataset stats ==================== >>
Shape: (569, 31)
Train set size: 456
Test set size: 113
-------------------------------------
Memory: 141.24 kB
Scaled: False
Outlier values: 167 (1.2%)
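The stats above report Scaled: False, i.e. the features still hold their original ranges. As a quick programmatic check (a minimal sketch, assuming the scaled property that backs this stat is available on the atom instance), you could query it directly:

# Check whether the dataset is already scaled (assumed property)
print(atom.scaled)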
In [4]:
# Check which models require feature scaling
atom.available_models(needs_scaling=True)
Out[4]:
|    | acronym | fullname | estimator | module | handles_missing | needs_scaling | accepts_sparse | native_multilabel | native_multioutput | validation | supports_engines |
|----|---------|----------|-----------|--------|-----------------|---------------|----------------|-------------------|--------------------|------------|------------------|
| 0  | CatB | CatBoost | CatBoostClassifier | catboost.core | True | True | True | False | False | n_estimators | catboost |
| 1  | KNN | KNearestNeighbors | KNeighborsClassifier | sklearn.neighbors._classification | False | True | True | True | True | None | sklearn, sklearnex, cuml |
| 2  | LGB | LightGBM | LGBMClassifier | lightgbm.sklearn | True | True | True | False | False | n_estimators | lightgbm |
| 3  | lSVM | LinearSVM | LinearSVC | sklearn.svm._classes | False | True | True | False | False | None | sklearn, cuml |
| 4  | LR | LogisticRegression | LogisticRegression | sklearn.linear_model._logistic | False | True | True | False | False | None | sklearn, sklearnex, cuml |
| 5  | MLP | MultiLayerPerceptron | MLPClassifier | sklearn.neural_network._multilayer_perceptron | False | True | True | True | False | max_iter | sklearn |
| 6  | PA | PassiveAggressive | PassiveAggressiveClassifier | sklearn.linear_model._passive_aggressive | False | True | True | False | False | max_iter | sklearn |
| 7  | Perc | Perceptron | Perceptron | sklearn.linear_model._perceptron | False | True | False | False | False | max_iter | sklearn |
| 8  | RNN | RadiusNearestNeighbors | RadiusNeighborsClassifier | sklearn.neighbors._classification | False | True | True | True | True | None | sklearn |
| 9  | Ridge | Ridge | RidgeClassifier | sklearn.linear_model._ridge | False | True | True | True | False | None | sklearn, sklearnex, cuml |
| 10 | SGD | StochasticGradientDescent | SGDClassifier | sklearn.linear_model._stochastic_gradient | False | True | True | False | False | max_iter | sklearn |
| 11 | SVM | SupportVectorMachine | SVC | sklearn.svm._classes | False | True | True | False | False | None | sklearn, sklearnex, cuml |
| 12 | XGB | XGBoost | XGBClassifier | xgboost.sklearn | True | True | True | False | False | n_estimators | xgboost |
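The same overview can be narrowed down further. For example (an illustrative call, assuming available_models accepts other column names as filters in the same way as needs_scaling above):

# Models that need scaling and also handle missing values natively
atom.available_models(needs_scaling=True, handles_missing=True)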
In [5]:
# We fit two models: LR needs scaling and Bag doesn't
atom.run(["LR", "Bag"])
Training ========================= >>
Models: LR, Bag
Metric: f1


Results for LogisticRegression:
Fit ---------------------------------------------
Train evaluation --> f1: 0.9913
Test evaluation --> f1: 0.9861
Time elapsed: 0.253s
-------------------------------------------------
Time: 0.253s


Results for Bagging:
Fit ---------------------------------------------
Train evaluation --> f1: 0.9982
Test evaluation --> f1: 0.9444
Time elapsed: 0.085s
-------------------------------------------------
Time: 0.085s


Final results ==================== >>
Total time: 0.344s
-------------------------------------
LogisticRegression --> f1: 0.9861 !
Bagging            --> f1: 0.9444
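The same scores can be retrieved afterwards as a dataframe (a sketch, assuming the results overview attribute of ATOM's API):

# Overview of the trained models and their metric scores
print(atom.results)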
In [6]:
# Now, we create a new branch and scale the features before fitting the model
atom.branch = "scaling"
Successfully created new branch: scaling.
In [7]:
atom.scale()
Fitting Scaler...
Scaling features...
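After scaling, the data in this branch holds the transformed values (a quick look, assuming the dataset attribute works like the per-model data attributes shown later):

# First rows of the scaled dataset in the scaling branch
print(atom.dataset.iloc[:5, :3])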
In [8]:
atom.run("LR_2")
Training ========================= >>
Models: LR_2
Metric: f1


Results for LogisticRegression:
Fit ---------------------------------------------
Train evaluation --> f1: 0.9913
Test evaluation --> f1: 0.9861
Time elapsed: 0.038s
-------------------------------------------------
Time: 0.038s


Final results ==================== >>
Total time: 0.042s
-------------------------------------
LogisticRegression --> f1: 0.9861
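With three models trained across the two branches, they can be compared side by side on the test set (a sketch, assuming the evaluate method scores every trained model):

# Metric scores of LR, Bag and LR_2 on the test set
print(atom.evaluate())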
Analyze the results
In [9]:
# Let's compare the differences between the models
print(atom.lr.scaler)
print(atom.bag.scaler)
print(atom.lr_2.scaler)
Scaler()
None
None
In [10]:
# And the data they use is different
print(atom.lr.X.iloc[:5, :3])
print("-----------------------------")
print(atom.bag.X.iloc[:5, :3])
print("-----------------------------")
print(atom.lr_2.X_train.equals(atom.lr.X_train))
         x0        x1        x2
0 -0.181875  0.356669 -0.147122
1  1.162216  0.300578  1.159704
2  1.056470  1.212060  0.933833
3  0.277287  2.457753  0.188054
4 -1.442482 -0.825921 -1.343434
-----------------------------
      x0     x1      x2
0  13.48  20.82   88.40
1  18.31  20.58  120.80
2  17.93  24.48  115.20
3  15.13  29.81   96.71
4   8.95  15.76   58.74
-----------------------------
True
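Because LR received the automated scaler, the data it sees has approximately zero mean and unit variance, while Bag trains on the raw feature values. This can be verified with plain pandas on the same attributes printed above:

# The copy used by LR is standardized...
print(atom.lr.X.iloc[:, :3].mean().round(2))
print(atom.lr.X.iloc[:, :3].std().round(2))
# ...while the copy used by Bag keeps the original feature ranges
print(atom.bag.X.iloc[:, :3].mean().round(2))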
In [11]:
# Note that the scaler is included in the model's pipeline
print(atom.lr.pipeline)
print("-----------------------------")
print(atom.bag.pipeline)
print("-----------------------------")
print(atom.lr_2.pipeline)
Pipeline(memory=Memory(location=None),
         steps=[('AutomatedScaler', Scaler())], verbose=False)
-----------------------------
Pipeline(memory=Memory(location=None), steps=[], verbose=False)
-----------------------------
Pipeline(memory=Memory(location=None),
         steps=[('scaler', Scaler(verbose=2))], verbose=False)
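Since the scaler lives inside LR's pipeline, new and unscaled data can be passed straight to the model's prediction methods; the pipeline applies the scaler before the estimator (a sketch, assuming the sklearn-like predict method of ATOM's prediction API):

# Predict on raw, unscaled rows; LR's pipeline scales them first
print(atom.lr.predict(X[:5]))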
In [12]:
atom.plot_pipeline()