Example: Automated feature scaling

This example shows how ATOM automatically scales the features for models that require it.

Import the breast cancer dataset from sklearn.datasets. This is a small, easy-to-train dataset whose goal is to predict whether a patient has breast cancer.

Load the data

In [1]:

# Import packages
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

In [2]:

# Load the data
X, y = load_breast_cancer(return_X_y=True)

Run the pipeline

In [3]:

atom = ATOMClassifier(X, y, verbose=2, random_state=1)

<< ================== ATOM ================== >>
Algorithm task: binary classification.

Dataset stats ==================== >>
Shape: (569, 31)
Train set size: 456
Test set size: 113
-------------------------------------
Memory: 141.24 kB
Scaled: False
Outlier values: 167 (1.2%)
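The `Scaled: False` entry in the dataset stats reports that the features are not standardized yet. The sketch below illustrates one way such a check could work, using only NumPy and scikit-learn. The helper `looks_scaled` and its tolerance `tol` are hypothetical names introduced for illustration; this is not necessarily how ATOM computes the flag.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler


def looks_scaled(X, tol=0.1):
    """Heuristic (assumption): every column has mean near 0 and std near 1."""
    X = np.asarray(X, dtype=float)
    means_ok = np.all(np.abs(X.mean(axis=0)) < tol)
    stds_ok = np.all(np.abs(X.std(axis=0) - 1) < tol)
    return bool(means_ok and stds_ok)


X, y = load_breast_cancer(return_X_y=True)

# The raw features live on very different ranges, so the check fails
raw_is_scaled = looks_scaled(X)

# After standardization every column has mean 0 and unit variance
standardized_is_scaled = looks_scaled(StandardScaler().fit_transform(X))

print(raw_is_scaled, standardized_is_scaled)
```

A mean and standard deviation check like this only recognizes standardization; data scaled to a fixed range such as [0, 1] would need a different test.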

In [4]:

# Check which models require feature scaling
atom.available_models()[["acronym", "model", "needs_scaling"]]

Out[4]:

	acronym	model	needs_scaling
0	AdaB	AdaBoost	False
1	Bag	Bagging	False
2	BNB	BernoulliNB	False
3	CatB	CatBoost	True
4	CatNB	CategoricalNB	False
5	CNB	ComplementNB	False
6	Tree	DecisionTree	False
7	Dummy	Dummy	False
8	ETree	ExtraTree	False
9	ET	ExtraTrees	False
10	GNB	GaussianNB	False
11	GP	GaussianProcess	False
12	GBM	GradientBoosting	False
13	hGBM	HistGradientBoosting	False
14	KNN	KNearestNeighbors	True
15	LGB	LightGBM	True
16	LDA	LinearDiscriminantAnalysis	False
17	lSVM	LinearSVM	True
18	LR	LogisticRegression	True
19	MLP	MultiLayerPerceptron	True
20	MNB	MultinomialNB	False
21	PA	PassiveAggressive	True
22	Perc	Perceptron	True
23	QDA	QuadraticDiscriminantAnalysis	False
24	RNN	RadiusNearestNeighbors	True
25	RF	RandomForest	False
26	Ridge	Ridge	True
27	SGD	StochasticGradientDescent	True
28	SVM	SupportVectorMachine	True
29	XGB	XGBoost	True

In [5]:

# We fit two models: LR needs scaling and Bag doesn't
atom.run(["LR", "Bag"])

Training ========================= >>
Models: LR, Bag
Metric: f1


Results for LogisticRegression:
Fit ---------------------------------------------
Train evaluation --> f1: 0.9913
Test evaluation --> f1: 0.9861
Time elapsed: 0.182s
-------------------------------------------------
Total time: 0.182s


Results for Bagging:
Fit ---------------------------------------------
Train evaluation --> f1: 0.9982
Test evaluation --> f1: 0.9444
Time elapsed: 0.378s
-------------------------------------------------
Total time: 0.378s


Final results ==================== >>
Total time: 0.572s
-------------------------------------
LogisticRegression --> f1: 0.9861 !
Bagging            --> f1: 0.9444
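For readers who want to see what scaling a model's input automatically amounts to, here is a rough equivalent in plain scikit-learn. It is a sketch under assumptions, not ATOM's implementation: it assumes standardization (`StandardScaler`) as the scaling strategy, and it uses its own train/test split, so the score will not match the numbers above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

# Bundling the scaler with the estimator means it is fitted on the
# training set only and reapplied automatically at prediction time
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

test_f1 = f1_score(y_test, model.predict(X_test))
print(f"Test f1: {test_f1:.4f}")
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into training, which is the main reason to attach it to the model rather than scale the whole dataset up front.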

In [6]:

# Now, we create a new branch and scale the features before fitting the model
atom.branch = "scaling"

New branch scaling successfully created.

In [7]:

atom.scale()

Fitting Scaler...
Scaling features...

In [8]:

atom.run("LR2")

Training ========================= >>
Models: LR2
Metric: f1


Results for LogisticRegression:
Fit ---------------------------------------------
Train evaluation --> f1: 0.9913
Test evaluation --> f1: 0.9861
Time elapsed: 0.123s
-------------------------------------------------
Total time: 0.123s


Final results ==================== >>
Total time: 0.133s
-------------------------------------
LogisticRegression --> f1: 0.9861

Analyze the results

In [9]:

# Compare the scaler attached to each model. Only LR, which needs
# scaling and was trained on unscaled data, received one
print(atom.lr.scaler)
print(atom.bag.scaler)
print(atom.lr2.scaler)

Scaler()
None
None

In [10]:

# The data the models use differs as well: LR sees scaled features, Bag the raw ones
print(atom.lr.X.iloc[:5, :3])
print("-----------------------------")
print(atom.bag.X.iloc[:5, :3])
print("-----------------------------")
# LR2 was trained on the same scaled data as LR
print(atom.lr2.X_train.equals(atom.lr.X_train))

         x0        x1        x2
0 -0.181875  0.356669 -0.147122
1  1.162216  0.300578  1.159704
2  1.056470  1.212060  0.933833
3  0.277287  2.457753  0.188054
4 -1.442482 -0.825921 -1.343434
-----------------------------
      x0     x1      x2
0  13.48  20.82   88.40
1  18.31  20.58  120.80
2  17.93  24.48  115.20
3  15.13  29.81   96.71
4   8.95  15.76   58.74
-----------------------------
True
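The scaled values printed above look like z-scores. Assuming the scaler standardizes each feature (an assumption about its default strategy), each value is (x - mean) / std, with the statistics taken from the training set. The sketch below checks that formula against scikit-learn's `StandardScaler`; it uses its own split, so the individual numbers differ from ATOM's output.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test = train_test_split(X, test_size=0.2, random_state=1)

# Fit the scaler on the training set only
scaler = StandardScaler().fit(X_train)

# Manual z-score with training-set statistics (population std, ddof=0)
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
manual_test = (X_test - mean) / std

# The manual transformation agrees with the fitted scaler
matches = np.allclose(manual_test, scaler.transform(X_test))
print(matches)
```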

In [11]:

# Note that the scaler is included in the model's pipeline
print(atom.lr.pipeline)
print("-----------------------------")
print(atom.bag.pipeline)
print("-----------------------------")
print(atom.lr2.pipeline)

0    Scaler()
dtype: object
-----------------------------
Series([], Name: master, dtype: object)
-----------------------------
0    Scaler(verbose=2)
dtype: object

In [12]:

atom.plot_pipeline()