Example: In-training validation

This example shows how to keep track of the model's performance during training.

Import the breast cancer dataset from sklearn.datasets. This is a small, easy-to-train dataset whose goal is to predict whether a patient's breast tumor is malignant or benign.

Load the data

In [1]:

# Import packages
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

UserWarning: The pandas version installed (1.5.3) does not match the supported pandas version in Modin (1.5.2). This may cause undesired side effects!

In [2]:

# Load the data
X, y = load_breast_cancer(return_X_y=True)
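For a quick look at what `load_breast_cancer` returns, the sketch below uses plain scikit-learn (independent of ATOM) to print the shape of the feature matrix and the class counts. The `as_frame=True` variant, which needs pandas, returns the features together with a `target` column; that likely explains the 31 columns in the dataset stats ATOM prints when it is initialized.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
print(X.shape)         # → (569, 30): 569 samples, 30 numeric features
print(np.bincount(y))  # → [212 357]: 0 = malignant, 1 = benign

# With as_frame=True, .frame holds the 30 features plus a "target" column
frame = load_breast_cancer(as_frame=True).frame
print(frame.shape)     # → (569, 31)
```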

Run the pipeline

In [3]:

# Initialize atom
atom = ATOMClassifier(X, y, verbose=2, random_state=1)

<< ================== ATOM ================== >>
Algorithm task: binary classification.

Dataset stats ==================== >>
Shape: (569, 31)
Train set size: 456
Test set size: 113
-------------------------------------
Memory: 141.24 kB
Scaled: False
Outlier values: 167 (1.2%)

In [4]:

# Not all models support in-training validation
# Check which ones do with the available_models method
df = atom.available_models()[["acronym", "model", "has_validation"]]
df[df["has_validation"]]

Out[4]:

	acronym	model	has_validation
3	CatB	CatBoost	True
15	LGB	LightGBM	True
19	MLP	MultiLayerPerceptron	True
21	PA	PassiveAggressive	True
22	Perc	Perceptron	True
27	SGD	StochasticGradientDescent	True
29	XGB	XGBoost	True
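Conceptually, in-training validation means scoring the model on held-out data after each training iteration instead of only once at the end. The sketch below illustrates the idea with scikit-learn's `SGDClassifier.partial_fit`, which runs one epoch per call, recording ROC AUC on both the train and test set after every epoch. It is an illustration under stated assumptions, not ATOM's internal implementation: the stratified 80/20 split, the scaling step and the 20-epoch budget are choices made for this example, so the numbers will not match ATOM's output.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Assumed 80/20 stratified split; ATOM's own split differs slightly
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

# SGD is sensitive to feature scale: standardize with train statistics only
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

model = SGDClassifier(random_state=1)
classes = np.unique(y_train)
history = {"train": [], "test": []}

for epoch in range(20):  # assumed epoch budget
    # One pass of stochastic gradient descent over the training data
    model.partial_fit(X_train, y_train, classes=classes)
    # Validate on both sets after every epoch
    history["train"].append(roc_auc_score(y_train, model.decision_function(X_train)))
    history["test"].append(roc_auc_score(y_test, model.decision_function(X_test)))

print(f"final train AUC: {history['train'][-1]:.4f}")
print(f"final test AUC:  {history['test'][-1]:.4f}")
```

Scoring with `decision_function` instead of `predict_proba` is enough here because ROC AUC only depends on how the samples are ranked, and the default hinge loss does not provide probabilities.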

In [5]:

# Run the models normally
atom.run(models=["MLP", "LGB"], metric="auc")

Training ========================= >>
Models: MLP, LGB
Metric: roc_auc


Results for MultiLayerPerceptron:
Fit ---------------------------------------------
Train evaluation --> roc_auc: 0.9997
Test evaluation --> roc_auc: 0.9936
Time elapsed: 1.602s
-------------------------------------------------
Total time: 1.602s


Results for LightGBM:
Fit ---------------------------------------------
Train evaluation --> roc_auc: 1.0
Test evaluation --> roc_auc: 0.9775
Time elapsed: 0.307s
-------------------------------------------------
Total time: 0.307s


Final results ==================== >>
Total time: 1.913s
-------------------------------------
MultiLayerPerceptron --> roc_auc: 0.9936 !
LightGBM             --> roc_auc: 0.9775
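The `!` in the final results marks the highest-scoring model. Comparing the train and test scores printed above also hints at overfitting: LightGBM reaches a perfect train AUC but a lower test AUC than the MLP. The small computation below only reuses the printed numbers; the dictionary layout is ad hoc for this example.

```python
# Scores copied from the "Results for ..." output above
results = {
    "MultiLayerPerceptron": {"train": 0.9997, "test": 0.9936},
    "LightGBM": {"train": 1.0, "test": 0.9775},
}

# Difference between train and test score per model
gaps = {name: s["train"] - s["test"] for name, s in results.items()}
for name, gap in gaps.items():
    print(f"{name:<21} train-test gap: {gap:.4f}")

# Model with the highest test score
best = max(results, key=lambda name: results[name]["test"])
print("Best on test set:", best)
```

This prints a gap of 0.0061 for the MLP and 0.0225 for LightGBM, which is the kind of divergence the train+test evaluation plot makes visible epoch by epoch.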

Analyze the results

In [6]:

atom.plot_evals(title="In-training validation scores")

In [7]:

# Plot LightGBM's in-training validation scores on both the train and test set
atom.lgb.plot_evals(dataset="train+test", title="LightGBM's in-training validation")
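`plot_evals` draws the scores collected at each training iteration. A rough, ATOM-independent equivalent for a multilayer perceptron is sketched below: `MLPClassifier.partial_fit` runs one epoch per call, the train and test ROC AUC are recorded after each epoch, and both curves are plotted with matplotlib. The hidden-layer size, the 30-epoch budget and the 80/20 split are assumptions for illustration, so the curves will differ from the ones ATOM produces.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
# Assumed 80/20 stratified split, standardized with train statistics
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

mlp = MLPClassifier(hidden_layer_sizes=(20,), random_state=1)  # assumed size
classes = np.unique(y_train)
scores = {"train": [], "test": []}

for epoch in range(30):  # assumed epoch budget
    # One iteration over the training data per call
    mlp.partial_fit(X_train, y_train, classes=classes)
    scores["train"].append(roc_auc_score(y_train, mlp.predict_proba(X_train)[:, 1]))
    scores["test"].append(roc_auc_score(y_test, mlp.predict_proba(X_test)[:, 1]))

# One line per dataset, similar in spirit to dataset="train+test"
fig, ax = plt.subplots()
epochs = range(1, len(scores["train"]) + 1)
for name, values in scores.items():
    ax.plot(epochs, values, label=name)
ax.set_xlabel("Epoch")
ax.set_ylabel("roc_auc")
ax.set_title("MLP in-training validation (sketch)")
ax.legend()
fig.savefig(io.BytesIO(), format="png")  # or plt.show() in a notebook
```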