Example: In-training validation
This example shows how to keep track of the model's performance during training.
We use the breast cancer dataset from sklearn.datasets: a small, easy-to-fit dataset where the task is to predict whether a patient has breast cancer.
Load the data
In [1]:
# Import packages
from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier
In [2]:
# Load the data
X, y = load_breast_cancer(return_X_y=True)
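In-training validation means scoring the model on a held-out set after every training iteration, rather than only once at the end. Atom automates this per model, but as an illustration of the underlying mechanism, here is a minimal sketch using plain scikit-learn (the variable names and the choice of `SGDClassifier` are ours, not part of atom's API):

```python
# Sketch: score a held-out set after every training epoch.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# SGD is sensitive to feature scale, so standardize first
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

model = SGDClassifier(random_state=1)
scores = []
for epoch in range(20):
    # One pass over the training data per call
    model.partial_fit(X_train, y_train, classes=[0, 1])
    # Evaluate on the held-out set after this epoch
    scores.append(roc_auc_score(y_test, model.decision_function(X_test)))
```

The resulting `scores` list is exactly the kind of per-iteration curve that atom records and plots for the models that support it.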
Run the pipeline
In [3]:
# Initialize atom
atom = ATOMClassifier(X, y, verbose=2, random_state=1)
<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Binary classification.

Dataset stats ==================== >>
Shape: (569, 31)
Train set size: 456
Test set size: 113
-------------------------------------
Memory: 141.24 kB
Scaled: False
Outlier values: 167 (1.2%)
In [4]:
# Not all models support in-training validation
# You can check which ones do using the available_models method
atom.available_models(validation=True)
Out[4]:
| | acronym | fullname | estimator | module | handles_missing | needs_scaling | accepts_sparse | native_multilabel | native_multioutput | validation | supports_engines |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CatB | CatBoost | CatBoostClassifier | catboost.core | True | True | True | False | False | n_estimators | catboost |
| 1 | LGB | LightGBM | LGBMClassifier | lightgbm.sklearn | True | True | True | False | False | n_estimators | lightgbm |
| 2 | MLP | MultiLayerPerceptron | MLPClassifier | sklearn.neural_network._multilayer_perceptron | False | True | True | True | False | max_iter | sklearn |
| 3 | PA | PassiveAggressive | PassiveAggressiveClassifier | sklearn.linear_model._passive_aggressive | False | True | True | False | False | max_iter | sklearn |
| 4 | Perc | Perceptron | Perceptron | sklearn.linear_model._perceptron | False | True | False | False | False | max_iter | sklearn |
| 5 | SGD | StochasticGradientDescent | SGDClassifier | sklearn.linear_model._stochastic_gradient | False | True | True | False | False | max_iter | sklearn |
| 6 | XGB | XGBoost | XGBClassifier | xgboost.sklearn | True | True | True | False | False | n_estimators | xgboost |
In [5]:
# Run the models normally
atom.run(models=["MLP", "LGB"], metric="auc")
Training ========================= >>
Models: MLP, LGB
Metric: auc

Results for MultiLayerPerceptron:
Fit ---------------------------------------------
Train evaluation --> auc: 0.9997
Test evaluation --> auc: 0.9936
Time elapsed: 1.825s
-------------------------------------------------
Time: 1.825s

Results for LightGBM:
Fit ---------------------------------------------
Train evaluation --> auc: 1.0
Test evaluation --> auc: 0.9775
Time elapsed: 0.417s
-------------------------------------------------
Time: 0.417s

Final results ==================== >>
Total time: 2.246s
-------------------------------------
MultiLayerPerceptron --> auc: 0.9936 !
LightGBM --> auc: 0.9775
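For the sklearn models in the table above, the validation column points at `max_iter`. As a hedged illustration of where those per-iteration scores come from, scikit-learn's `MLPClassifier` records the validation accuracy of each iteration in its `validation_scores_` attribute when `early_stopping` is enabled. This sketch uses scikit-learn directly, not atom's API:

```python
# Sketch: MLPClassifier tracks validation accuracy per iteration
# when early_stopping=True (scored on an internal validation split).
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

mlp = MLPClassifier(
    early_stopping=True,        # hold out part of the train set
    validation_fraction=0.1,    # size of that internal split
    n_iter_no_change=10,        # stop if no improvement for 10 iters
    max_iter=500,
    random_state=1,
)
mlp.fit(X, y)

# One validation accuracy score per completed training iteration
curve = mlp.validation_scores_
```

Note that sklearn scores with accuracy here, while atom evaluates the metric you passed to `run` (auc in this example).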
Analyze the results
In [6]:
atom.plot_evals(title="In-training validation scores")
In [7]:
# Plot the validation on the train and test set
atom.lgb.plot_evals(dataset="train+test", title="LightGBM's in-training validation")