Holdout set¶
This example shows when and how to use ATOM's holdout set in an exploration pipeline.
The data used is a variation on the Australian weather dataset from Kaggle. You can download it from here. The goal of this dataset is to predict whether or not it will rain tomorrow, training a binary classifier on the target column RainTomorrow.
Load the data¶
# Import packages
import pandas as pd
from atom import ATOMClassifier
# Load data
X = pd.read_csv("./datasets/weatherAUS.csv")
# Let's have a look
X.head()
| | Location | MinTemp | MaxTemp | Rainfall | Evaporation | Sunshine | WindGustDir | WindGustSpeed | WindDir9am | WindDir3pm | ... | Humidity9am | Humidity3pm | Pressure9am | Pressure3pm | Cloud9am | Cloud3pm | Temp9am | Temp3pm | RainToday | RainTomorrow |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
0 | MelbourneAirport | 18.0 | 26.9 | 21.4 | 7.0 | 8.9 | SSE | 41.0 | W | SSE | ... | 95.0 | 54.0 | 1019.5 | 1017.0 | 8.0 | 5.0 | 18.5 | 26.0 | Yes | 0 |
1 | Adelaide | 17.2 | 23.4 | 0.0 | NaN | NaN | S | 41.0 | S | WSW | ... | 59.0 | 36.0 | 1015.7 | 1015.7 | NaN | NaN | 17.7 | 21.9 | No | 0 |
2 | Cairns | 18.6 | 24.6 | 7.4 | 3.0 | 6.1 | SSE | 54.0 | SSE | SE | ... | 78.0 | 57.0 | 1018.7 | 1016.6 | 3.0 | 3.0 | 20.8 | 24.1 | Yes | 0 |
3 | Portland | 13.6 | 16.8 | 4.2 | 1.2 | 0.0 | ESE | 39.0 | ESE | ESE | ... | 76.0 | 74.0 | 1021.4 | 1020.5 | 7.0 | 8.0 | 15.6 | 16.0 | Yes | 1 |
4 | Walpole | 16.4 | 19.9 | 0.0 | NaN | NaN | SE | 44.0 | SE | SE | ... | 78.0 | 70.0 | 1019.4 | 1018.9 | NaN | NaN | 17.4 | 18.1 | No | 0 |
5 rows × 22 columns
Run the pipeline¶
# Initialize atom specifying a fraction of the dataset for holdout
atom = ATOMClassifier(X, n_rows=0.5, holdout_size=0.2, verbose=2, warnings=False)
<< ================== ATOM ================== >>
Algorithm task: binary classification.

Dataset stats ==================== >>
Shape: (56877, 22)
Scaled: False
Missing values: 127169 (10.2%)
Categorical features: 5 (23.8%)
Duplicate samples: 16 (0.0%)
-------------------------------------
Train set size: 42658
Test set size: 14219
Holdout set size: 14219
-------------------------------------
|    | dataset     | train       | test        |
| -- | ----------- | ----------- | ----------- |
| 0  | 44195 (3.5) | 33219 (3.5) | 10976 (3.4) |
| 1  | 12682 (1.0) | 9439 (1.0)  | 3243 (1.0)  |
# The test and holdout fractions are split after subsampling the dataset
# Also note that the holdout set is not part of atom's dataset attribute
print("Length loaded data:", len(X))
print("Length dataset + holdout:", len(atom.dataset) + len(atom.holdout))
Length loaded data: 142193
Length dataset + holdout: 71096
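The arithmetic behind these numbers: n_rows=0.5 halves the 142193 loaded rows, and the holdout and test fractions are then taken from that subsample. A quick sanity check in plain Python, assuming ATOM's default test_size=0.2 and that both fractions are computed over the subsampled data:

# Sanity check of the split sizes (no ATOM calls involved)
n_sampled = round(len(X) * 0.5)     # n_rows=0.5 -> 71096 rows kept
n_holdout = round(n_sampled * 0.2)  # holdout_size=0.2 -> 14219 rows
n_test = round(n_sampled * 0.2)     # default test_size=0.2 -> 14219 rows
n_train = n_sampled - n_test - n_holdout
print(n_train, n_test, n_holdout)   # 42658 14219 14219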
atom.impute()
atom.encode()
Fitting Imputer...
Imputing missing values...
 --> Dropping 261 samples due to missing values in feature MinTemp.
 --> Dropping 93 samples due to missing values in feature MaxTemp.
 --> Dropping 475 samples due to missing values in feature Rainfall.
 --> Dropping 23842 samples due to missing values in feature Evaporation.
 --> Dropping 4293 samples due to missing values in feature Sunshine.
 --> Dropping 1665 samples due to missing values in feature WindGustDir.
 --> Dropping 906 samples due to missing values in feature WindDir9am.
 --> Dropping 77 samples due to missing values in feature WindDir3pm.
 --> Dropping 104 samples due to missing values in feature Humidity9am.
 --> Dropping 23 samples due to missing values in feature Humidity3pm.
 --> Dropping 19 samples due to missing values in feature Pressure9am.
 --> Dropping 11 samples due to missing values in feature Pressure3pm.
 --> Dropping 2111 samples due to missing values in feature Cloud9am.
 --> Dropping 531 samples due to missing values in feature Cloud3pm.
Fitting Encoder...
Encoding categorical columns...
 --> LeaveOneOut-encoding feature Location. Contains 26 classes.
 --> LeaveOneOut-encoding feature WindGustDir. Contains 16 classes.
 --> LeaveOneOut-encoding feature WindDir9am. Contains 16 classes.
 --> LeaveOneOut-encoding feature WindDir3pm. Contains 16 classes.
 --> Ordinal-encoding feature RainToday. Contains 2 classes.
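Both calls above ran with the default strategies (dropping rows with missing values and picking an encoder per column's cardinality). If that is too aggressive for your use case, the transformers accept strategy parameters; a hedged sketch, with parameter names as in recent ATOM versions (adjust if yours differs):

# Alternative to the defaults used above: impute numerical features
# with the median, categorical ones with the most frequent class,
# and one-hot encode features with up to 10 classes
atom.impute(strat_num="median", strat_cat="most_frequent")
atom.encode(strategy="Target", max_onehot=10)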
# Unlike the train and test sets, the holdout set is not transformed until it's used for predictions
atom.holdout
| | Location | MinTemp | MaxTemp | Rainfall | Evaporation | Sunshine | WindGustDir | WindGustSpeed | WindDir9am | WindDir3pm | ... | Humidity9am | Humidity3pm | Pressure9am | Pressure3pm | Cloud9am | Cloud3pm | Temp9am | Temp3pm | RainToday | RainTomorrow |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
0 | BadgerysCreek | 14.9 | 29.2 | 0.0 | NaN | NaN | ENE | 28.0 | SW | N | ... | 92.0 | 47.0 | 1020.1 | 1016.8 | NaN | NaN | 19.1 | 27.9 | No | 0 |
1 | Richmond | 12.8 | 32.0 | 0.0 | NaN | NaN | W | 31.0 | NaN | SSW | ... | 81.0 | 18.0 | 1016.8 | 1014.0 | NaN | NaN | 17.6 | 30.8 | No | 0 |
2 | Wollongong | 18.3 | 24.5 | 0.0 | NaN | NaN | E | 31.0 | WNW | ESE | ... | 75.0 | 74.0 | 1018.7 | 1016.2 | 1.0 | 8.0 | 21.0 | 23.5 | No | 1 |
3 | Mildura | 19.0 | 38.2 | 0.0 | 8.4 | 7.8 | NNW | 35.0 | NNE | NW | ... | 39.0 | 14.0 | 1018.2 | 1016.3 | 2.0 | 2.0 | 24.8 | 37.4 | No | 0 |
4 | Perth | 7.2 | 18.8 | 0.0 | 4.0 | 3.6 | SW | 35.0 | NNE | SW | ... | 74.0 | 74.0 | 1011.5 | 1011.5 | 7.0 | 7.0 | 15.2 | 15.2 | No | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
14214 | Townsville | 16.8 | 27.7 | 0.2 | 6.6 | 8.4 | ENE | 41.0 | S | ENE | ... | 82.0 | 52.0 | 1019.3 | 1015.3 | 6.0 | 2.0 | 21.6 | 26.1 | No | 0 |
14215 | MountGambier | 3.6 | 13.3 | 0.8 | 0.8 | 4.1 | NNW | 30.0 | N | NW | ... | 94.0 | 66.0 | 1024.0 | 1024.7 | 7.0 | 6.0 | 7.4 | 12.5 | No | 0 |
14216 | Cairns | 25.6 | 32.2 | 0.0 | 6.8 | 10.7 | NE | 28.0 | NNE | NNE | ... | 71.0 | 63.0 | 1013.9 | 1010.7 | 4.0 | 1.0 | 29.7 | 31.8 | No | 0 |
14217 | PearceRAAF | 14.8 | 23.8 | 0.0 | NaN | 9.4 | SW | 54.0 | WSW | SW | ... | 58.0 | 42.0 | 1013.2 | 1015.7 | 8.0 | 2.0 | 20.0 | 21.2 | No | 0 |
14218 | Newcastle | 16.0 | 19.0 | 44.6 | NaN | NaN | NaN | NaN | NaN | NaN | ... | 94.0 | 97.0 | NaN | NaN | 8.0 | 8.0 | 16.5 | 16.5 | Yes | 1 |
14219 rows × 22 columns
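The holdout set above is still raw (note the NaNs and the string categories). If you want to see what the model will actually receive, you can push it through the fitted pipeline yourself; a minimal sketch, assuming atom.transform applies all fitted transformers to new data:

# Hypothetical inspection: run the raw holdout features through the
# fitted Imputer and Encoder without making any predictions
holdout_transformed = atom.transform(atom.holdout.drop(columns="RainTomorrow"))
holdout_transformed.head()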
atom.run(models=["GNB", "LR", "RF"])
Training ========================= >>
Models: GNB, LR, RF
Metric: f1

Results for Gaussian Naive Bayes:
Fit ---------------------------------------------
Train evaluation --> f1: 0.6079
Test evaluation --> f1: 0.5944
Time elapsed: 0.036s
-------------------------------------------------
Total time: 0.036s

Results for Logistic Regression:
Fit ---------------------------------------------
Train evaluation --> f1: 0.6209
Test evaluation --> f1: 0.6114
Time elapsed: 0.092s
-------------------------------------------------
Total time: 0.092s

Results for Random Forest:
Fit ---------------------------------------------
Train evaluation --> f1: 0.9999
Test evaluation --> f1: 0.6108
Time elapsed: 2.684s
-------------------------------------------------
Total time: 2.684s

Final results ==================== >>
Duration: 2.814s
-------------------------------------
Gaussian Naive Bayes --> f1: 0.5944
Logistic Regression  --> f1: 0.6114 !
Random Forest        --> f1: 0.6108 ~
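Before picking a model, it can help to see all scores side by side. A short sketch, assuming the standard results attribute and winner shortcut of atom's API:

# Overview of all trained models, and the one with the best test score
atom.results      # one row per model with train/test scores and times
atom.winner.name  # here "LR", the best f1 on the test set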
atom.plot_prc()
# Based on the results on the test set, we select the best model for further tuning
atom.run("lr_tuned", n_calls=10, n_initial_points=5)
Training ========================= >>
Models: LR_tuned
Metric: f1

Running BO for Logistic Regression...
| call             | penalty | C      | solver  | max_iter | l1_ratio | f1     | best_f1 | time   | total_time |
| ---------------- | ------- | ------ | ------- | -------- | -------- | ------ | ------- | ------ | ---------- |
| Initial point 1  | l2      | 0.169  | lbfgs   | 678      | ---      | 0.5925 | 0.5925  | 0.258s | 0.359s     |
| Initial point 2  | l2      | 0.051  | newto.. | 695      | ---      | 0.5953 | 0.5953  | 0.277s | 0.819s     |
| Initial point 3  | l2      | 14.103 | libli.. | 103      | ---      | 0.6163 | 0.6163  | 0.250s | 1.132s     |
| Initial point 4  | l2      | 94.959 | lbfgs   | 176      | ---      | 0.6316 | 0.6316  | 0.266s | 1.458s     |
| Initial point 5  | l2      | 0.026  | saga    | 400      | ---      | 0.6209 | 0.6316  | 0.263s | 1.784s     |
| Iteration 6      | l2      | 0.548  | libli.. | 259      | ---      | 0.629  | 0.6316  | 0.247s | 2.322s     |
| Iteration 7      | none    | ---    | saga    | 1000     | ---      | 0.628  | 0.6316  | 0.352s | 3.059s     |
| Iteration 8      | l2      | 0.006  | libli.. | 898      | ---      | 0.6258 | 0.6316  | 0.236s | 3.608s     |
| Iteration 9      | none    | ---    | lbfgs   | 497      | ---      | 0.6287 | 0.6316  | 0.251s | 4.195s     |
| Iteration 10     | l2      | 0.02   | sag     | 822      | ---      | 0.5898 | 0.6316  | 0.260s | 4.808s     |

Results for Logistic Regression:
Bayesian Optimization ---------------------------
Best call --> Initial point 4
Best parameters --> {'penalty': 'l2', 'C': 94.959, 'solver': 'lbfgs', 'max_iter': 176}
Best evaluation --> f1: 0.6316
Time elapsed: 5.144s
Fit ---------------------------------------------
Train evaluation --> f1: 0.6206
Test evaluation --> f1: 0.6108
Time elapsed: 0.086s
-------------------------------------------------
Total time: 5.231s

Final results ==================== >>
Duration: 5.232s
-------------------------------------
Logistic Regression --> f1: 0.6108
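The tuned hyperparameters travel with the model, so there is no need to copy them from the log. A hedged sketch (attribute names follow ATOM's model API; treat them as assumptions if your version differs):

# Inspect what the Bayesian optimization settled on
atom.lr_tuned.best_params  # {'penalty': 'l2', 'C': 94.959, ...}
atom.lr_tuned.estimator    # the underlying fitted sklearn estimator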
Analyze the results¶
We already used the test set to choose the best model for further tuning, so this set is no longer truly independent. Although it may not be directly visible in the results, using the test set now to evaluate the tuned LR model would be a mistake, since it carries a bias. For this reason, we set apart an extra, independent set to validate the final model: the holdout set. Since we are no longer going to use the test set for validation, we might as well use it to train the model and thus make optimal use of the available data. Use the full_train method for this.
# Re-train the model on the full dataset (train + test)
atom.lr_tuned.full_train()
Model LR_tuned successfully retrained.
# Evaluate on the holdout set
atom.lr_tuned.evaluate(dataset="holdout")
accuracy             0.851996
average_precision    0.721873
balanced_accuracy    0.734260
f1                   0.610644
jaccard              0.439516
matthews_corrcoef    0.534076
precision            0.734831
recall               0.522364
roc_auc              0.881256
Name: LR_tuned, dtype: float64
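For contrast, you can run the same evaluation on the test set, keeping in mind that after full_train the model has been fitted on that data, so its scores are optimistic by construction:

# Same metrics on the test set (biased after full_train; shown only
# to illustrate why the holdout scores are the ones to trust)
atom.lr_tuned.evaluate(dataset="test")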
atom.lr_tuned.plot_prc(dataset="holdout")