Example: Calibration
This example shows how to calibrate a classifier with ATOM.
The data used is a variation on the Australian weather dataset from Kaggle. You can download it from here. The goal is to predict whether or not it will rain tomorrow by training a binary classifier on the target column RainTomorrow.
Load the data
In [1]:
# Import packages
import pandas as pd
from atom import ATOMClassifier
In [2]:
# Load the data
X = pd.read_csv("./datasets/weatherAUS.csv")
# Let's have a look
X.head()
Out[2]:
 | Location | MinTemp | MaxTemp | Rainfall | Evaporation | Sunshine | WindGustDir | WindGustSpeed | WindDir9am | WindDir3pm | ... | Humidity9am | Humidity3pm | Pressure9am | Pressure3pm | Cloud9am | Cloud3pm | Temp9am | Temp3pm | RainToday | RainTomorrow |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | MelbourneAirport | 18.0 | 26.9 | 21.4 | 7.0 | 8.9 | SSE | 41.0 | W | SSE | ... | 95.0 | 54.0 | 1019.5 | 1017.0 | 8.0 | 5.0 | 18.5 | 26.0 | Yes | 0 |
1 | Adelaide | 17.2 | 23.4 | 0.0 | NaN | NaN | S | 41.0 | S | WSW | ... | 59.0 | 36.0 | 1015.7 | 1015.7 | NaN | NaN | 17.7 | 21.9 | No | 0 |
2 | Cairns | 18.6 | 24.6 | 7.4 | 3.0 | 6.1 | SSE | 54.0 | SSE | SE | ... | 78.0 | 57.0 | 1018.7 | 1016.6 | 3.0 | 3.0 | 20.8 | 24.1 | Yes | 0 |
3 | Portland | 13.6 | 16.8 | 4.2 | 1.2 | 0.0 | ESE | 39.0 | ESE | ESE | ... | 76.0 | 74.0 | 1021.4 | 1020.5 | 7.0 | 8.0 | 15.6 | 16.0 | Yes | 1 |
4 | Walpole | 16.4 | 19.9 | 0.0 | NaN | NaN | SE | 44.0 | SE | SE | ... | 78.0 | 70.0 | 1019.4 | 1018.9 | NaN | NaN | 17.4 | 18.1 | No | 0 |
5 rows × 22 columns
Run the pipeline
In [20]:
atom = ATOMClassifier(X, "RainTomorrow", n_rows=1e4, verbose=1, warnings=False)
# Apply data cleaning steps
atom.clean()
atom.impute(strat_num="median", strat_cat="most_frequent")
atom.encode(strategy="target", max_onehot=5, infrequent_to_value=0.05)
# Train a Gaussian Naive Bayes model
atom.run("gnb")
<< ================== ATOM ================== >>
Algorithm task: binary classification.

Dataset stats ==================== >>
Shape: (10000, 22)
Train set size: 8000
Test set size: 2000
-------------------------------------
Memory: 4.34 MB
Scaled: False
Missing values: 22053 (10.0%)
Categorical features: 5 (23.8%)

Fitting Cleaner...
Cleaning the data...
Fitting Imputer...
Imputing missing values...
Fitting Encoder...
Encoding categorical columns...

Training ========================= >>
Models: GNB
Metric: f1

Results for GaussianNB:
Fit ---------------------------------------------
Train evaluation --> f1: 0.5836
Test evaluation --> f1: 0.5804
Time elapsed: 0.138s
-------------------------------------------------
Total time: 0.138s

Final results ==================== >>
Total time: 0.144s
-------------------------------------
GaussianNB --> f1: 0.5804
Analyze the results
In [21]:
# Check the model's calibration
atom.plot_calibration()
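For readers who want to see what a calibration plot summarizes, here is a minimal pure-Python sketch. It bins the predicted probabilities and compares the mean prediction in each bin with the observed fraction of positives. The function name `reliability_curve` and the toy data are hypothetical, and this is not necessarily how `plot_calibration` is implemented internally.

```python
# Minimal reliability-curve sketch (hypothetical helper, not part of ATOM).
def reliability_curve(y_true, y_prob, n_bins=5):
    """Return (mean predicted probability, observed positive fraction) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin
        idx = min(int(prob * n_bins), n_bins - 1)
        bins[idx].append((label, prob))

    curve = []
    for members in bins:
        if not members:
            continue  # skip empty bins
        mean_pred = sum(p for _, p in members) / len(members)
        frac_pos = sum(l for l, _ in members) / len(members)
        curve.append((mean_pred, frac_pos))
    return curve


# Toy data (made up for illustration): a model that is too confident at both ends
y_true = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
y_prob = [0.05, 0.1, 0.1, 0.15, 0.9, 0.95, 0.9, 0.95, 0.9, 0.95]

for mean_pred, frac_pos in reliability_curve(y_true, y_prob):
    print(f"predicted {mean_pred:.2f} -> observed {frac_pos:.2f}")
```

For a well-calibrated model the two numbers in each bin are close, so the points lie near the diagonal of the plot.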
In [22]:
# Let's try to improve it using the calibrate method
atom.winner.calibrate(method="isotonic", cv=5)
Results for GaussianNB:
Fit ---------------------------------------------
Train evaluation --> f1: 0.4959
Test evaluation --> f1: 0.4922
Time elapsed: 0.532s
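The same idea can be sketched with scikit-learn alone. The example below assumes that the `method` and `cv` arguments correspond to those of scikit-learn's `CalibratedClassifierCV`. It uses synthetic data from `make_classification` as a stand-in for the weather dataset, so the numbers it prints are not comparable to the results above.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (an assumption for illustration, not the weather dataset)
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, n_informative=4, n_redundant=4, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1
)

# Uncalibrated Gaussian Naive Bayes
raw = GaussianNB().fit(X_train, y_train)

# Isotonic calibration with 5-fold cross-validation
calibrated = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=5)
calibrated.fit(X_train, y_train)

# Compare the predicted probability of the positive class
raw_prob = raw.predict_proba(X_test)[:, 1]
cal_prob = calibrated.predict_proba(X_test)[:, 1]
print("raw:       ", raw_prob[:5].round(3))
print("calibrated:", cal_prob[:5].round(3))
```

Isotonic regression is non-parametric and tends to need more data than `method="sigmoid"`, which may be the safer choice on small training sets.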
In [23]:
# And check again...
atom.plot_calibration()
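The f1 scores reported above are computed from hard class predictions, so they do not measure calibration directly. One way to complement the visual comparison with a number is a proper scoring rule such as the Brier score. The sketch below is pure Python with made-up toy values; `brier_score` is a hypothetical helper, not an ATOM function.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data (made up): two sets of probabilities for the same four samples
y_true = [0, 0, 1, 1]
overconfident = [0.0, 1.0, 1.0, 1.0]  # extreme probabilities, one confident mistake
moderate = [0.2, 0.6, 0.7, 0.9]       # same mistake, but with less certainty

# Both give identical hard predictions at a 0.5 threshold, so label-based
# metrics such as f1 cannot tell them apart
hard_over = [int(p >= 0.5) for p in overconfident]
hard_mod = [int(p >= 0.5) for p in moderate]
print("same hard predictions:", hard_over == hard_mod)

# The Brier score does distinguish them (lower is better)
print(f"Brier overconfident: {brier_score(y_true, overconfident):.3f}")
print(f"Brier moderate:      {brier_score(y_true, moderate):.3f}")
```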