# Example: Getting started
--------------------------

This example shows how to get started with the atom-ml library.

The data used is a variation on the [Australian weather dataset](https://www.kaggle.com/jsphyg/weather-dataset-rattle-package) from Kaggle. You can download it from [here](https://github.com/tvdboom/ATOM/blob/master/examples/datasets/weatherAUS.csv). The goal is to predict whether it will rain tomorrow by training a binary classifier on the target column `RainTomorrow`.

In [6]:
import pandas as pd
from atom import ATOMClassifier

# Load the Australian Weather dataset
X = pd.read_csv("https://raw.githubusercontent.com/tvdboom/ATOM/master/examples/datasets/weatherAUS.csv")

In [7]:
# Initialize atom on a 1000-row subsample, with RainTomorrow as the target column
atom = ATOMClassifier(X, y="RainTomorrow", n_rows=1000, verbose=2)


Algorithm task: Binary classification.

Shape: (1000, 22)
Train set size: 800
Test set size: 200
-------------------------------------
Memory: 176.13 kB
Scaled: False
Missing values: 2243 (10.2%)
Categorical features: 5 (23.8%)
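
The percentages in this summary can be reproduced from the printed counts. The check below is a small arithmetic sketch, not the library's code. It assumes that the missing-value percentage is taken over all cells of the 1000 x 22 table and that the categorical share is taken over the 21 non-target columns. Both denominators are inferred from the printed numbers.

```python
# Values copied from the summary printed above.
n_rows, n_cols = 1000, 22
n_train, n_test = 800, 200
n_missing = 2243
n_categorical = 5

# Assumption: the target column is excluded from the feature count.
n_features = n_cols - 1

# Share of rows held out for testing.
test_fraction = n_test / (n_train + n_test)

# Assumption: missing values are counted over every cell in the table.
missing_pct = 100 * n_missing / (n_rows * n_cols)

# Share of feature columns that are categorical.
categorical_pct = 100 * n_categorical / n_features

print(f"Test fraction: {test_fraction:.1f}")
print(f"Missing values: {missing_pct:.1f}%")
print(f"Categorical features: {categorical_pct:.1f}%")
```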



In [8]:
# Impute missing values, then encode the categorical features
atom.impute(strat_num="median", strat_cat="most_frequent")
atom.encode(strategy="Target", max_onehot=8)

Fitting Imputer...
Imputing missing values...
 --> Imputing 5 missing values with median (11.8) in column MinTemp.
 --> Imputing 1 missing values with median (22.2) in column MaxTemp.
 --> Imputing 11 missing values with median (0.0) in column Rainfall.
 --> Imputing 429 missing values with median (4.8) in column Evaporation.
 --> Imputing 477 missing values with median (8.6) in column Sunshine.
 --> Imputing 67 missing values with most_frequent (SE) in column WindGustDir.
 --> Imputing 66 missing values with median (39.0) in column WindGustSpeed.
 --> Imputing 78 missing values with most_frequent (N) in column WindDir9am.
 --> Imputing 24 missing values with most_frequent (SSE) in column WindDir3pm.
 --> Imputing 7 missing values with median (13.0) in column WindSpeed9am.
 --> Imputing 20 missing values with median (19.0) in column WindSpeed3pm.
 --> Imputing 15 missing values with median (70.0) in column Humidity9am.
 --> Imputing 24 missing values with median (53.0) in column Humidi
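
To make the two strategies in the log concrete, here is a minimal pure-Python sketch of median and most-frequent imputation. It is an illustration only and not how atom implements the step. The helper `impute_column` and the toy columns are hypothetical and are not taken from the weather dataset.

```python
from collections import Counter
from statistics import median


def impute_column(values, strategy):
    """Return a copy of `values` with None replaced by a single fill value.

    Hypothetical helper for illustration; None stands in for a missing value.
    """
    observed = [v for v in values if v is not None]
    if strategy == "median":
        fill = median(observed)
    elif strategy == "most_frequent":
        # most_common(1) returns [(value, count)] for the most frequent value.
        fill = Counter(observed).most_common(1)[0][0]
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


# Made-up toy columns: one numerical, one categorical.
min_temp = [10.0, None, 14.0, 9.0, 12.0]
wind_dir = ["SE", "N", None, "SE", "W"]

imputed_temp = impute_column(min_temp, "median")
imputed_dir = impute_column(wind_dir, "most_frequent")

print(imputed_temp)
print(imputed_dir)
```

In both cases the fill value is computed from the observed entries only, which is why the log reports one value per column.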

In [9]:
# Train both models, tuning each one's hyperparameters over 10 trials scored on AUC
atom.run(models=["LDA", "AdaB"], metric="auc", n_trials=10)


Models: LDA, AdaB
Metric: auc


Running hyperparameter tuning for LinearDiscriminantAnalysis...
| trial |  solver | shrinkage |     auc | best_auc | time_trial | time_ht |    state |
| ----- | ------- | --------- | ------- | -------- | ---------- | ------- | -------- |
| 0     |    lsqr |      auto |  0.6291 |   0.6291 |     0.127s |  0.127s | COMPLETE |
| 1     |     svd |      None |  0.7018 |   0.7018 |     0.122s |  0.250s | COMPLETE |
| 2     |     svd |      None |  0.7018 |   0.7018 |     0.001s |  0.251s | COMPLETE |
| 3     |     svd |      None |  0.7018 |   0.7018 |     0.000s |  0.251s | COMPLETE |
| 4     |     svd |      None |  0.7018 |   0.7018 |     0.000s |  0.251s | COMPLETE |
| 5     |   eigen |      auto |  0.6675 |   0.7018 |     0.129s |  0.380s | COMPLETE |
| 6     |    lsqr |       0.9 |  0.7511 |   0.7511 |     0.124s |  0.504s | COMPLETE |
| 7     |     svd |      None |  0.7018 |   0.7511 |     0.000s |  0.504s | COMPLETE |
| 8     |    lsqr |       0.8 |  
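
In the rows shown, the `best_auc` column reads as the running maximum of the `auc` column. This is an interpretation of the printed table, not a statement about the library's internals. The sketch below recomputes it for trials 0 to 7 using the values copied from the table.

```python
from itertools import accumulate

# auc values for trials 0-7, copied from the tuning table above.
auc = [0.6291, 0.7018, 0.7018, 0.7018, 0.7018, 0.6675, 0.7511, 0.7018]

# Running maximum: the best score seen up to and including each trial.
best_auc = list(accumulate(auc, max))

for trial, (score, best) in enumerate(zip(auc, best_auc)):
    print(f"trial {trial}: auc={score:.4f} best_auc={best:.4f}")
```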

In [10]:
atom.evaluate()

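|      | accuracy |     ap |     ba |     f1 | jaccard |    mcc | precision | recall |    auc |
| ---- | -------- | ------ | ------ | ------ | ------- | ------ | --------- | ------ | ------ |
| LDA  |     0.83 | 0.7782 | 0.8047 | 0.6852 |  0.5211 | 0.5747 |    0.6271 | 0.7551 | 0.8655 |
| AdaB |    0.825 | 0.6712 | 0.7118 | 0.5783 |  0.4068 |  0.485 |    0.7059 | 0.4898 | 0.8025 |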
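
Some of the reported metrics are linked by standard identities for binary classification. F1 is the harmonic mean of precision and recall, and the Jaccard index equals F1 / (2 - F1). The sketch below recomputes both from the precision and recall reported above. The helper names are hypothetical, and because the inputs are already rounded to four decimals, the results should only be expected to agree with the reported `f1` and `jaccard` up to rounding.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def jaccard_from_f1(f1):
    """Jaccard index expressed in terms of F1 (binary classification)."""
    return f1 / (2 - f1)


# Precision and recall copied from the evaluation results above.
reported = {
    "LDA": {"precision": 0.6271, "recall": 0.7551},
    "AdaB": {"precision": 0.7059, "recall": 0.4898},
}

derived = {}
for name, scores in reported.items():
    f1 = f1_from(scores["precision"], scores["recall"])
    derived[name] = {"f1": f1, "jaccard": jaccard_from_f1(f1)}
    print(f"{name}: f1={f1:.4f} jaccard={derived[name]['jaccard']:.4f}")
```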
