Example: Advanced plotting
This example shows how to make the best use of all of atom's plotting options.
The data used is a variation on the Australian weather dataset from Kaggle. You can download it from here. The goal is to predict whether it will rain tomorrow by training a binary classifier on the target column RainTomorrow.
Load the data
In [1]:
# Import packages
import pandas as pd
from atom import ATOMClassifier
In [2]:
# Load data
X = pd.read_csv("docs_source/examples/datasets/weatherAUS.csv")
# Let's have a look
X.head()
Out[2]:
| | Location | MinTemp | MaxTemp | Rainfall | Evaporation | Sunshine | WindGustDir | WindGustSpeed | WindDir9am | WindDir3pm | ... | Humidity9am | Humidity3pm | Pressure9am | Pressure3pm | Cloud9am | Cloud3pm | Temp9am | Temp3pm | RainToday | RainTomorrow |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | MelbourneAirport | 18.0 | 26.9 | 21.4 | 7.0 | 8.9 | SSE | 41.0 | W | SSE | ... | 95.0 | 54.0 | 1019.5 | 1017.0 | 8.0 | 5.0 | 18.5 | 26.0 | Yes | 0 |
1 | Adelaide | 17.2 | 23.4 | 0.0 | NaN | NaN | S | 41.0 | S | WSW | ... | 59.0 | 36.0 | 1015.7 | 1015.7 | NaN | NaN | 17.7 | 21.9 | No | 0 |
2 | Cairns | 18.6 | 24.6 | 7.4 | 3.0 | 6.1 | SSE | 54.0 | SSE | SE | ... | 78.0 | 57.0 | 1018.7 | 1016.6 | 3.0 | 3.0 | 20.8 | 24.1 | Yes | 0 |
3 | Portland | 13.6 | 16.8 | 4.2 | 1.2 | 0.0 | ESE | 39.0 | ESE | ESE | ... | 76.0 | 74.0 | 1021.4 | 1020.5 | 7.0 | 8.0 | 15.6 | 16.0 | Yes | 1 |
4 | Walpole | 16.4 | 19.9 | 0.0 | NaN | NaN | SE | 44.0 | SE | SE | ... | 78.0 | 70.0 | 1019.4 | 1018.9 | NaN | NaN | 17.4 | 18.1 | No | 0 |
5 rows × 22 columns
Run the pipeline
In [3]:
atom = ATOMClassifier(X, y="RainTomorrow", verbose=1)
atom.impute()
atom.encode()
<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Binary classification.

Dataset stats ==================== >>
Shape: (142193, 22)
Train set size: 113755
Test set size: 28438
-------------------------------------
Memory: 25.03 MB
Scaled: False
Missing values: 316559 (10.1%)
Categorical features: 5 (23.8%)
Duplicates: 45 (0.0%)

Fitting Imputer...
Imputing missing values...
Fitting Encoder...
Encoding categorical columns...
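The percentages in the dataset stats can be reproduced from the raw counts printed above. The sketch below is only a sanity check of that arithmetic. It assumes that the missing-value percentage is taken over all cells and that the categorical percentage is taken over the 21 feature columns (target excluded); the variable names are illustrative and not part of ATOM's API.

```python
# Counts copied from the pipeline output above
n_rows, n_cols = 142193, 22
n_train, n_test = 113755, 28438
n_missing = 316559
n_categorical = 5

# Train and test rows together make up the full dataset
assert n_train + n_test == n_rows

# Assumption: missing values are counted over all cells (rows x columns)
missing_pct = 100 * n_missing / (n_rows * n_cols)

# Assumption: the categorical share is taken over the feature columns only
categorical_pct = 100 * n_categorical / (n_cols - 1)

# Fraction of rows held out for the test set
test_fraction = n_test / n_rows

print(f"Missing values: {missing_pct:.1f}%")
print(f"Categorical features: {categorical_pct:.1f}%")
print(f"Test fraction: {test_fraction:.2f}")
```

Under these assumptions the rounded values agree with the 10.1% and 23.8% reported in the output, and the test set holds roughly one fifth of the rows.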
Customize colors and font size
In [4]:
# Let's see what the default aesthetics look like
atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")
In [5]:
# Change the color palette using color names or their hex codes
atom.palette = ["red", "#00f"]
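The palette above mixes a color name with a 3-digit hex code. As a reminder of what the shorthand means, here is a minimal sketch that expands a hex code into its RGB components. The helper `expand_hex` is hypothetical and written only for illustration; it is not part of ATOM.

```python
def expand_hex(color: str) -> tuple[int, int, int]:
    """Convert a 3- or 6-digit hex color code to an (R, G, B) tuple."""
    digits = color.lstrip("#")
    if len(digits) == 3:
        # Shorthand form: every digit is doubled, so "00f" becomes "0000ff"
        digits = "".join(d * 2 for d in digits)
    if len(digits) != 6:
        raise ValueError(f"Not a 3- or 6-digit hex color: {color!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(expand_hex("#00f"))     # (0, 0, 255), pure blue
print(expand_hex("#ff0000"))  # (255, 0, 0), the same color as "red"
```

So `["red", "#00f"]` asks for a red first trace and a blue second trace.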
In [6]:
atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")
In [7]:
# Change the title and label fontsize
atom.title_fontsize = 30
atom.label_fontsize = 24
atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")
Customize the plot's layout
In [8]:
# Use the update_layout method to change layout properties
atom.update_layout(template="simple_white", barmode="group", hovermode="x")
atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")
Customize the plot's traces
In [9]:
# Use the update_traces method to change the trace (note the y-axis)
atom.update_traces(histnorm="percent", selector=dict(type="histogram"))
atom.plot_distribution(columns=[1, 2], distributions=None, title="Distribution of temperatures")
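To make the effect on the y-axis concrete: assuming the `histnorm="percent"` keyword is forwarded to Plotly's histogram trace, each bar shows the share of all values that fall in its bin, so the bar heights add up to 100. The sketch below reproduces that normalization with the standard library only. The function `percent_histogram` and the sample temperatures are hypothetical, not taken from ATOM or the dataset.

```python
from collections import Counter


def percent_histogram(values, bin_width):
    """Return {bin_start: percentage of all values falling in that bin}."""
    # Assign every value to the left edge of its bin
    counts = Counter((v // bin_width) * bin_width for v in values)
    total = len(values)
    return {start: 100 * n / total for start, n in sorted(counts.items())}


# Hypothetical temperatures, not taken from the dataset
temps = [12.1, 13.4, 14.9, 15.2, 16.8, 17.0, 18.3, 22.5]
hist = percent_histogram(temps, bin_width=5)
print(hist)  # {10.0: 37.5, 15.0: 50.0, 20.0: 12.5}
```

Because the heights are percentages rather than raw counts, columns with different numbers of non-missing rows become directly comparable.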
Customize the title and legend
In [10]:
# Let's go back to the default aesthetics
atom.reset_aesthetics()
atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")
In [11]:
# And update the title with some custom fonts
atom.plot_distribution(
    columns=[1, 2],
    title=dict(
        text="Distribution of temperatures",
        font_color="teal",
        x=0,
        xanchor="left",
    )
)
In [12]:
# We can update the legend in a similar fashion
atom.plot_distribution(
    columns=[1, 2],
    title=dict(
        text="Distribution of temperatures",
        font_color="teal",
        x=0,
        xanchor="left",
    ),
    legend=dict(title="Legend's title"),
)
Customizing the rows to plot
In [13]:
atom.run("LR")

# You can plot the ROC curve for a selection of rows,
# for example, for rows in a specific location
atom.plot_roc(
    rows={
        "Portland": atom.test.loc[atom.og.X.Location == "Portland"],
        "Sydney": atom.test.loc[atom.og.X.Location == "Sydney"],
    }
)
Training ========================= >>
Models: LR
Metric: f1

Results for LogisticRegression:
Fit ---------------------------------------------
Train evaluation --> f1: 0.5854
Test evaluation --> f1: 0.5805
Time elapsed: 1.303s
-------------------------------------------------
Time: 1.303s

Final results ==================== >>
Total time: 1.339s
-------------------------------------
LogisticRegression --> f1: 0.5805
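Comparing ROC curves per location amounts to evaluating the same model on different subsets of rows. As a rough, library-free sketch of that idea, the code below groups rows by location and computes the area under the ROC curve for each group, using the pairwise-ranking definition of AUC. The data and the `roc_auc` helper are hypothetical; this is not how ATOM computes or draws the curve.

```python
from collections import defaultdict


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("Need at least one positive and one negative row.")
    # Count pairwise wins; ties count as half a win
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical (location, true label, predicted probability) rows
rows = [
    ("Portland", 1, 0.80), ("Portland", 0, 0.40),
    ("Portland", 1, 0.35), ("Portland", 0, 0.10),
    ("Sydney", 1, 0.90), ("Sydney", 0, 0.20),
    ("Sydney", 1, 0.70), ("Sydney", 0, 0.30),
]

# Collect labels and scores separately for every location
groups = defaultdict(lambda: ([], []))
for location, label, score in rows:
    groups[location][0].append(label)
    groups[location][1].append(score)

auc_per_location = {loc: roc_auc(y, s) for loc, (y, s) in groups.items()}
print(auc_per_location)  # {'Portland': 0.75, 'Sydney': 1.0}
```

A group with few rows, or with only one class present, gives an unstable or undefined curve, which is worth keeping in mind when selecting small subsets.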
Using a canvas
In [14]:
# Note how the same column over different plots is grouped
with atom.canvas(2, 2):
    atom.plot_distribution(columns=1)
    atom.plot_distribution(columns=2)
    atom.plot_qq(columns=[1, 2], distributions=["norm", "invgauss"])
    atom.plot_qq(columns=[1, 2])
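For readers less familiar with Q-Q plots: they pair the sorted sample values with the quantiles of a reference distribution, and points close to the diagonal suggest a good fit. The sketch below builds those pairs for a fitted normal distribution using only the standard library. The helper `qq_points`, the plotting positions, and the sample values are assumptions made for illustration and do not reflect ATOM's internals.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Pair each sorted sample value with the matching normal quantile."""
    n = len(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    ordered = sorted(sample)
    # Plotting positions (i + 0.5) / n keep the probabilities inside (0, 1)
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Hypothetical temperatures, not taken from the dataset
temps = [12.1, 13.4, 14.9, 15.2, 16.8, 17.0, 18.3, 22.5]
for theo, obs in qq_points(temps):
    print(f"theoretical={theo:6.2f}  observed={obs:6.2f}")
```

Plotting the observed values against the theoretical ones gives the Q-Q plot for the normal case; swapping in another reference distribution changes only the quantile function.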