Example: Advanced plotting
This example shows how to make the best use of all of atom's plotting options.
The data used is a variation on the Australian weather dataset from Kaggle. You can download it from here. The goal is to predict whether it will rain tomorrow by training a binary classifier on the target column RainTomorrow.
Load the data
In [1]:
# Import packages
import pandas as pd
from atom import ATOMClassifier
In [2]:
# Load data
X = pd.read_csv("./datasets/weatherAUS.csv")
# Let's have a look
X.head()
Out[2]:
| | Location | MinTemp | MaxTemp | Rainfall | Evaporation | Sunshine | WindGustDir | WindGustSpeed | WindDir9am | WindDir3pm | ... | Humidity9am | Humidity3pm | Pressure9am | Pressure3pm | Cloud9am | Cloud3pm | Temp9am | Temp3pm | RainToday | RainTomorrow |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MelbourneAirport | 18.0 | 26.9 | 21.4 | 7.0 | 8.9 | SSE | 41.0 | W | SSE | ... | 95.0 | 54.0 | 1019.5 | 1017.0 | 8.0 | 5.0 | 18.5 | 26.0 | Yes | 0 |
| 1 | Adelaide | 17.2 | 23.4 | 0.0 | NaN | NaN | S | 41.0 | S | WSW | ... | 59.0 | 36.0 | 1015.7 | 1015.7 | NaN | NaN | 17.7 | 21.9 | No | 0 |
| 2 | Cairns | 18.6 | 24.6 | 7.4 | 3.0 | 6.1 | SSE | 54.0 | SSE | SE | ... | 78.0 | 57.0 | 1018.7 | 1016.6 | 3.0 | 3.0 | 20.8 | 24.1 | Yes | 0 |
| 3 | Portland | 13.6 | 16.8 | 4.2 | 1.2 | 0.0 | ESE | 39.0 | ESE | ESE | ... | 76.0 | 74.0 | 1021.4 | 1020.5 | 7.0 | 8.0 | 15.6 | 16.0 | Yes | 1 |
| 4 | Walpole | 16.4 | 19.9 | 0.0 | NaN | NaN | SE | 44.0 | SE | SE | ... | 78.0 | 70.0 | 1019.4 | 1018.9 | NaN | NaN | 17.4 | 18.1 | No | 0 |
5 rows × 22 columns
Run the pipeline
In [3]:
atom = ATOMClassifier(X, y="RainTomorrow", verbose=1)
atom.impute()
atom.encode()
<< ================== ATOM ================== >>
Algorithm task: binary classification.
Dataset stats ==================== >>
Shape: (142193, 22)
Memory: 61.69 MB
Scaled: False
Missing values: 316559 (10.1%)
Categorical features: 5 (23.8%)
Duplicate samples: 45 (0.0%)
-------------------------------------
Train set size: 113755
Test set size: 28438
-------------------------------------
Fitting Imputer...
Imputing missing values...
Fitting Encoder...
Encoding categorical columns...
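The percentages in the summary can be reproduced by hand, which helps when reading the output. The sketch below uses only the figures printed above. It assumes that missing values are counted per cell over all 22 columns and that the categorical share is taken over the 21 feature columns (excluding the target); both assumptions are inferred from the numbers, not taken from ATOM's source.

```python
# Figures copied from the printed dataset summary.
n_rows, n_cols = 142_193, 22
n_missing = 316_559
n_categorical = 5
n_train, n_test = 113_755, 28_438

# Assumption: missing values are counted per cell, so the
# denominator is rows * columns.
missing_pct = 100 * n_missing / (n_rows * n_cols)

# Assumption: the categorical share is taken over the feature
# columns only, i.e. excluding the target column RainTomorrow.
categorical_pct = 100 * n_categorical / (n_cols - 1)

# The train and test sets together cover every row.
test_fraction = n_test / (n_train + n_test)

print(f"Missing values: {missing_pct:.1f}%")
print(f"Categorical features: {categorical_pct:.1f}%")
print(f"Test fraction: {test_fraction:.2f}")
```

Under these assumptions the results match the printed 10.1% and 23.8%, and the split works out to roughly 80% train and 20% test.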
Customize colors and font size
In [4]:
# Let's see what the default aesthetics look like
atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")
In [5]:
# Change the color palette using color names or their hex codes
atom.palette = ["red", "#00f"]
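The palette mixes a CSS color name ("red") with a three-digit hex code ("#00f"). The short form is CSS shorthand in which every digit is doubled, so "#00f" is the same color as "#0000ff" (pure blue). The helpers below are hypothetical and not part of ATOM; they only illustrate how the shorthand expands, using the standard library.

```python
def expand_hex(color: str) -> str:
    """Expand a 3-digit hex color such as "#00f" to its 6-digit form."""
    digits = color.lstrip("#")
    if len(digits) == 3:
        # Each shorthand digit stands for the same digit repeated twice.
        digits = "".join(ch * 2 for ch in digits)
    if len(digits) != 6:
        raise ValueError(f"not a 3- or 6-digit hex color: {color!r}")
    int(digits, 16)  # raises ValueError on non-hex characters
    return "#" + digits.lower()


def hex_to_rgb(color: str) -> tuple:
    """Return the (red, green, blue) channels of a hex color as integers."""
    digits = expand_hex(color)[1:]
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(expand_hex("#00f"))
print(hex_to_rgb("#00f"))
```

Either spelling can be used in the palette; the shorthand is just more compact.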
In [6]:
atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")
In [7]:
# Change the title and label font sizes
atom.title_fontsize = 30
atom.label_fontsize = 24
atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")
Customize the plot's layout
In [8]:
# Use the update_layout method to change layout properties
atom.update_layout(template="simple_white", barmode="group", hovermode="x")
atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")
Customize the title and legend
In [9]:
# Let's go back to the default aesthetics
atom.reset_aesthetics()
atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")
In [10]:
# And update the title with some custom fonts
atom.plot_distribution(
    columns=[1, 2],
    title=dict(
        text="Distribution of temperatures",
        font_color="teal",
        x=0,
        xanchor="left",
    ),
)
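The key `font_color` in the title dict follows Plotly's "magic underscore" notation, where an underscore separates a parent property from a child property, so `font_color="teal"` is shorthand for `font=dict(color="teal")`. This assumes the dict is forwarded to Plotly's layout, as the `update_layout` call earlier suggests. The function below is a simplified, hypothetical illustration of that expansion; Plotly's own resolution is more careful, for example with property names that themselves contain underscores.

```python
def expand_underscores(props: dict) -> dict:
    """Expand keys like "font_color" into nested dicts.

    Simplified illustration only: every underscore is treated as a
    parent/child separator, which is not true for all Plotly properties.
    """
    nested: dict = {}
    for key, value in props.items():
        *parents, leaf = key.split("_")
        target = nested
        for part in parents:
            # Create the intermediate dict on first use, then descend into it.
            target = target.setdefault(part, {})
        target[leaf] = value
    return nested


title = dict(
    text="Distribution of temperatures",
    font_color="teal",
    x=0,
    xanchor="left",
)
print(expand_underscores(title))
```

Under this reading, the title dict in the cell above could equally be written with an explicit nested `font=dict(color="teal")` entry.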
In [11]:
# We can update the legend in a similar fashion
atom.plot_distribution(
    columns=[1, 2],
    title=dict(
        text="Distribution of temperatures",
        font_color="teal",
        x=0,
        xanchor="left",
    ),
    legend=dict(title="Legend's title"),
)
Using a canvas
In [12]:
# Note how the same column over different plots is grouped
with atom.canvas(2, 2):
    atom.plot_distribution(columns=1)
    atom.plot_distribution(columns=2)
    atom.plot_qq(columns=[1, 2], distributions=["norm", "invgauss"])
    atom.plot_qq(columns=[1, 2])
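The last two panels of the canvas are quantile-quantile (QQ) plots, which compare the sorted values of a column with the quantiles a reference distribution would produce. The sketch below shows the idea for a normal distribution using only the standard library. It is not ATOM's implementation: the function name `qq_points` and the plotting positions `(i - 0.5) / n` are assumptions made for illustration, and the sample is just the five MinTemp values from the preview table.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Return (theoretical, observed) quantile pairs against a fitted normal."""
    data = sorted(sample)
    n = len(data)
    # Fit the reference distribution to the sample's mean and spread.
    dist = NormalDist(mean(data), stdev(data))
    # Plotting positions (i - 0.5) / n avoid the undefined 0 and 1 quantiles.
    theoretical = [dist.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, data))


# MinTemp values from the first five rows shown earlier.
temps = [18.0, 17.2, 18.6, 13.6, 16.4]
for expected, observed in qq_points(temps):
    print(f"{expected:6.2f}  {observed:6.2f}")
```

If the sample followed the reference distribution closely, the pairs would lie near the diagonal where both values are equal; systematic departures from that line indicate a poor fit.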