Example: Advanced plotting

This example shows how to make the best use of all of atom's plotting options.

The data used is a variation on the Australian weather dataset from Kaggle. You can download it from here. The goal is to predict whether or not it will rain tomorrow by training a binary classifier on the target column RainTomorrow.

Load the data

In [1]:

# Import packages
import pandas as pd
from atom import ATOMClassifier

In [2]:

# Load data
X = pd.read_csv("docs_source/examples/datasets/weatherAUS.csv")

# Let's have a look
X.head()

Out[2]:

	Location	MinTemp	MaxTemp	Rainfall	Evaporation	Sunshine	WindGustDir	WindGustSpeed	WindDir9am	WindDir3pm	...	Humidity9am	Humidity3pm	Pressure9am	Pressure3pm	Cloud9am	Cloud3pm	Temp9am	Temp3pm	RainToday	RainTomorrow
0	MelbourneAirport	18.0	26.9	21.4	7.0	8.9	SSE	41.0	W	SSE	...	95.0	54.0	1019.5	1017.0	8.0	5.0	18.5	26.0	Yes	0
1	Adelaide	17.2	23.4	0.0	NaN	NaN	S	41.0	S	WSW	...	59.0	36.0	1015.7	1015.7	NaN	NaN	17.7	21.9	No	0
2	Cairns	18.6	24.6	7.4	3.0	6.1	SSE	54.0	SSE	SE	...	78.0	57.0	1018.7	1016.6	3.0	3.0	20.8	24.1	Yes	0
3	Portland	13.6	16.8	4.2	1.2	0.0	ESE	39.0	ESE	ESE	...	76.0	74.0	1021.4	1020.5	7.0	8.0	15.6	16.0	Yes	1
4	Walpole	16.4	19.9	0.0	NaN	NaN	SE	44.0	SE	SE	...	78.0	70.0	1019.4	1018.9	NaN	NaN	17.4	18.1	No	0

5 rows × 22 columns

Run the pipeline

In [3]:

atom = ATOMClassifier(X, y="RainTomorrow", verbose=1)
atom.impute()
atom.encode()

<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Binary classification.

Dataset stats ==================== >>
Shape: (142193, 22)
Train set size: 113755
Test set size: 28438
-------------------------------------
Memory: 25.03 MB
Scaled: False
Missing values: 316559 (10.1%)
Categorical features: 5 (23.8%)
Duplicates: 45 (0.0%)

Fitting Imputer...
Imputing missing values...
Fitting Encoder...
Encoding categorical columns...
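
The percentages in the dataset stats above can be checked by hand. The sketch below is only an inference from the printed numbers (the denominators are assumptions, not taken from atom's source): the missing-value share matches all cells of the table, and the categorical share matches the 21 feature columns, i.e. all columns except the target.

```python
# Numbers copied from the "Dataset stats" output above
n_rows, n_cols = 142193, 22
train_size, test_size = 113755, 28438
missing = 316559
categorical = 5

# Train and test set together cover every row
assert train_size + test_size == n_rows

# Fraction of rows held out for testing
test_fraction = test_size / n_rows

# Assumption: missing values are counted over all cells (rows x columns)
missing_pct = 100 * missing / (n_rows * n_cols)

# Assumption: categorical share is over the features (target excluded)
n_features = n_cols - 1
categorical_pct = 100 * categorical / n_features

print(f"Test fraction: {test_fraction:.2f}")
print(f"Missing values: {missing_pct:.1f}%")
print(f"Categorical features: {categorical_pct:.1f}%")
```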

Customize colors and font size

In [4]:

# Let's see what the default aesthetics look like
atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")

In [5]:

# Change the color palette using color names or their hex codes
atom.palette = ["red", "#00f"]
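
The second color in the palette above, "#00f", uses the three-digit hex shorthand, where every digit is doubled to get the six-digit form. The helper functions below (expand_hex and hex_to_rgb) are hypothetical names used for illustration only; they are not part of atom.

```python
def expand_hex(color: str) -> str:
    """Expand a 3-digit hex color such as "#00f" to its 6-digit form."""
    digits = color.lstrip("#")
    if len(digits) == 3:
        # Shorthand: every digit is repeated once, e.g. "00f" -> "0000ff"
        digits = "".join(ch * 2 for ch in digits)
    if len(digits) != 6:
        raise ValueError(f"Not a 3- or 6-digit hex color: {color!r}")
    int(digits, 16)  # Raises ValueError if a digit is not hexadecimal
    return "#" + digits.lower()


def hex_to_rgb(color: str) -> tuple:
    """Convert a hex color to a (red, green, blue) tuple of integers."""
    full = expand_hex(color)
    return tuple(int(full[i:i + 2], 16) for i in (1, 3, 5))


print(expand_hex("#00f"))  # → #0000ff
print(hex_to_rgb("#00f"))  # → (0, 0, 255)
```

So the palette above pairs red with pure blue.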

In [6]:

atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")

In [7]:

# Change the title and label fontsize
atom.title_fontsize = 30
atom.label_fontsize = 24
atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")

Customize the plot's layout

In [8]:

# Use the update_layout method to change layout properties
atom.update_layout(template="simple_white", barmode="group", hovermode="x")
atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")

Customize the plot's traces

In [9]:

# Use the update_traces method to change the trace (note the y-axis)
atom.update_traces(histnorm="percent", selector=dict(type="histogram"))
atom.plot_distribution(columns=[1, 2], distributions=None, title="Distribution of temperatures")
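
With histnorm="percent", the y-axis no longer shows raw counts: each bar is the share of samples in that bin, and all bars sum to 100. A minimal sketch of that normalization in plain Python, using the five MinTemp values from the table above. The function name percent_histogram and the fixed bin width of 2 are assumptions for illustration; the plot chooses its own bins.

```python
from collections import Counter


def percent_histogram(values, bin_width):
    """Return {bin start: percentage of all samples that fall in that bin}."""
    # Map every value to the lower edge of its bin and count per bin
    counts = Counter((v // bin_width) * bin_width for v in values)
    total = sum(counts.values())
    return {start: 100 * n / total for start, n in sorted(counts.items())}


# MinTemp of the five rows shown by X.head()
min_temp = [18.0, 17.2, 18.6, 13.6, 16.4]

hist = percent_histogram(min_temp, bin_width=2)
print(hist)  # → {12.0: 20.0, 16.0: 40.0, 18.0: 40.0}
```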

Customize the title and legend

In [10]:

# Let's go back to the default aesthetics
atom.reset_aesthetics()
atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")

In [11]:

# And update the title with some custom fonts
atom.plot_distribution(
    columns=[1, 2],
    title=dict(
        text="Distribution of temperatures",
        font_color="teal",
        x=0,
        xanchor="left",
    )
)

In [12]:

# We can update the legend in a similar fashion
atom.plot_distribution(
    columns=[1, 2],
    title=dict(
        text="Distribution of temperatures",
        font_color="teal",
        x=0,
        xanchor="left",
    ),
    legend=dict(title="Legend's title"),
)

Customizing the rows to plot

In [13]:

atom.run("LR")

# You can plot the ROC curve for a selection of rows,
# for example, for rows in a specific location
atom.plot_roc(
    rows={
        "Portland": atom.test.loc[atom.og.X.Location == "Portland"],
        "Sydney": atom.test.loc[atom.og.X.Location == "Sydney"],
    }
)

Training ========================= >>
Models: LR
Metric: f1


Results for LogisticRegression:
Fit ---------------------------------------------
Train evaluation --> f1: 0.5854
Test evaluation --> f1: 0.5805
Time elapsed: 1.303s
-------------------------------------------------
Time: 1.303s


Final results ==================== >>
Total time: 1.339s
-------------------------------------
LogisticRegression --> f1: 0.5805
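
Plotting the ROC curve per location amounts to evaluating the same model on separate subsets of rows. The sketch below illustrates that idea in plain Python by computing the area under the ROC curve per group. The rows are made-up toy values, not results from the model above, and the function name auc is hypothetical. Note that a subset needs at least one positive and one negative row for the curve to be defined.

```python
from collections import defaultdict


def auc(y_true, scores):
    """Area under the ROC curve: the probability that a random positive
    row gets a higher score than a random negative row (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("Both classes must be present in the subset.")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): location, true label, predicted probability
rows = [
    ("Portland", 1, 0.9),
    ("Portland", 0, 0.4),
    ("Portland", 1, 0.3),
    ("Sydney", 0, 0.2),
    ("Sydney", 1, 0.7),
    ("Sydney", 0, 0.6),
]

# Split the rows per location, mirroring the dict passed to `rows` above
groups = defaultdict(lambda: ([], []))
for location, label, score in rows:
    groups[location][0].append(label)
    groups[location][1].append(score)

auc_per_location = {loc: auc(y, s) for loc, (y, s) in groups.items()}
print(auc_per_location)  # → {'Portland': 0.5, 'Sydney': 1.0}
```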

Using a canvas

In [14]:

# Note how the same column over different plots is grouped
with atom.canvas(2, 2):
    atom.plot_distribution(columns=1)
    atom.plot_distribution(columns=2)
    atom.plot_qq(columns=[1, 2], distributions=["norm", "invgauss"])
    atom.plot_qq(columns=[1, 2])