Example: Advanced plotting

This example shows how to make the best use of all of atom's plotting options.

The data used is a variation on the Australian weather dataset from Kaggle. You can download it from here. The goal is to predict whether it will rain tomorrow by training a binary classifier on the target column RainTomorrow.

Load the data

In [1]:

# Import packages
import pandas as pd
from atom import ATOMClassifier

In [2]:

# Load data
X = pd.read_csv("./datasets/weatherAUS.csv")

# Let's have a look
X.head()

Out[2]:

	Location	MinTemp	MaxTemp	Rainfall	Evaporation	Sunshine	WindGustDir	WindGustSpeed	WindDir9am	WindDir3pm	...	Humidity9am	Humidity3pm	Pressure9am	Pressure3pm	Cloud9am	Cloud3pm	Temp9am	Temp3pm	RainToday	RainTomorrow
0	MelbourneAirport	18.0	26.9	21.4	7.0	8.9	SSE	41.0	W	SSE	...	95.0	54.0	1019.5	1017.0	8.0	5.0	18.5	26.0	Yes	0
1	Adelaide	17.2	23.4	0.0	NaN	NaN	S	41.0	S	WSW	...	59.0	36.0	1015.7	1015.7	NaN	NaN	17.7	21.9	No	0
2	Cairns	18.6	24.6	7.4	3.0	6.1	SSE	54.0	SSE	SE	...	78.0	57.0	1018.7	1016.6	3.0	3.0	20.8	24.1	Yes	0
3	Portland	13.6	16.8	4.2	1.2	0.0	ESE	39.0	ESE	ESE	...	76.0	74.0	1021.4	1020.5	7.0	8.0	15.6	16.0	Yes	1
4	Walpole	16.4	19.9	0.0	NaN	NaN	SE	44.0	SE	SE	...	78.0	70.0	1019.4	1018.9	NaN	NaN	17.4	18.1	No	0

5 rows × 22 columns

Run the pipeline

In [3]:

atom = ATOMClassifier(X, y="RainTomorrow", verbose=1)
atom.impute()
atom.encode()

<< ================== ATOM ================== >>
Algorithm task: binary classification.

Dataset stats ==================== >>
Shape: (142193, 22)
Train set size: 113755
Test set size: 28438
-------------------------------------
Memory: 61.69 MB
Scaled: False
Missing values: 316559 (10.1%)
Categorical features: 5 (23.8%)
Duplicate samples: 45 (0.0%)

Fitting Imputer...
Imputing missing values...
Fitting Encoder...
Encoding categorical columns...
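
The percentages in the dataset summary above can be checked by hand. The short sketch below recomputes them from the printed counts. It assumes the missing-value percentage is taken over all cells of the dataset and the categorical percentage over the 21 non-target columns; the variable names are only illustrative and are not part of atom.

```python
# Figures copied from the dataset summary printed above.
n_rows, n_cols = 142193, 22
train_size, test_size = 113755, 28438
missing_values = 316559
categorical_features = 5

# Assumption: the target column is not counted as a feature.
n_features = n_cols - 1

test_fraction = test_size / n_rows
missing_fraction = missing_values / (n_rows * n_cols)
categorical_fraction = categorical_features / n_features

print(f"Train + test equals rows: {train_size + test_size == n_rows}")
print(f"Test fraction: {test_fraction:.1%}")
print(f"Missing values: {missing_fraction:.1%}")
print(f"Categorical features: {categorical_fraction:.1%}")
```

Under these assumptions the script reproduces the 10.1% and 23.8% figures from the summary, and shows that the test set holds about 20% of the rows.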

Customize colors and font size

In [4]:

# Let's see what the default aesthetics look like
atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")

In [5]:

# Change the color palette using color names or their hex codes
atom.palette = ["red", "#00f"]
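
The palette above mixes a color name with a three-digit hex code. In CSS-style notation, "#00f" is shorthand for "#0000ff" (pure blue), with each digit doubled. The sketch below illustrates that expansion with the standard library only; the helper names expand_short_hex and hex_to_rgb are hypothetical and not part of atom.

```python
def expand_short_hex(color: str) -> str:
    """Expand a 3-digit hex color such as "#00f" to its 6-digit form."""
    digits = color.lstrip("#")
    if len(digits) != 3:
        raise ValueError(f"Expected a 3-digit hex color, got {color!r}")
    int(digits, 16)  # Raises ValueError if a digit is not hexadecimal
    return "#" + "".join(d * 2 for d in digits).lower()


def hex_to_rgb(color: str) -> tuple:
    """Convert a 3- or 6-digit hex color to an (R, G, B) tuple of ints."""
    digits = color.lstrip("#")
    if len(digits) == 3:
        digits = expand_short_hex(color).lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(expand_short_hex("#00f"))  # #0000ff
print(hex_to_rgb("#00f"))        # (0, 0, 255)
```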

In [6]:

atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")

In [7]:

# Change the title and label fontsize
atom.title_fontsize = 30
atom.label_fontsize = 24
atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")

Customize the plot's layout

In [8]:

# Use the update_layout method to change layout properties
atom.update_layout(template="simple_white", barmode="group", hovermode="x")
atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")

Customize the title and legend

In [9]:

# Let's go back to the default aesthetics
atom.reset_aesthetics()
atom.plot_distribution(columns=[1, 2], title="Distribution of temperatures")

In [10]:

# Now customize the title's font color and position
atom.plot_distribution(
    columns=[1, 2],
    title=dict(
        text="Distribution of temperatures",
        font_color="teal",
        x=0,
        xanchor="left",
    )
)
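
The title dictionary above uses the key font_color. Assuming the plots are rendered with Plotly, as the layout options used in this example (template, hovermode) suggest, this is Plotly's "magic underscore" notation, where font_color="teal" is shorthand for font=dict(color="teal"). The sketch below is a simplified illustration of that mapping. The helper expand_underscores is hypothetical and not part of atom or Plotly, and it assumes every underscore separates two nesting levels, which does not hold for Plotly property names that themselves contain an underscore.

```python
def expand_underscores(props: dict) -> dict:
    """Turn keys like "font_color" into nested dicts like {"font": {"color": ...}}.

    Simplified sketch: every underscore is treated as a nesting separator.
    """
    nested = {}
    for key, value in props.items():
        *parents, leaf = key.split("_")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


title = dict(
    text="Distribution of temperatures",
    font_color="teal",
    x=0,
    xanchor="left",
)

print(expand_underscores(title))
```

The nested form makes it easier to see which part of the title each setting belongs to: text, x, and xanchor sit at the top level, while color is a property of the title's font.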

In [11]:

# We can update the legend in a similar fashion
atom.plot_distribution(
    columns=[1, 2],
    title=dict(
        text="Distribution of temperatures",
        font_color="teal",
        x=0,
        xanchor="left",
    ),
    legend=dict(title="Legend's title"),
)

Using a canvas

In [12]:

# Note how the same column is grouped across the different plots
with atom.canvas(2, 2):
    atom.plot_distribution(columns=1)
    atom.plot_distribution(columns=2)
    atom.plot_qq(columns=[1, 2], distributions=["norm", "invgauss"])
    atom.plot_qq(columns=[1, 2])
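
To make the canvas pattern more concrete, here is a toy sketch of a context manager that collects plot calls and assigns each one to a cell of a grid. It is not atom's implementation: the ToyPlotter class and its methods are hypothetical, and the row-by-row fill order is an assumption made for illustration.

```python
from contextlib import contextmanager


class ToyPlotter:
    """Hypothetical stand-in that records where each plot would be drawn."""

    def __init__(self):
        self._grid = None      # (rows, cols) while a canvas is open
        self._cells = []       # Plots collected inside the open canvas
        self.last_layout = []  # Layout of the most recently closed canvas

    @contextmanager
    def canvas(self, rows: int, cols: int):
        self._grid = (rows, cols)
        self._cells = []
        try:
            yield self
        finally:
            # On exit, the collected plots would be rendered as one figure.
            self.last_layout = list(self._cells)
            self._grid = None

    def plot(self, name: str):
        if self._grid is None:
            print(f"Drawing {name!r} as a standalone figure")
            return
        rows, cols = self._grid
        index = len(self._cells)
        if index >= rows * cols:
            raise ValueError("The canvas is full")
        # Assumption: cells are filled row by row, left to right.
        row, col = divmod(index, cols)
        self._cells.append((row + 1, col + 1, name))


plotter = ToyPlotter()
with plotter.canvas(2, 2):
    for name in ("distribution 1", "distribution 2", "qq fitted", "qq default"):
        plotter.plot(name)

print(plotter.last_layout)
```

The point of the pattern is that the same plotting calls work both inside and outside the with block; only the destination of the output changes.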