Plots

ATOM provides many plotting methods to analyze the data or compare model performance. Descriptions and examples can be found in the API section. ATOM uses the packages matplotlib, seaborn, shap and wordcloud for plotting.

Plots that compare model performances (methods with the models parameter) can be called directly from a trainer, e.g. atom.plot_roc(), or from one of the models, e.g. atom.LGB.plot_roc(). If called from a trainer, it makes the plot for all models in its pipeline. If called from a specific model, it makes the plot only for that model.
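The difference between the two call styles can be pictured with a small sketch. The Trainer and Model classes below are hypothetical stand-ins, not ATOM's classes, and the method only reports which models would be drawn instead of producing a figure. It assumes the model-level method simply delegates to the trainer with itself as the only selected model.

```python
class Model:
    """Hypothetical stand-in for a fitted model that lives inside a trainer."""

    def __init__(self, name, trainer):
        self.name = name
        self.trainer = trainer

    def plot_roc(self):
        # Delegate to the trainer, restricted to this single model.
        return self.trainer.plot_roc(models=self.name)


class Trainer:
    """Hypothetical stand-in for a trainer that holds several models."""

    def __init__(self, names):
        self.models = {name: Model(name, self) for name in names}

    def plot_roc(self, models=None):
        # No selection means "every model in the pipeline".
        if models is None:
            selected = list(self.models)
        elif isinstance(models, str):
            selected = [models]
        else:
            selected = list(models)
        # A real implementation would draw one curve per model here;
        # this sketch only returns the names that would be drawn.
        return selected


trainer = Trainer(["XGB", "LGB"])
all_models = trainer.plot_roc()               # called from the trainer
one_model = trainer.models["LGB"].plot_roc()  # called from a single model
print(all_models)
print(one_model)
```

Routing both entry points through one method keeps the drawing logic in a single place, which is one plausible reason for this kind of design.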

Plots that analyze the dataset (methods without the models parameter) can only be called from atom. The other trainers are meant for cases where the goal is modelling only, not data manipulation.

Parameters

Apart from the plot-specific parameters, all plots have four parameters in common:

The title parameter allows you to add a title to the plot.
The figsize parameter adjusts the plot's size.
The filename parameter is used to save the plot.
The display parameter determines whether to show or return the plot.

Aesthetics

The plot aesthetics can be customized using the plot attributes, e.g. atom.style = "white". These attributes can be called from any instance with plotting methods. Note that the plot attributes are attached to the class and not the instance. This means that changing the attribute will also change it for all other instances in the module. Use the reset_aesthetics method to reset all the aesthetics to their default value. The default values are:

style: "darkgrid"
palette: "GnBu_r_d"
title_fontsize: 20
label_fontsize: 16
tick_fontsize: 12
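The class-level behaviour described above can be reproduced in a few lines of plain Python. This is a minimal sketch of the pattern, not ATOM's actual implementation: BasePlotter and its private attributes are hypothetical names, and only the style attribute is wired up as a property.

```python
class BasePlotter:
    """Hypothetical plotter whose aesthetics are stored on the class."""

    _defaults = {
        "style": "darkgrid",
        "palette": "GnBu_r_d",
        "title_fontsize": 20,
        "label_fontsize": 16,
        "tick_fontsize": 12,
    }
    # One dictionary shared by every instance of the class.
    _aesthetics = dict(_defaults)

    @property
    def style(self):
        return BasePlotter._aesthetics["style"]

    @style.setter
    def style(self, value):
        # Writes to the class-level dictionary, not to the instance.
        BasePlotter._aesthetics["style"] = value

    def reset_aesthetics(self):
        # Restore a fresh copy of the defaults for all instances.
        BasePlotter._aesthetics = dict(BasePlotter._defaults)


first = BasePlotter()
second = BasePlotter()

first.style = "white"
style_seen_by_second = second.style   # changed through the other instance

first.reset_aesthetics()
style_after_reset = second.style      # back to the default
```

Because the setter writes to shared class state, a change made through one instance is visible from every other instance until the defaults are restored.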

Canvas

Sometimes it is desirable to draw multiple plots side by side to compare them more easily. Use the canvas method for this. The canvas method is a @contextmanager, i.e. it's used through the with statement. Plots in a canvas ignore the figsize, filename and display parameters. Instead, pass these parameters to the canvas for the final figure. If a variable is assigned to the canvas (e.g. with atom.canvas() as fig), it contains the resulting matplotlib figure.

For example, we can use a canvas to compare the results of an XGBoost and a LightGBM model on the train and test sets. We could also draw the lines for both models in the same axes, but then the plot would become too cluttered.

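from sklearn.datasets import load_breast_cancer
from atom import ATOMClassifier

# Illustrative data only (an assumption): any binary classification set works
X, y = load_breast_cancer(return_X_y=True)

atom = ATOMClassifier(X, y)
atom.run(["xgb", "lgb"], n_calls=0)

# 2x2 canvas: the title and filename apply to the combined figure
with atom.canvas(2, 2, title="XGBoost vs LightGBM", filename="canvas"):
    atom.xgb.plot_roc(dataset="both", title="ROC - XGBoost")
    atom.lgb.plot_roc(dataset="both", title="ROC - LightGBM")
    atom.xgb.plot_prc(dataset="both", title="PRC - XGBoost")
    atom.lgb.plot_prc(dataset="both", title="PRC - LightGBM")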
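To make the context-manager mechanics more concrete, here is a minimal sketch of how a canvas could make individual plots ignore their own filename while it is open. The Plotter class, its plot method and the saved list are hypothetical and no figure is drawn. Unlike the real canvas, which yields a matplotlib figure, this sketch yields the list of queued panel titles.

```python
from contextlib import contextmanager


class Plotter:
    """Hypothetical plotter; a sketch of the pattern, not ATOM's code."""

    def __init__(self):
        self._canvas = None  # list of queued panel titles while a canvas is open
        self.saved = []      # filenames that would have been written to disk

    @contextmanager
    def canvas(self, filename=None):
        self._canvas = []
        try:
            yield self._canvas
            # Reached only if the with-block finished without an error:
            # the canvas-level filename applies to the final figure.
            if filename is not None:
                self.saved.append(filename)
        finally:
            # Always leave canvas mode, even if a plot raised.
            self._canvas = None

    def plot(self, title, filename=None):
        if self._canvas is not None:
            # Inside a canvas: queue the panel and ignore this plot's filename.
            self._canvas.append(title)
        elif filename is not None:
            # Standalone plot: its own filename is honoured.
            self.saved.append(filename)


plotter = Plotter()
with plotter.canvas(filename="canvas") as panels:
    plotter.plot("ROC - XGBoost", filename="ignored_1")
    plotter.plot("ROC - LightGBM", filename="ignored_2")
    collected = list(panels)

plotter.plot("standalone", filename="roc")
```

After this runs, only the canvas-level filename and the standalone plot's filename are recorded. This mirrors the rule that per-plot output parameters are overridden inside a canvas.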

SHAP

The SHAP (SHapley Additive exPlanations) Python package uses a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. ATOM exposes seven of SHAP's plotting functions directly through its API: bar_plot, beeswarm_plot, decision_plot, force_plot, heatmap_plot, scatter_plot and waterfall_plot.

Since the plots are not made by ATOM, we can't draw multiple models in the same figure. Selecting more than one model will raise an exception. To avoid this, call the plot directly from a model, e.g. atom.xgb.force_plot().
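The single-model restriction amounts to a validation step before the plot is drawn. The helper below is a hypothetical sketch of such a check. Its name and the use of ValueError are assumptions, since the text does not say which exception ATOM raises.

```python
def select_single_model(models):
    """Return the one selected model name, or raise if several are given."""
    if isinstance(models, str):
        models = [models]
    models = list(models)
    if len(models) != 1:
        # ValueError is an assumption; the real exception type may differ.
        raise ValueError(
            f"This plot accepts exactly one model, got {len(models)}: {models}"
        )
    return models[0]


chosen = select_single_model("xgb")  # a single model passes the check

try:
    select_single_model(["xgb", "lgb"])
    raised = False
except ValueError:
    raised = True  # more than one model is rejected
```

Calling the plot from a model, as recommended above, sidesteps this check because the selection then always contains exactly one model.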

Info

You can recognize the SHAP plots by the fact that they end (instead of start) with the word plot.

Available plots

A list of available plots can be found below. Note that not all plots can be called from every class and that their availability can depend on the task at hand.

plot_correlation	Plot the data's correlation matrix.
plot_scatter_matrix	Plot the data's scatter matrix.
plot_qq	Plot a quantile-quantile plot.
plot_distribution	Plot column distributions.
plot_wordcloud	Plot a wordcloud from the corpus.
plot_ngrams	Plot n-gram frequencies.
plot_pipeline	Plot a diagram of every estimator in atom's pipeline.
plot_pca	Plot the explained variance ratio vs the number of components.
plot_components	Plot the explained variance ratio per component.
plot_rfecv	Plot the RFECV results.
plot_successive_halving	Plot the models' scores per iteration of the successive halving.
plot_learning_curve	Plot the model's learning curve.
plot_results	Plot a boxplot of the bootstrap results.
plot_bo	Plot the Bayesian optimization scores.
plot_evals	Plot evaluation curves for the train and test set.
plot_roc	Plot the Receiver Operating Characteristic curve.
plot_prc	Plot the precision-recall curve.
plot_det	Plot the detection error tradeoff curve.
plot_gains	Plot the cumulative gains curve.
plot_lift	Plot the lift curve.
plot_errors	Plot a model's prediction errors.
plot_residuals	Plot a model's residuals.
plot_feature_importance	Plot a tree-based model's feature importance.
plot_permutation_importance	Plot the feature permutation importance of models.
plot_partial_dependence	Plot the partial dependence of features.
plot_confusion_matrix	Plot a model's confusion matrix.
plot_threshold	Plot metric performances against threshold values.
plot_probabilities	Plot the probability distribution of the classes in the target column.
plot_calibration	Plot the calibration curve for a binary classifier.
bar_plot	Plot SHAP's bar plot.
beeswarm_plot	Plot SHAP's beeswarm plot.
decision_plot	Plot SHAP's decision plot.
force_plot	Plot SHAP's force plot.
heatmap_plot	Plot SHAP's heatmap plot.
scatter_plot	Plot SHAP's scatter plot.
waterfall_plot	Plot SHAP's waterfall plot.