
Plots


ATOM provides many plotting methods to analyze the data or compare model performances. Descriptions and examples can be found in the API section. ATOM uses the packages matplotlib, seaborn, shap and wordcloud for plotting.

Plots that compare model performances (methods with the models parameter) can be called directly from a trainer, e.g. atom.plot_roc(), or from one of the models, e.g. atom.LGB.plot_roc(). If called from a trainer, it makes the plot for all models in its pipeline. If called from a specific model, it makes the plot only for that model.

Plots that analyze the dataset (methods without the models parameter) can only be called from atom. The other trainers are intended purely for modelling, not for data manipulation.


Parameters

Apart from the plot-specific parameters, all plots have four parameters in common:

  • The title parameter allows you to add a title to the plot.
  • The figsize parameter adjusts the plot's size.
  • The filename parameter is used to save the plot.
  • The display parameter determines whether to show or return the plot.


Aesthetics

The plot aesthetics can be customized using the plot attributes, e.g. atom.style = "white". These attributes can be called from any instance with plotting methods. Note that the plot attributes are attached to the class and not the instance. This means that changing the attribute will also change it for all other instances in the module. Use the reset_aesthetics method to reset all the aesthetics to their default values. The default values are:

  • style: "darkgrid"
  • palette: "GnBu_r_d"
  • title_fontsize: 20
  • label_fontsize: 16
  • tick_fontsize: 12


Canvas

Sometimes it is desirable to draw multiple plots side by side to compare them more easily. Use the canvas method for this. The canvas method is a @contextmanager, i.e. it's used through the with statement. Plots in a canvas ignore the figsize, filename and display parameters. Instead, specify these parameters on the canvas for the final figure. If a variable is assigned to the canvas (e.g. with atom.canvas() as fig), it contains the resulting matplotlib figure.

For example, we can use a canvas to compare the results of an XGBoost and a LightGBM model on the train and test set. We could also draw the lines for both models in the same axes, but then the plot would become too cluttered.

atom = ATOMClassifier(X, y)
atom.run(["xgb", "lgb"], n_calls=0)

with atom.canvas(2, 2, title="XGBoost vs LightGBM", filename="canvas"):
    atom.xgb.plot_roc(dataset="both", title="ROC - XGBoost")
    atom.lgb.plot_roc(dataset="both", title="ROC - LightGBM")
    atom.xgb.plot_prc(dataset="both", title="PRC - XGBoost")
    atom.lgb.plot_prc(dataset="both", title="PRC - LightGBM")


SHAP

The SHAP (SHapley Additive exPlanations) Python package uses a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. ATOM implements methods to call 7 of SHAP's plotting functions directly from its API: bar_plot, beeswarm_plot, decision_plot, force_plot, heatmap_plot, scatter_plot and waterfall_plot.

Calculating the Shapley values is computationally expensive, especially for model agnostic explainers like Permutation. To avoid having to recalculate the values for every plot, ATOM stores the Shapley values internally after the first calculation and accesses them when needed again.

Since the plots are made by the shap package and not by ATOM, it's not possible to draw multiple models in the same figure. Selecting more than one model will raise an exception. To avoid this, call the plot directly from a model, e.g. atom.xgb.force_plot().

Info

You can recognize the SHAP plots by the fact that they end (instead of start) with the word plot.


Available plots

A list of available plots can be found hereunder. Note that not all plots can be called from every class and that their availability can depend on the task at hand.

plot_correlation Plot the data's correlation matrix.
plot_scatter_matrix Plot the data's scatter matrix.
plot_distribution Plot column distributions.
plot_qq Plot a quantile-quantile plot.
plot_wordcloud Plot a wordcloud from the corpus.
plot_ngrams Plot n-gram frequencies.
plot_pipeline Plot a diagram of every estimator in atom's pipeline.
plot_pca Plot the explained variance ratio vs the number of components.
plot_components Plot the explained variance ratio per component.
plot_rfecv Plot the rfecv results.
plot_successive_halving Plot the models' scores per iteration of the successive halving.
plot_learning_curve Plot the model's learning curve.
plot_results Plot a boxplot of the bootstrap results.
plot_bo Plot the Bayesian optimization scores.
plot_evals Plot evaluation curves for the train and test set.
plot_roc Plot the Receiver Operating Characteristics curve.
plot_prc Plot the precision-recall curve.
plot_det Plot the detection error tradeoff curve.
plot_gains Plot the cumulative gains curve.
plot_lift Plot the lift curve.
plot_errors Plot a model's prediction errors.
plot_residuals Plot a model's residuals.
plot_feature_importance Plot a tree-based model's feature importance.
plot_permutation_importance Plot the feature permutation importance of models.
plot_partial_dependence Plot the partial dependence of features.
plot_parshap Plot the partial correlation of shap values.
plot_confusion_matrix Plot a model's confusion matrix.
plot_threshold Plot metric performances against threshold values.
plot_probabilities Plot the probability distribution of the classes in the target column.
plot_calibration Plot the calibration curve for a binary classifier.
bar_plot Plot SHAP's bar plot.
beeswarm_plot Plot SHAP's beeswarm plot.
decision_plot Plot SHAP's decision plot.
force_plot Plot SHAP's force plot.
heatmap_plot Plot SHAP's heatmap plot.
scatter_plot Plot SHAP's scatter plot.
waterfall_plot Plot SHAP's waterfall plot.