Plots
ATOM provides many plotting methods to analyze the data or compare the model performances. Descriptions and examples can be found in the API section. ATOM uses the packages matplotlib, seaborn, shap and wordcloud for plotting.
Plots that compare model performances (methods with the models
parameter) can be called directly from a trainer, e.g. atom.plot_roc(),
or from one of the models, e.g. atom.LGB.plot_roc(). If called from
a trainer, it makes the plot for all models in its pipeline. If called
from a specific model, it makes the plot only for that model.
Plots that analyze the dataset (methods without the models parameter)
can only be called from atom. The other trainers are meant to be used
only when the goal is modelling, not data manipulation.
Parameters
Apart from the plot-specific parameters, all plots have four parameters in common:
- The title parameter allows you to add a title to the plot.
- The figsize parameter adjusts the plot's size.
- The filename parameter is used to save the plot.
- The display parameter determines whether to show or return the plot.
Aesthetics
The plot aesthetics can be customized using the plot attributes, e.g.
atom.style = "white". These attributes can be accessed from any instance
with plotting methods. Note that the plot attributes are attached to the
class and not the instance. This means that changing an attribute will
also change it for all other instances in the module. Use the
reset_aesthetics method to reset all the aesthetics to their default
values. The default values are:
- style: "darkgrid"
- palette: "GnBu_r_d"
- title_fontsize: 20
- label_fontsize: 16
- tick_fontsize: 12
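The effect of attaching attributes to the class rather than the instance can be sketched in plain Python. This is only an illustration of the behaviour described above, not ATOM's implementation: the BasePlotter class and its _aesthetics dictionary are hypothetical names, and only two of the defaults are reproduced.

```python
class BasePlotter:
    """Hypothetical stand-in for a class with plotting methods (not ATOM's code)."""

    # Stored on the class, so there is one copy shared by every instance.
    _aesthetics = {"style": "darkgrid", "title_fontsize": 20}

    @property
    def style(self):
        return BasePlotter._aesthetics["style"]

    @style.setter
    def style(self, value):
        # Writes to the class-level dict, so all instances see the change.
        BasePlotter._aesthetics["style"] = value

    def reset_aesthetics(self):
        # Restore the shared defaults.
        BasePlotter._aesthetics.update(style="darkgrid", title_fontsize=20)


first, second = BasePlotter(), BasePlotter()

first.style = "white"
style_seen_by_second = second.style  # changed through `first`, visible here

second.reset_aesthetics()
style_after_reset = first.style  # reset through `second`, visible on `first`
```

In this sketch, setting the style on one instance changes it for the other, and resetting from either instance restores the default for both.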
Canvas
Sometimes it is desirable to draw multiple plots side by side in order
to compare them more easily. Use the canvas method for this. The canvas
method is a @contextmanager, i.e. it's used through the with command.
Plots in a canvas will ignore the figsize, filename and display
parameters. Instead, pass these parameters to the canvas for the final
figure. If a variable is assigned to the canvas (e.g. with atom.canvas()
as fig), it contains the resulting matplotlib figure.
For example, we can use a canvas to compare the results of an XGBoost and a LightGBM model on the train and test set. We could also draw the lines for both models in the same axes, but then the plot would become too cluttered.
from atom import ATOMClassifier

# X and y are assumed to be an already loaded feature set and target column
atom = ATOMClassifier(X, y)
atom.run(["xgb", "lgb"], n_calls=0)

# Draw four plots in a 2x2 grid; the final figure is saved as "canvas"
with atom.canvas(2, 2, title="XGBoost vs LightGBM", filename="canvas"):
    atom.xgb.plot_roc(dataset="both", title="ROC - XGBoost")
    atom.lgb.plot_roc(dataset="both", title="ROC - LightGBM")
    atom.xgb.plot_prc(dataset="both", title="PRC - XGBoost")
    atom.lgb.plot_prc(dataset="both", title="PRC - LightGBM")
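To make the context-manager mechanics more concrete, here is a minimal sketch in plain Python of how a canvas could collect plots and take over the filename handling. It is not ATOM's implementation: the Plotter class, its plot method and the dictionary standing in for the figure are all hypothetical, and nothing is actually drawn.

```python
from contextlib import contextmanager


class Plotter:
    """Hypothetical plotter that records calls instead of drawing anything."""

    def __init__(self):
        self._canvas = None  # holds the shared figure while a canvas is open

    @contextmanager
    def canvas(self, nrows=1, ncols=2, filename=None):
        figure = {"grid": (nrows, ncols), "plots": [], "saved_as": None}
        self._canvas = figure
        try:
            yield figure  # bound to the name after `as`
            # Only reached if the with-block finished without an error.
            figure["saved_as"] = filename
        finally:
            self._canvas = None  # always leave canvas mode

    def plot(self, name, filename=None):
        if self._canvas is not None:
            # Inside a canvas: add to the shared figure, ignore `filename`.
            self._canvas["plots"].append(name)
            return None
        # Outside a canvas: the plot is its own figure.
        return {"plots": [name], "saved_as": filename}


plotter = Plotter()

with plotter.canvas(1, 2, filename="canvas") as fig:
    plotter.plot("roc", filename="ignored")
    plotter.plot("prc")

standalone = plotter.plot("roc", filename="roc")
```

The per-plot filename is dropped while the canvas is open, and the canvas applies its own filename once the block exits, which mirrors the behaviour described for figsize, filename and display.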
SHAP
The SHAP (SHapley Additive exPlanations) Python package uses a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. ATOM implements methods to call seven of SHAP's plotting functions directly from its API: bar_plot, beeswarm_plot, decision_plot, force_plot, heatmap_plot, scatter_plot and waterfall_plot.
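For readers unfamiliar with Shapley values, the idea of credit allocation can be illustrated with a tiny exact computation. The sketch below is a textbook calculation in plain Python, not how the SHAP package computes its (approximate) values; the feature names and payoff numbers are made up for illustration.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Credit p with what it adds to the players that came before it.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Hypothetical payoff of each feature coalition (made-up numbers).
payoffs = {
    frozenset(): 0.0,
    frozenset({"age"}): 10.0,
    frozenset({"income"}): 20.0,
    frozenset({"age", "income"}): 40.0,
}

phi = shapley_values(["age", "income"], payoffs.__getitem__)
```

Here age receives 15 and income receives 25. The two values add up to 40, the payoff of the full coalition, which is the additive property that SHAP's plots rely on when they split a prediction into per-feature contributions.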
Since the plots are not made by ATOM, we can't draw multiple models in
the same figure. Selecting more than one model will raise an exception.
To avoid this, call the plot directly from a model, e.g. atom.xgb.force_plot().
Info
You can recognize the SHAP plots by the fact that they end (instead
of start) with the word plot.
Available plots
The available plots are listed below. Note that not all plots can be called from every class and that their availability can depend on the task at hand.
plot_correlation | Plot the data's correlation matrix. |
plot_scatter_matrix | Plot the data's scatter matrix. |
plot_qq | Plot a quantile-quantile plot. |
plot_distribution | Plot column distributions. |
plot_wordcloud | Plot a wordcloud from the corpus. |
plot_ngrams | Plot n-gram frequencies. |
plot_pipeline | Plot a diagram of every estimator in atom's pipeline. |
plot_pca | Plot the explained variance ratio vs the number of components. |
plot_components | Plot the explained variance ratio per component. |
plot_rfecv | Plot the RFECV results. |
plot_successive_halving | Plot the models' scores per iteration of the successive halving. |
plot_learning_curve | Plot the model's learning curve. |
plot_results | Plot a boxplot of the bootstrap results. |
plot_bo | Plot the bayesian optimization scores. |
plot_evals | Plot evaluation curves for the train and test set. |
plot_roc | Plot the Receiver Operating Characteristic curve. |
plot_prc | Plot the precision-recall curve. |
plot_det | Plot the detection error tradeoff curve. |
plot_gains | Plot the cumulative gains curve. |
plot_lift | Plot the lift curve. |
plot_errors | Plot a model's prediction errors. |
plot_residuals | Plot a model's residuals. |
plot_feature_importance | Plot a tree-based model's feature importance. |
plot_permutation_importance | Plot the feature permutation importance of models. |
plot_partial_dependence | Plot the partial dependence of features. |
plot_confusion_matrix | Plot a model's confusion matrix. |
plot_threshold | Plot metric performances against threshold values. |
plot_probabilities | Plot the probability distribution of the classes in the target column. |
plot_calibration | Plot the calibration curve for a binary classifier. |
bar_plot | Plot SHAP's bar plot. |
beeswarm_plot | Plot SHAP's beeswarm plot. |
decision_plot | Plot SHAP's decision plot. |
force_plot | Plot SHAP's force plot. |
heatmap_plot | Plot SHAP's heatmap plot. |
scatter_plot | Plot SHAP's scatter plot. |
waterfall_plot | Plot SHAP's waterfall plot. |