plot_ngrams

method plot_ngrams(ngram="bigram", rows="dataset", show=10, title=None, legend="lower right", figsize=None, filename=None, display=True)

Plot n-gram frequencies.

The text for the plot is extracted from the column named "corpus". If no column with that name exists, an exception is raised. If the documents are not tokenized, they are split into words on spaces.
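As a rough illustration of what gets counted, the sketch below splits each untokenized document into words and tallies contiguous n-grams across the whole corpus. It is a minimal sketch, not ATOM's implementation: the helper name `count_ngrams`, splitting on any whitespace, and joining the words of an n-gram with a space are all assumptions made for the example.

```python
from collections import Counter


def count_ngrams(documents, n):
    """Count contiguous n-grams across a corpus (hypothetical helper).

    Plain-string documents are split on whitespace; documents that are
    already sequences of tokens are used as-is.
    """
    counts = Counter()
    for doc in documents:
        tokens = doc.split() if isinstance(doc, str) else list(doc)
        # zip over n shifted views of the token list yields every window of n words
        for gram in zip(*(tokens[i:] for i in range(n))):
            counts[" ".join(gram)] += 1
    return counts


corpus = ["the cat sat on the mat", "the cat ran"]
print(count_ngrams(corpus, 2).most_common(2))
# [('the cat', 2), ('cat sat', 1)]
```

Ties in `Counter.most_common` keep the order in which the n-grams were first encountered, which is why "cat sat" precedes the other single-occurrence bigrams.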

Tip

Use atom's tokenize method to split the documents into words and to create n-grams based on their frequency in the corpus.
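The idea of building n-grams from their frequency can be sketched in plain Python: adjacent word pairs that occur at least `min_freq` times in the corpus are merged into a single token. This only illustrates the concept under assumptions of our own; the helper `merge_frequent_bigrams`, the `min_freq` threshold, and the underscore separator are hypothetical, so check the tokenize method's own documentation for its actual parameters.

```python
from collections import Counter


def merge_frequent_bigrams(documents, min_freq):
    """Join adjacent word pairs seen at least min_freq times (hypothetical helper)."""
    tokenized = [doc.split() for doc in documents]

    # First pass: count every adjacent word pair across the whole corpus.
    pair_counts = Counter(
        pair for tokens in tokenized for pair in zip(tokens, tokens[1:])
    )
    frequent = {pair for pair, count in pair_counts.items() if count >= min_freq}

    # Second pass: merge frequent pairs greedily from left to right.
    merged_docs = []
    for tokens in tokenized:
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in frequent:
                merged.append(f"{tokens[i]}_{tokens[i + 1]}")
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        merged_docs.append(merged)
    return merged_docs


docs = ["i love new york", "new york is big", "york new"]
print(merge_frequent_bigrams(docs, min_freq=2))
# [['i', 'love', 'new_york'], ['new_york', 'is', 'big'], ['york', 'new']]
```

Once frequent pairs are single tokens, a word-level n-gram plot counts them as one unit instead of splitting them across separate words.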

Parameters

ngram: str or int, default="bigram"

Number of contiguous words to search for (the size of the n-gram), given either as a name or as the corresponding integer. Choose from: word (1), bigram (2), trigram (3), quadgram (4).
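The two accepted forms of `ngram` are interchangeable because each name corresponds to an n-gram size. A minimal sketch of that mapping, assuming a hypothetical `resolve_ngram` helper that rejects anything outside the four documented choices (how ATOM itself validates the argument may differ):

```python
NGRAM_SIZES = {"word": 1, "bigram": 2, "trigram": 3, "quadgram": 4}


def resolve_ngram(ngram):
    """Map an ngram argument (name or int) to its size (hypothetical helper)."""
    if isinstance(ngram, str):
        if ngram in NGRAM_SIZES:
            return NGRAM_SIZES[ngram]
    # bool is a subclass of int, so exclude it explicitly
    elif isinstance(ngram, int) and not isinstance(ngram, bool):
        if ngram in NGRAM_SIZES.values():
            return ngram
    raise ValueError(
        f"ngram must be one of {list(NGRAM_SIZES)} or an int from 1 to 4, got {ngram!r}"
    )


print(resolve_ngram("trigram"))  # 3
print(resolve_ngram(2))          # 2
```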

rows: hashable, segment, sequence or dataframe, default="dataset"

Selection of rows in the corpus to include in the search.

show: int or None, default=10

Number of n-grams (ordered by number of occurrences) to show in the plot. If None, show all n-grams (up to 200).
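The selection described for `show` amounts to taking the most frequent n-grams, with the 200 limit applying when `show` is None. A minimal sketch, assuming the counts are held in a `collections.Counter` and using a hypothetical `select_ngrams` helper; whether an explicit `show` above 200 is also capped is not stated here, so the sketch does not cap it.

```python
from collections import Counter

MAX_NGRAMS = 200  # upper limit the documentation gives for show=None


def select_ngrams(counts, show=10):
    """Return the n-grams to plot, most frequent first (hypothetical helper)."""
    limit = MAX_NGRAMS if show is None else show
    return counts.most_common(limit)


counts = Counter({"new york": 5, "los angeles": 3, "san diego": 1})
print(select_ngrams(counts, show=2))
# [('new york', 5), ('los angeles', 3)]
print(len(select_ngrams(counts, show=None)))
# 3
```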

title: str, dict or None, default=None

Title for the plot.

If None, no title is shown.
If str, text for the title.
If dict, title configuration.

legend: str, dict or None, default="lower right"

Legend for the plot. See the user guide for an extended description of the choices.

If None: No legend is shown.
If str: Position to display the legend.
If dict: Legend configuration.

figsize: tuple or None, default=None

Figure's size in pixels, formatted as (x, y). If None, the size adapts to the number of n-grams shown.

filename: str, Path or None, default=None

Save the plot using this name. Use "auto" for automatic naming. The file type is inferred from the extension of the provided name (.html, .png, .pdf, etc.). If the name has no extension, the plot is saved as HTML. If None, the plot is not saved.
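The rule for the file type can be expressed with `pathlib`: keep the extension if there is one, otherwise append ".html". This is a minimal sketch with a hypothetical `resolve_output_path` helper; the special value "auto" is not handled here, and the sketch only computes the path without writing anything.

```python
from pathlib import Path


def resolve_output_path(filename):
    """Return the path a plot would be saved to, or None (hypothetical helper).

    The file type follows the extension; a name without one gets ".html".
    The special value "auto" is not handled here.
    """
    if filename is None:
        return None  # the plot is not saved
    path = Path(filename)
    if not path.suffix:
        path = path.with_suffix(".html")
    return path


print(resolve_output_path("ngrams.png"))  # ngrams.png
print(resolve_output_path("ngrams"))      # ngrams.html
```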

display: bool or None, default=True

Whether to render the plot. If None, it returns the figure.

Returns

go.Figure or None

Plot object. Only returned if display=None.

Example

>>> import numpy as np
>>> from atom import ATOMClassifier
>>> from sklearn.datasets import fetch_20newsgroups

>>> X, y = fetch_20newsgroups(
...     return_X_y=True,
...     categories=["alt.atheism", "sci.med", "comp.windows.x"],
...     shuffle=True,
...     random_state=1,
... )
>>> X = np.array(X).reshape(-1, 1)

>>> atom = ATOMClassifier(X, y, random_state=1)
>>> atom.textclean()
>>> atom.textnormalize()
>>> atom.plot_ngrams()