plot_ngrams
method plot_ngrams(ngram="bigram", rows="dataset", show=10, title=None, legend="lower right", figsize=None, filename=None, display=True)
Plot n-gram frequencies.
The text for the plot is extracted from the column named corpus.
If there is no column with that name, an exception is raised.
If the documents are not tokenized, the words are separated by spaces.
Tip
Use atom's tokenize method to separate the words, creating n-grams based on their frequency in the corpus.
Parameters
ngram: str or int, default="bigram"
Number of contiguous words to search for (size of n-gram).
Choose from: word (1), bigram (2), trigram (3), quadgram (4).
rows: hashable, segment, sequence or dataframe, default="dataset"
Selection of rows in the corpus to include in the search.
show: int or None, default=10
Number of n-grams (ordered by number of occurrences) to
show in the plot. If None, show all n-grams (up to 200).
title: str, dict or None, default=None
Title for the plot.
legend: str, dict or None, default="lower right"
Legend for the plot. See the user guide for
an extended description of the choices.
figsize: tuple or None, default=None
Figure's size in pixels, formatted as (x, y). If None, the
size adapts to the number of n-grams shown.
filename: str, Path or None, default=None
Save the plot using this name. Use "auto" for automatic
naming. The type of the file depends on the provided name
(.html, .png, .pdf, etc.). If filename has no file type,
the plot is saved as html. If None, the plot is not saved.
display: bool or None, default=True
Whether to render the plot. If None, it returns the figure.
Returns
go.Figure or None
Plot object. Only returned if display=None.
Example
>>> import numpy as np
>>> from atom import ATOMClassifier
>>> from sklearn.datasets import fetch_20newsgroups
>>> X, y = fetch_20newsgroups(
... return_X_y=True,
... categories=["alt.atheism", "sci.med", "comp.windows.x"],
... shuffle=True,
... random_state=1,
... )
>>> X = np.array(X).reshape(-1, 1)
>>> atom = ATOMClassifier(X, y, random_state=1)
>>> atom.textclean()
>>> atom.textnormalize()
>>> atom.plot_ngrams()
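
To make concrete what the plot counts, the following is a minimal sketch of n-gram frequency counting in plain Python. It is not ATOM's implementation: `count_ngrams` is a hypothetical helper written only for illustration, and the two-document corpus is made up. It mirrors the behavior described above, where untokenized documents are split on spaces and the most frequent n-grams are kept (compare the `ngram` and `show` parameters).

```python
from collections import Counter


def count_ngrams(documents, n=2):
    """Count contiguous n-word sequences over all documents.

    Hypothetical helper for illustration, not part of ATOM.
    """
    counts = Counter()
    for doc in documents:
        # Untokenized documents (plain strings) are split on whitespace;
        # already tokenized documents are used as-is.
        words = doc.split() if isinstance(doc, str) else list(doc)
        # Slide a window of n words over the document.
        counts.update(
            " ".join(words[i : i + n]) for i in range(len(words) - n + 1)
        )
    return counts


corpus = ["the cat sat on the mat", "the cat ate the fish"]

# Roughly comparable to ngram="bigram" with show=3: the three most frequent bigrams.
for ngram, freq in count_ngrams(corpus, n=2).most_common(3):
    print(ngram, freq)
```

Passing `n=1` counts single words and `n=3` counts trigrams, analogous to the word (1) and trigram (3) choices of the `ngram` parameter.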