Vectorizer
Transform the corpus into meaningful vectors of numbers. The
transformation is applied to the column named corpus. If there is no
column with that name, an exception is raised. Each transformed column
is named after the word it embeds. If a column with that name is
already present in the provided dataset, _[strategy] is appended to
the name. This class can be accessed from atom through the vectorize
method. Read more in the user guide.
Parameters: |
strategy: str, optional (default="bow") Strategy with which to vectorize the text. Choose from:
return_sparse: bool, optional (default=True) Whether to return the transformed output as sparse columns.
**kwargs |
Warning
Using return_sparse=False can make the transformation very slow and
occupy large amounts of memory when the corpus contains many tokens.
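The memory trade-off between sparse and dense output can be estimated with a rough sketch. The matrix below is synthetic (1,000 documents, 20,000 distinct tokens, about 0.1% non-zero entries are assumed figures, not measurements from ATOM), and it only compares storage sizes with SciPy without calling the Vectorizer itself.

```python
import numpy as np
from scipy import sparse

# Synthetic document-term matrix: rows are documents, columns are tokens.
counts = sparse.random(
    1000, 20000, density=0.001, format="csr", dtype=np.float64, random_state=0
)

# CSR storage keeps only the non-zero values plus two index arrays.
sparse_bytes = counts.data.nbytes + counts.indices.nbytes + counts.indptr.nbytes

# A dense array stores every cell, zero or not.
dense_bytes = counts.shape[0] * counts.shape[1] * counts.dtype.itemsize

print(f"sparse: {sparse_bytes / 1e6:.2f} MB")
print(f"dense:  {dense_bytes / 1e6:.2f} MB")
```

The dense size grows with the product of documents and vocabulary size, whereas the sparse size grows with the number of non-zero entries only.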
Attributes
Attributes: |
<strategy>: sklearn estimator Object used to vectorize the text, e.g. vectorizer.bow for the Bag of Words strategy. |
Methods
fit | Fit to text. |
fit_transform | Fit to text, then vectorize it. |
get_params | Get parameters for this estimator. |
log | Write information to the logger and print to stdout. |
save | Save the instance to a pickle file. |
set_params | Set the parameters of this estimator. |
transform | Transform the text. |
Fit to text.
Parameters: |
X: dataframe-like
y: int, str, sequence or None, optional (default=None) |
Returns: |
Vectorizer Fitted instance of self. |
Fit to text, then vectorize it.
Parameters: |
X: dataframe-like
y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. |
Returns: |
X: pd.DataFrame |
Get parameters for this estimator.
Parameters: |
deep: bool, optional (default=True) |
Returns: |
dict Parameter names mapped to their values. |
Write a message to the logger and print it to stdout.
Parameters: |
msg: str
level: int, optional (default=0) |
Save the instance to a pickle file.
Parameters: |
filename: str, optional (default="auto") Name of the file. Use "auto" for automatic naming. |
Set the parameters of this estimator.
Parameters: |
**params: dict Estimator parameters. |
Returns: |
Vectorizer Estimator instance. |
Transform the text.
Parameters: |
X: dataframe-like
y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. |
Returns: |
X: pd.DataFrame |
Example
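# X and y are assumed to be an existing dataset where X has a column named corpus.

# Through atom
from atom import ATOMClassifier

atom = ATOMClassifier(X, y)
atom.vectorize(strategy="tfidf")

# As a stand-alone transformer
from atom.nlp import Vectorizer

vectorizer = Vectorizer("tfidf")
X = vectorizer.fit_transform(X)  # fit before transforming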