Vectorizer
Transform the corpus into meaningful vectors of numbers. The transformation is applied on the column named corpus. If there is no column with that name, an exception is raised. The transformed columns are named after the word they are embedding, with the prefix corpus_. This class can be accessed from atom through the vectorize method. Read more in the user guide.
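The column naming described above can be illustrated with a minimal bag-of-words sketch in plain Python. This is not ATOM's implementation: the function bow_vectorize is hypothetical, the dataset is a dict of columns instead of a dataframe, and tokenization is a plain whitespace split.

```python
from collections import Counter


def bow_vectorize(data: dict[str, list[str]]) -> dict[str, list[int]]:
    """Hypothetical bag-of-words sketch: count the words in the "corpus" column."""
    if "corpus" not in data:
        # Mirror the documented behavior: the corpus column is required
        raise ValueError("The provided dataset has no column named 'corpus'.")

    # Tokenize each document on whitespace and count its tokens
    counts = [Counter(doc.lower().split()) for doc in data["corpus"]]
    vocabulary = sorted(set().union(*counts))

    # One output column per word, named with the "corpus_" prefix
    return {f"corpus_{word}": [c[word] for c in counts] for word in vocabulary}


columns = bow_vectorize({"corpus": ["the cat sat", "the dog sat on the mat"]})
print(columns)
# {'corpus_cat': [1, 0], 'corpus_dog': [0, 1], 'corpus_mat': [0, 1],
#  'corpus_on': [0, 1], 'corpus_sat': [1, 1], 'corpus_the': [1, 2]}
```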
Parameters: |
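strategy: str, optional (default="bow") Strategy with which to vectorize the text, e.g., "bow" (bag of words) or "tfidf".
return_sparse: bool, optional (default=True) Whether to return the transformed output as sparse arrays.
verbose: int, optional (default=0) Verbosity level of the class.
**kwargs Additional keyword arguments for the strategy estimator. |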
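The difference between the "bow" and "tfidf" strategies can be sketched as follows, assuming the tfidf strategy follows the common smoothed-idf variant used by scikit-learn style estimators. This sketch omits the row normalization such estimators usually apply, and the function name tfidf is illustrative only.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Sketch of tf-idf weighting with a smoothed idf (no row normalization)."""
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # Document frequency: in how many documents each word appears
    df = Counter(word for c in counts for word in c)
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1, so words shared by many documents weigh less
    idf = {w: math.log((1 + n_docs) / (1 + d)) + 1 for w, d in df.items()}
    # Weight each raw count (the bag-of-words value) by the word's idf
    return [{w: tf * idf[w] for w, tf in c.items()} for c in counts]


weights = tfidf([["the", "cat"], ["the", "dog"]])
```

Here "the" appears in every document and keeps a weight of 1.0, while "cat" and "dog" are boosted because they are rarer, which is the main reason to prefer tfidf over raw counts.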
Warning
Using return_sparse=False can make the transformation very slow and
occupy large chunks of memory when the corpus contains many tokens.
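The memory concern comes from how many values each layout stores: a dense layout keeps one cell per document-word pair, zeros included, while a sparse layout keeps only the non-zero entries. The following is an illustrative count under that assumption, not a measurement of ATOM's actual memory usage; stored_values is a hypothetical helper.

```python
def stored_values(rows: list[dict[str, int]], vocabulary: list[str]) -> tuple[int, int]:
    """Compare how many values a dense and a sparse layout must store."""
    # Dense: one cell for every (document, word) pair, zeros included
    dense = len(rows) * len(vocabulary)
    # Sparse: only the non-zero counts are kept
    sparse = sum(1 for row in rows for count in row.values() if count != 0)
    return dense, sparse


vocabulary = [f"w{i}" for i in range(1000)]
rows = [
    {"w0": 1, "w1": 2, "w2": 1},
    {"w3": 1, "w4": 1, "w5": 1},
    {"w0": 3, "w6": 1, "w7": 1},
]
dense, sparse = stored_values(rows, vocabulary)
```

With a vocabulary of 1000 words and three short documents, the dense layout needs 3000 cells while the sparse one needs 9.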
Attributes
Attributes: |
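<strategy>: sklearn estimator Estimator instance (lowercase strategy) used to vectorize the corpus, e.g., vectorizer.tfidf for the tfidf strategy.
feature_names_in_: np.array Names of the features seen during fit.
n_features_in_: int Number of features seen during fit. |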
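The fitted attributes follow the scikit-learn convention of being set during fit. A minimal sketch of that contract, using a hypothetical TinyVectorizer class (not ATOM's code) with a plain list instead of an np.array and a hypothetical vocabulary_ attribute, shows why transform produces the same columns as fit: the vocabulary is learned once and unseen words are ignored.

```python
class TinyVectorizer:
    """Hypothetical fit/transform sketch; not ATOM's Vectorizer."""

    def fit(self, data: dict[str, list[str]], y=None) -> "TinyVectorizer":
        # Record the input columns, mirroring sklearn-style fitted attributes
        self.feature_names_in_ = list(data)
        self.n_features_in_ = len(self.feature_names_in_)
        # Learn the vocabulary from the corpus (hypothetical attribute)
        self.vocabulary_ = sorted({w for doc in data["corpus"] for w in doc.split()})
        return self

    def transform(self, data: dict[str, list[str]]) -> dict[str, list[int]]:
        # Words not seen during fit are ignored, so the output columns stay fixed
        tokens = [doc.split() for doc in data["corpus"]]
        return {f"corpus_{w}": [t.count(w) for t in tokens] for w in self.vocabulary_}

    def fit_transform(self, data: dict[str, list[str]], y=None) -> dict[str, list[int]]:
        # Equivalent to calling fit and then transform on the same data
        return self.fit(data, y).transform(data)


v = TinyVectorizer().fit({"corpus": ["a b", "b c"]})
out = v.transform({"corpus": ["b b d"]})
```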
Methods
fit | Fit to text. |
fit_transform | Fit to text, then vectorize it. |
get_params | Get parameters for this estimator. |
log | Write information to the logger and print to stdout. |
save | Save the instance to a pickle file. |
set_params | Set the parameters of this estimator. |
transform | Transform the text. |
Fit to text.
Parameters: |
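X: dataframe-like Feature set containing the column named corpus.
y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. |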
Returns: |
Vectorizer Fitted instance of self. |
Fit to text, then vectorize it.
Parameters: |
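X: dataframe-like Feature set containing the column named corpus.
y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. |
Returns: |
X: pd.DataFrame Transformed corpus. |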
Get parameters for this estimator.
Parameters: |
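deep: bool, optional (default=True) If True, return the parameters for this estimator and for contained subobjects that are estimators. |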
Returns: |
dict Parameter names mapped to their values. |
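get_params and set_params follow the scikit-learn estimator convention, where parameter names come from the constructor signature. The sketch below assumes that convention; ParamsMixin and Demo are hypothetical classes, and nested estimators (what deep=True would add) are not handled.

```python
import inspect


class ParamsMixin:
    """Sketch of the sklearn-style get_params/set_params convention."""

    def get_params(self, deep: bool = True) -> dict:
        # Parameter names are read from the __init__ signature (minus self, *args, **kwargs)
        skip = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
        names = [
            p.name
            for p in inspect.signature(type(self).__init__).parameters.values()
            if p.name != "self" and p.kind not in skip
        ]
        # `deep` is accepted for API compatibility; nested estimators are not expanded here
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params) -> "ParamsMixin":
        valid = self.get_params()
        for key, value in params.items():
            if key not in valid:
                raise ValueError(f"Invalid parameter {key!r}.")
            setattr(self, key, value)
        return self  # returning self allows chaining


class Demo(ParamsMixin):
    # Hypothetical estimator that stores its constructor arguments unchanged
    def __init__(self, strategy="bow", return_sparse=True):
        self.strategy = strategy
        self.return_sparse = return_sparse


d = Demo()
```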
Write a message to the logger and print it to stdout.
Parameters: |
msg: str Message to write to the logger and print to stdout.
level: int, optional (default=0) Minimum verbosity level to print the message. |
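The level parameter acts as a threshold against the instance's verbosity. A minimal sketch of that behavior, assuming the message is printed when verbose >= level and is always sent to the logger when one is set (the Loggable class and that logger rule are assumptions, not ATOM's code):

```python
import logging
from typing import Optional


class Loggable:
    """Hypothetical sketch: print when verbose >= level, always log to the logger."""

    def __init__(self, verbose: int = 0, logger: Optional[logging.Logger] = None):
        self.verbose = verbose
        self.logger = logger

    def log(self, msg: str, level: int = 0) -> None:
        # Print only when the instance's verbosity reaches the message level
        if self.verbose >= level:
            print(msg)
        # Assumption: the logger, if provided, receives every message
        if self.logger is not None:
            self.logger.info(msg)
```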
Save the instance to a pickle file.
Parameters: |
filename: str, optional (default="auto") Name of the file. Use "auto" for automatic naming. |
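Saving to a pickle file can be sketched with the standard library. The naming rule used for "auto" here (the class name plus a .pkl extension) is an assumption for illustration, and the Saveable class and its returned path are hypothetical.

```python
import pickle


class Saveable:
    """Hypothetical sketch of save(filename="auto") using pickle."""

    def save(self, filename: str = "auto") -> str:
        # Assumption: "auto" derives the file name from the class name
        if filename == "auto":
            filename = type(self).__name__
        # Append the pickle extension if it is missing
        if not filename.endswith(".pkl"):
            filename += ".pkl"
        with open(filename, "wb") as f:
            pickle.dump(self, f)
        return filename
```

A saved instance can be restored with pickle.load on the same file, provided the class is importable when loading.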
Set the parameters of this estimator.
Parameters: |
**params: dict Estimator parameters. |
Returns: |
Vectorizer Estimator instance. |
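Vectorize the text.
Parameters: |
X: dataframe-like Feature set containing the column named corpus.
y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. |
Returns: |
X: pd.DataFrame Transformed corpus. |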
Example
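```python
from atom import ATOMClassifier

# X must contain a text column named corpus; y is the target
atom = ATOMClassifier(X, y)
atom.vectorize(strategy="tfidf")
```

Or use the class directly. The estimator must be fitted before it can transform:

```python
from atom.nlp import Vectorizer

vectorizer = Vectorizer(strategy="tfidf")
X = vectorizer.fit_transform(X)
```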