Vectorizer
Transform the corpus into meaningful vectors of numbers. The
transformation is applied to the column named Corpus. If there
is no column with that name, an exception is raised. This class
can be accessed from atom through the vectorize method. Read
more in the user guide.
Parameters: |
strategy: str, optional (default="BOW") Strategy with which to vectorize the text. Choose from:
Verbosity level of the class. Possible values are:
**kwargs |
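To make the strategy parameter more concrete, the following is a minimal pure-Python sketch of what a bag-of-words and a TF-IDF representation look like on a toy corpus. It is not ATOM's implementation: the function names bag_of_words and tf_idf, the whitespace tokenizer, and the smoothed IDF formula are assumptions chosen for illustration, and the row normalization that library implementations usually apply is omitted.

```python
import math
from collections import Counter

# Toy corpus, purely illustrative.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]


def bag_of_words(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in doc i."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


def tf_idf(matrix):
    """Reweight raw counts by a smoothed inverse document frequency."""
    n_docs = len(matrix)
    n_terms = len(matrix[0])
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    # Assumed smoothing: ln((1 + n) / (1 + df)) + 1.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in matrix]


vocab, bow = bag_of_words(corpus)
weighted = tf_idf(bow)
```

In the bag-of-words matrix every occurrence counts equally, whereas the TF-IDF weighting gives a term that appears in few documents (such as "cat") more weight per occurrence than one that appears in many (such as "the").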
Methods
fit | Fit to text. |
fit_transform | Fit to text, then vectorize it. |
get_params | Get parameters for this estimator. |
log | Write information to the logger and print to stdout. |
save | Save the instance to a pickle file. |
set_params | Set the parameters of this estimator. |
transform | Transform the text. |
Fit to text.
Parameters: |
X: dict, list, tuple, np.array, sps.matrix or pd.DataFrame
y: int, str, sequence or None, optional (default=None) |
Returns: |
self: Vectorizer Fitted instance of self. |
Fit to text, then vectorize it.
Parameters: |
X: dict, list, tuple, np.array, sps.matrix or pd.DataFrame
y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. |
Returns: |
X: pd.DataFrame |
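The relationship between fit, transform and fit_transform can be sketched with a small stand-in class. TinyVectorizer below is hypothetical and is not ATOM's Vectorizer: it works on plain lists of strings rather than a Corpus column, and it assumes a simple whitespace tokenizer. It only illustrates that fitting learns a vocabulary, transforming reuses it, and y is accepted but ignored.

```python
from collections import Counter


class TinyVectorizer:
    """Hypothetical stand-in for illustration, not ATOM's Vectorizer."""

    def fit(self, X, y=None):
        # y is accepted but ignored, mirroring the signatures documented here.
        tokens = {tok for doc in X for tok in doc.lower().split()}
        self.vocabulary_ = sorted(tokens)
        return self

    def transform(self, X, y=None):
        rows = []
        for doc in X:
            counts = Counter(doc.lower().split())
            # Terms that were not seen during fit are silently dropped.
            rows.append([counts[term] for term in self.vocabulary_])
        return rows

    def fit_transform(self, X, y=None):
        # Equivalent to calling fit and then transform on the same data.
        return self.fit(X, y).transform(X)


train = ["good movie", "bad movie"]
vec = TinyVectorizer()
train_vectors = vec.fit_transform(train)        # [[0, 1, 1], [1, 0, 1]]
test_vectors = vec.transform(["good good film"])  # [[0, 2, 0]]
```

Because the vocabulary is fixed at fit time, the unseen word "film" contributes nothing to the transformed test document.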
Get parameters for this estimator.
Parameters: |
deep: bool, optional (default=True) |
Returns: |
params: dict Dictionary of the parameter names mapped to their values. |
Write a message to the logger and print it to stdout.
Parameters: |
msg: str
level: int, optional (default=0) |
Save the instance to a pickle file.
Parameters: |
filename: str, optional (default="auto") Name of the file. Use "auto" for automatic naming. |
Set the parameters of this estimator.
Parameters: |
**params: dict Estimator parameters. |
Returns: |
self: Vectorizer Estimator instance. |
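The get_params and set_params methods follow the familiar estimator convention of exposing constructor arguments as a dictionary. The sketch below shows one way such a pair could work. TinyEstimator is hypothetical, the parameter name verbose is an assumption (the parameter table describes a verbosity level without naming it), and nested estimators, which the deep flag would normally cover, are not handled.

```python
import inspect


class TinyEstimator:
    """Hypothetical estimator used only to illustrate the convention."""

    def __init__(self, strategy="BOW", verbose=0):
        # "verbose" is an assumed name for the verbosity level.
        self.strategy = strategy
        self.verbose = verbose

    def get_params(self, deep=True):
        # deep is accepted for signature compatibility; this sketch has
        # no nested estimators, so it has no effect here.
        names = inspect.signature(type(self).__init__).parameters
        return {name: getattr(self, name) for name in names if name != "self"}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        # Returning self allows calls to be chained.
        return self


est = TinyEstimator()
before = est.get_params()
after = est.set_params(strategy="TF-IDF").get_params()
```

Returning the instance from set_params is what makes the chained call in the last line possible.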
Transform the text.
Parameters: |
X: dict, list, tuple, np.array, sps.matrix or pd.DataFrame
y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. |
Returns: |
X: pd.DataFrame |
Example
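# Through atom: X is the feature set (with a Corpus column), y the target.
from atom import ATOMClassifier
atom = ATOMClassifier(X, y)
atom.vectorize()

# As a stand-alone transformer: fit on the text before transforming it.
from atom.nlp import Vectorizer
vectorizer = Vectorizer("tf-idf")
X = vectorizer.fit_transform(X)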