Vectorizer
Transform the corpus into meaningful vectors of numbers. The
transformation is applied to the column named Corpus. If there
is no column with that name, an exception is raised. This class
can be accessed from atom through the vectorize method. Read
more in the user guide.
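To make "vectors of numbers" concrete, here is a minimal pure-Python sketch of the bag-of-words idea behind the default "BOW" strategy. Each document becomes a row of word counts over a vocabulary shared by the whole corpus. The function `bag_of_words` and its lowercase-and-split tokenization are hypothetical illustrations, not ATOM's implementation.

```python
def bag_of_words(docs):
    """Return (vocabulary, count matrix) for a list of text documents.

    Hypothetical sketch: lowercases and splits on whitespace only.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # A sorted vocabulary gives every document the same column order
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    index = {word: i for i, word in enumerate(vocabulary)}

    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for word in tokens:
            row[index[word]] += 1  # count occurrences of each word
        matrix.append(row)
    return vocabulary, matrix


vocab, counts = bag_of_words(["the cat sat", "the cat saw the dog"])
print(vocab)   # ['cat', 'dog', 'sat', 'saw', 'the']
print(counts)  # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Word order is discarded; only the counts per vocabulary term survive, which is what makes the output a fixed-width numeric vector.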
| Parameters: |
strategy: str, optional (default="BOW") Strategy with which to vectorize the text. Choose from:
Verbosity level of the class. Possible values are:
**kwargs |
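The example at the end of this page selects a "tf-idf" strategy. As a rough illustration of what TF-IDF weighting does, the sketch below reweights a count matrix so that terms appearing in many documents get smaller weights. It uses one common variant, a smoothed IDF, idf = ln((1 + n) / (1 + df)) + 1, followed by L2 row normalization (the scikit-learn `TfidfVectorizer` defaults). Whether ATOM's strategy uses exactly this formula is an assumption, and the function name `tfidf` is hypothetical.

```python
import math


def tfidf(counts):
    """Reweight a document-term count matrix with smoothed TF-IDF.

    Hypothetical sketch: idf = ln((1 + n) / (1 + df)) + 1, then each
    row is scaled to unit L2 norm.
    """
    n_docs = len(counts)
    n_terms = len(counts[0]) if counts else 0
    # Document frequency: in how many documents each term appears
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]

    weighted = []
    for row in counts:
        values = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in values))
        # Empty documents stay all-zero instead of dividing by zero
        weighted.append([v / norm if norm else 0.0 for v in values])
    return weighted


# Columns: cat, dog, sat, saw, the
weights = tfidf([[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]])
```

In the first row, "sat" (found in one document) ends up with a larger weight than "cat" (found in both), even though both occur once.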
Methods
| Method | Description |
| --- | --- |
| fit | Fit to text. |
| fit_transform | Fit to text, then vectorize it. |
| get_params | Get parameters for this estimator. |
| log | Write information to the logger and print to stdout. |
| save | Save the instance to a pickle file. |
| set_params | Set the parameters of this estimator. |
| transform | Transform the text. |
fit(X, y=None)
Fit to text.
| Parameters: |
X: dict, list, tuple, np.ndarray or pd.DataFrame
y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. |
| Returns: |
self: Vectorizer Fitted instance of self. |
fit_transform(X, y=None)
Fit to text, then vectorize it.
| Parameters: |
X: dict, list, tuple, np.ndarray or pd.DataFrame
y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. |
| Returns: |
X: pd.DataFrame Vectorized text. |
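In scikit-learn-style estimators, fit_transform is usually equivalent to calling fit and then transform on the same data. The mixin below is a hypothetical sketch of that default. Whether ATOM's fit_transform is implemented this way is an assumption, and `WordCounter` is a toy transformer used only to exercise the mixin (it returns plain lists rather than a pd.DataFrame).

```python
class FitTransformMixin:
    """Hypothetical default: fit on X, then transform the same X."""

    def fit_transform(self, X, y=None):
        # fit returns self, so transform can be chained directly
        return self.fit(X, y).transform(X)


class WordCounter(FitTransformMixin):
    """Toy transformer: counts the words in each document."""

    def fit(self, X, y=None):
        self.fitted_ = True  # mark the instance as fitted
        return self

    def transform(self, X):
        if not getattr(self, "fitted_", False):
            raise RuntimeError("Call fit before transform.")
        return [len(doc.split()) for doc in X]


print(WordCounter().fit_transform(["a b c", "d e"]))  # [3, 2]
```

Because fit returns self, the one-line chain is enough; estimators with a cheaper combined path can still override fit_transform.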
get_params(deep=True)
Get parameters for this estimator.
| Parameters: |
deep: bool, optional (default=True) |
| Returns: |
params: dict Dictionary of the parameter names mapped to their values. |
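The get_params/set_params pair follows scikit-learn conventions, where parameter names are read from the constructor signature and looked up as attributes of the same name. A minimal sketch of that technique follows. `ParamsMixin` and `ToyVectorizer` (including its `verbose` argument) are hypothetical, and the `name__sub` keys for nested estimators mirror scikit-learn's behavior for `deep=True`.

```python
import inspect


class ParamsMixin:
    """Hypothetical sketch of get_params via constructor introspection."""

    def get_params(self, deep=True):
        signature = inspect.signature(type(self).__init__)
        skip = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
        # Constructor arguments (minus self, *args, **kwargs) are the parameters
        names = [p.name for p in signature.parameters.values()
                 if p.name != "self" and p.kind not in skip]

        params = {}
        for name in names:
            value = getattr(self, name)
            params[name] = value
            # With deep=True, also expose parameters of nested estimators
            if deep and hasattr(value, "get_params") and not isinstance(value, type):
                for sub_name, sub_value in value.get_params().items():
                    params[f"{name}__{sub_name}"] = sub_value
        return params


class ToyVectorizer(ParamsMixin):
    """Hypothetical estimator storing each argument under the same name."""

    def __init__(self, strategy="BOW", verbose=0):
        self.strategy = strategy
        self.verbose = verbose


print(ToyVectorizer("tf-idf").get_params())  # {'strategy': 'tf-idf', 'verbose': 0}
```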
log(msg, level=0)
Write a message to the logger and print it to stdout.
| Parameters: |
msg: str
level: int, optional (default=0) |
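A minimal sketch of verbosity-gated logging with the standard library `logging` module. It assumes that a message is printed only when the instance's verbosity is at least `level`, and that it is always passed to the logger when one is set. This gating rule and the `verbose`/`logger` attribute names are assumptions for illustration, not taken from ATOM's source.

```python
import logging


class LoggingMixin:
    """Hypothetical sketch of verbosity-gated logging."""

    def __init__(self, verbose=0, logger=None):
        self.verbose = verbose  # assumed attribute name
        self.logger = logger    # a logging.Logger or None

    def log(self, msg, level=0):
        # Print only when the verbosity is at least the message's level
        if self.verbose >= level:
            print(msg)
        # Assumed: the logger, if any, receives every message
        if self.logger is not None:
            self.logger.info(msg)


logging.basicConfig(level=logging.INFO)
est = LoggingMixin(verbose=1, logger=logging.getLogger("vectorizer"))
est.log("Fitting the vectorizer...", level=1)  # printed and logged
est.log("Extra details.", level=2)             # logged, not printed
```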
save(filename="auto")
Save the instance to a pickle file.
| Parameters: |
filename: str, optional (default="auto") Name of the file. Use "auto" for automatic naming. |
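A minimal sketch of saving an instance with the standard library `pickle` module. The "auto" naming rule used here (the class name plus a .pkl suffix) is an assumption for illustration; ATOM may name files differently. Returning the chosen file name is also an addition of this sketch.

```python
import os
import pickle


class SaveMixin:
    """Hypothetical sketch of a pickle-based save method."""

    def save(self, filename="auto"):
        if filename == "auto":
            filename = type(self).__name__  # assumed naming rule: class name
        if not filename.endswith(".pkl"):
            filename += ".pkl"  # assumed: add the pickle extension
        with open(filename, "wb") as f:
            pickle.dump(self, f)
        return filename  # returned for convenience in this sketch only


class Toy(SaveMixin):
    """Stand-in class used to exercise the sketch."""


path = Toy().save()  # writes Toy.pkl to the working directory
with open(path, "rb") as f:
    restored = pickle.load(f)
os.remove(path)  # clean up the example file
```

Pickle stores a reference to the class, so the class must be importable under the same name when the file is loaded again.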
set_params(**params)
Set the parameters of this estimator.
| Parameters: |
**params: dict Estimator parameters. |
| Returns: |
self: Vectorizer Estimator instance. |
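A sketch of the usual set_params contract: each keyword must name an existing parameter, the new value is assigned, and the instance itself is returned so calls can be chained. `SetParamsMixin` and `ToyVectorizer` are hypothetical, and the nested `name__sub` keys that scikit-learn supports are omitted for brevity.

```python
class SetParamsMixin:
    """Hypothetical sketch of set_params with validation."""

    def set_params(self, **params):
        for name, value in params.items():
            # Only attributes set on the instance count as parameters
            if name not in vars(self):
                raise ValueError(
                    f"Invalid parameter {name!r} for {type(self).__name__}."
                )
            setattr(self, name, value)
        return self  # enables chaining, e.g. est.set_params(...).fit(X)


class ToyVectorizer(SetParamsMixin):
    """Hypothetical estimator with a single parameter."""

    def __init__(self, strategy="BOW"):
        self.strategy = strategy


vec = ToyVectorizer().set_params(strategy="tf-idf")
print(vec.strategy)  # tf-idf
```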
transform(X, y=None)
Transform the text.
| Parameters: |
X: dict, list, tuple, np.ndarray or pd.DataFrame
y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. |
| Returns: |
X: pd.DataFrame Vectorized text. |
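Transforming applies the state learned during fitting to (possibly new) text. The sketch below assumes that state is a word-to-column vocabulary, as in a bag-of-words fit, and shows two common behaviors: calling transform before fit raises an error, and words not seen during fitting are ignored. `CountTransformer` and `vocabulary_` are hypothetical names; ATOM returns a pd.DataFrame, whereas this sketch returns plain lists.

```python
class CountTransformer:
    """Hypothetical bag-of-words transformer (fit, then transform)."""

    def fit(self, X, y=None):
        words = sorted({w for doc in X for w in doc.lower().split()})
        self.vocabulary_ = {w: i for i, w in enumerate(words)}
        return self

    def transform(self, X, y=None):
        if not hasattr(self, "vocabulary_"):
            raise RuntimeError("This instance is not fitted yet. Call fit first.")
        rows = []
        for doc in X:
            row = [0] * len(self.vocabulary_)
            for word in doc.lower().split():
                col = self.vocabulary_.get(word)
                if col is not None:  # words unseen during fit are dropped
                    row[col] += 1
            rows.append(row)
        return rows


vec = CountTransformer().fit(["red apple", "green apple"])
print(vec.transform(["apple pie apple"]))  # [[2, 0, 0]]
```

Keeping the column layout fixed at fit time is what lets new documents be compared with the training corpus.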
Example
from atom import ATOMClassifier
atom = ATOMClassifier(X, y)
atom.vectorize()

from atom.nlp import Vectorizer
vectorizer = Vectorizer("tf-idf")
X = vectorizer.fit_transform(X)