Normalizer
Convert words to a more uniform standard. The transformations are
applied on the column named Corpus, in the same order the parameters
are presented. If there is no column with that name, an exception is
raised. If the provided documents are strings, words are separated by
spaces. This class can be accessed from atom through the normalize
method. Read more in the user guide.
Parameters: |
stopwords: bool or str, optional (default=True) Whether to remove a predefined dictionary of stopwords.
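custom_stopwords: sequence or None, optional (default=None) Custom stopwords to remove from the text.
stem: bool or str, optional (default=False) Whether to apply stemming using SnowballStemmer.
lemmatize: bool, optional (default=True) Whether to apply lemmatization.
verbose: int, optional (default=0) Verbosity level of the class.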
|
Tip
Use the tokenize method to convert the documents from a string to a sequence of words.
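The ordering and the string handling described above can be illustrated with a minimal sketch. It is not ATOM's implementation: `TINY_STOPWORDS` and `normalize_documents` are hypothetical names, the stopword set is a toy stand-in for a real dictionary, and stemming and lemmatization are left out to keep the example dependency-free. Returning a list of words for string input is also an assumption.

```python
# Illustrative sketch only, not ATOM's implementation.
# TINY_STOPWORDS and normalize_documents are hypothetical names.
TINY_STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def normalize_documents(docs, stopwords=True, custom_stopwords=None):
    """Drop stopwords from each document, applying the steps in parameter order."""
    normalized = []
    for doc in docs:
        # String documents are split on whitespace; token sequences are used as-is.
        words = doc.split() if isinstance(doc, str) else list(doc)

        # Step 1: predefined stopwords (compared case-insensitively in this sketch).
        if stopwords:
            words = [w for w in words if w.lower() not in TINY_STOPWORDS]

        # Step 2: user-supplied stopwords.
        if custom_stopwords:
            custom = {w.lower() for w in custom_stopwords}
            words = [w for w in words if w.lower() not in custom]

        normalized.append(words)
    return normalized


# One string document and one already tokenized document.
docs = ["the cat is in the garden", ["a", "dog", "and", "a", "bone"]]
result = normalize_documents(docs, custom_stopwords=["garden"])
print(result)
```

Both input forms end up as word sequences, which is why tokenizing beforehand gives more control over how the words are separated.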
Methods
fit_transform | Same as transform. |
get_params | Get parameters for this estimator. |
log | Write information to the logger and print to stdout. |
save | Save the instance to a pickle file. |
set_params | Set the parameters of this estimator. |
transform | Transform the text. |
Normalize the text.
Parameters: |
X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set containing the text documents.
y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. |
Returns: |
X: pd.DataFrame |
Get parameters for this estimator.
Parameters: |
deep: bool, optional (default=True) If True, also return the parameters of contained subobjects that are estimators. |
Returns: |
params: dict Dictionary of the parameter names mapped to their values. |
Write a message to the logger and print it to stdout.
Parameters: |
msg: str
level: int, optional (default=0) |
Save the instance to a pickle file.
Parameters: |
filename: str, optional (default="auto") Name of the file. Use "auto" for automatic naming. |
Set the parameters of this estimator.
Parameters: |
**params: dict Estimator parameters. |
Returns: |
self: Normalizer Estimator instance. |
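The get_params and set_params pair can be sketched as follows, assuming the scikit-learn-style convention that the method names and return values suggest. `SketchNormalizer` is a hypothetical stand-in, not the real class, and it only carries the parameters named on this page.

```python
class SketchNormalizer:
    """Hypothetical stand-in, not the real atom.nlp.Normalizer."""

    def __init__(self, stopwords=True, custom_stopwords=None, lemmatize=True):
        self.stopwords = stopwords
        self.custom_stopwords = custom_stopwords
        self.lemmatize = lemmatize

    def get_params(self, deep=True):
        # deep is accepted for API compatibility; this sketch has no nested estimators.
        return {
            "stopwords": self.stopwords,
            "custom_stopwords": self.custom_stopwords,
            "lemmatize": self.lemmatize,
        }

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            # Reject names that are not constructor parameters.
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self  # returning the instance allows chaining


sketch = SketchNormalizer().set_params(lemmatize=False, custom_stopwords=["foo"])
params = sketch.get_params()
print(params)
```

Returning the instance from set_params is what makes the chained call in the last lines possible.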
Normalize the text.
Parameters: |
X: dict, list, tuple, np.ndarray or pd.DataFrame Feature set containing the text documents.
y: int, str, sequence or None, optional (default=None) Does nothing. Implemented for continuity of the API. |
Returns: |
X: pd.DataFrame |
Example
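# Option 1: through atom (X holds the text documents in a column named Corpus, y is the target)
from atom import ATOMClassifier

atom = ATOMClassifier(X, y)
atom.normalize()

# Option 2: using the class directly
from atom.nlp import Normalizer

normalizer = Normalizer()
X = normalizer.transform(X)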