nlpaug.augmenter.word.synonym

Augmenter that applies semantic-meaning-based substitution to textual input.

class nlpaug.augmenter.word.synonym.SynonymAug(aug_src='wordnet', model_path=None, name='Synonym_Aug', aug_min=1, aug_max=10, aug_p=0.3, lang='eng', stopwords=None, tokenizer=None, reverse_tokenizer=None, stopwords_regex=None, force_reload=False, verbose=0)

Bases: nlpaug.augmenter.word.word_augmenter.WordAugmenter

Augmenter that leverages semantic meaning to substitute words.

Parameters:
  • aug_src (str) – Data source for synonyms. Supports ‘wordnet’ and ‘ppdb’.
  • model_path (str) – Path to the dictionary. Mandatory when using PPDB as the data source.
  • lang (str) – Language of your text. Default value is ‘eng’.
  • aug_p (float) – Percentage of words that will be augmented.
  • aug_min (int) – Minimum number of words that will be augmented.
  • aug_max (int) – Maximum number of words that will be augmented. If None is passed, the number of augmentations is calculated from aug_p. Otherwise, if the result calculated from aug_p is smaller than aug_max, the calculated result is used; if not, aug_max is used.
  • stopwords (list) – List of words that will be skipped by the augment operation.
  • stopwords_regex (str) – Regular expression matching words that will be skipped by the augment operation.
  • tokenizer (func) – Custom tokenization function.
  • reverse_tokenizer (func) – Custom function that reverses the tokenization process.
  • force_reload (bool) – Force reloading the model into memory when the class is initialized. Default value is False; keeping it False is recommended if performance is a concern.
  • name (str) – Name of this augmenter.
>>> import nlpaug.augmenter.word as naw
>>> aug = naw.SynonymAug()
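The interaction between aug_p, aug_min and aug_max described above can be sketched as a small standalone function. This is an illustration of the documented rule, not nlpaug's own code: the function name estimate_aug_count is hypothetical, and rounding the aug_p result up is an assumption.

```python
import math


def estimate_aug_count(num_tokens, aug_p=0.3, aug_min=1, aug_max=10):
    """Estimate how many words would be augmented (hypothetical helper)."""
    # Assumption: the share given by aug_p is rounded up to a whole word count.
    count = math.ceil(aug_p * num_tokens)

    # Never go below aug_min when a minimum is set.
    if aug_min is not None and count < aug_min:
        return aug_min

    # Cap at aug_max; with aug_max=None only aug_p decides.
    if aug_max is not None and count > aug_max:
        return aug_max

    return count


print(estimate_aug_count(10, aug_p=0.5))                 # 5
print(estimate_aug_count(100, aug_p=0.5))                # 10 (capped by aug_max)
print(estimate_aug_count(100, aug_p=0.5, aug_max=None))  # 50
```

With the default aug_max=10, long inputs are capped at ten substitutions regardless of aug_p; passing aug_max=None lets aug_p alone determine the count.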
augment(data, n=1, num_thread=1)
Parameters:
  • data (object/list) – Data for augmentation. It can be a list of data (e.g. a list of strings or a numpy array) or a single element (e.g. a string or numpy value).
  • n (int) – Default is 1. Number of unique augmented outputs. Forced to 1 if the input is a list of data.
  • num_thread (int) – Number of threads for data augmentation. Use this option when you are running on CPU and n is larger than 1.
Returns:

Augmented data

>>> augmented_data = aug.augment(data)
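A slightly fuller usage sketch, combining the constructor parameters and the augment call documented above. The helper names simple_tokenizer, simple_reverse_tokenizer and augment_with_synonyms are hypothetical, and the sketch assumes that tokenizer receives a string and returns a list of tokens while reverse_tokenizer does the opposite.

```python
def simple_tokenizer(text):
    """Split text into tokens on whitespace (hypothetical custom tokenizer)."""
    return text.split()


def simple_reverse_tokenizer(tokens):
    """Join tokens back into one string (hypothetical reverse tokenizer)."""
    return " ".join(tokens)


def augment_with_synonyms(data, n=1):
    """Build a SynonymAug with custom settings and augment the input."""
    # Imported lazily so the tokenizer helpers work without nlpaug installed.
    import nlpaug.augmenter.word as naw

    aug = naw.SynonymAug(
        aug_src="wordnet",      # synonym source, as documented above
        aug_p=0.3,              # share of words to substitute
        stopwords=["fox"],      # words that are never substituted
        tokenizer=simple_tokenizer,
        reverse_tokenizer=simple_reverse_tokenizer,
    )
    return aug.augment(data, n=n)
```

Calling augment_with_synonyms("The quick brown fox jumps over the lazy dog", n=2) requires nlpaug to be installed, along with the WordNet data it loads through NLTK. Substitutions are chosen at random, so the output differs between runs.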