nlpaug.augmenter.word.reserved

Augmenter that applies target-word replacement to textual input.

class nlpaug.augmenter.word.reserved.ReservedAug(reserved_tokens, action='substitute', case_sensitive=True, name='Reserved_Aug', aug_min=1, aug_max=10, aug_p=0.3, tokenizer=None, reverse_tokenizer=None, verbose=0, generate_all_combinations=False)

Bases: nlpaug.augmenter.word.word_augmenter.WordAugmenter

Augmenter that applies target-word replacement for augmentation. It can also be used to generate all possible combinations. The number of words augmented is controlled by:
  • aug_p (float) – Fraction of words to augment (for example, 0.3 for 30%).
  • aug_min (int) – Minimum number of words to augment.
  • aug_max (int) – Maximum number of words to augment. If None, the count is calculated from aug_p alone. Otherwise, the count calculated from aug_p is used when it is smaller than aug_max, and aug_max is used when it is not.
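The rule for choosing how many words to change can be sketched as a small function. This illustrates the behaviour described above and is not nlpaug's own code: the helper name `estimate_aug_count` is hypothetical, and rounding the aug_p product up with `math.ceil` is an assumption.

```python
import math
from typing import Optional


def estimate_aug_count(num_tokens: int, aug_p: float,
                       aug_min: Optional[int] = 1,
                       aug_max: Optional[int] = 10) -> int:
    """Hypothetical helper: how many of num_tokens words to augment.

    Assumption: the aug_p-derived count is rounded up with math.ceil.
    """
    count = math.ceil(aug_p * num_tokens)
    # Never go below aug_min (when one is given).
    if aug_min is not None and count < aug_min:
        count = aug_min
    # Cap at aug_max; when aug_max is None, the aug_p result is used as-is.
    if aug_max is not None and count > aug_max:
        count = aug_max
    return count


print(estimate_aug_count(20, 0.3))                 # 30% of 20 words, within [1, 10]
print(estimate_aug_count(100, 0.3))                # 30% of 100 words, capped by aug_max
print(estimate_aug_count(100, 0.3, aug_max=None))  # no cap: aug_p alone decides
```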
Parameters:
  • reserved_tokens (list) – A list of swappable tokens (a list of lists). For example, “FWD”, “Fwd” and “FW” all refer to “forward” in email communication, while “Sincerely” and “Best Regards” are treated as having the same meaning. The input should then be [[“FWD”, “Fwd”, “FW”], [“Sincerely”, “Best Regards”]].
  • case_sensitive (bool) – Default is True. If True, a token is replaced only when its letter case exactly matches an entry in reserved_tokens.
  • generate_all_combinations (bool) – Default is False. If True, every sentence that can be produced by substituting reserved_tokens is returned.
  • tokenizer (func) – Custom tokenization function.
  • reverse_tokenizer (func) – Custom function that reverses tokenization, joining tokens back into text.
  • name (str) – Name of this augmenter
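To make the roles of reserved_tokens and case_sensitive concrete, here is a minimal standalone sketch of the substitution idea. It does not use nlpaug and is not its implementation: the function `reserved_substitute` is hypothetical, it matches phrases with a regular expression (an assumption, chosen so multi-word entries such as "Best Regards" work), it replaces every match rather than honouring aug_p/aug_min/aug_max, and it inserts the replacement exactly as written in the group.

```python
import random
import re
from typing import List, Optional


def reserved_substitute(text: str, reserved_tokens: List[List[str]],
                        case_sensitive: bool = True,
                        rng: Optional[random.Random] = None) -> str:
    """Hypothetical sketch: replace every reserved phrase in `text` with a
    different phrase from the same group. aug_p/aug_min/aug_max are ignored."""
    rng = rng or random.Random()

    def key(phrase: str) -> str:
        # Normalise for lookup only when matching is case-insensitive.
        return phrase if case_sensitive else phrase.lower()

    # Map each (normalised) phrase to the index of its group.
    lookup = {key(p): i for i, group in enumerate(reserved_tokens) for p in group}
    # At any position, try longer phrases before shorter ones.
    phrases = sorted((p for group in reserved_tokens for p in group),
                     key=len, reverse=True)
    if not phrases:
        return text
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, phrases)) + r")\b",
                         0 if case_sensitive else re.IGNORECASE)

    def replace(match: re.Match) -> str:
        found = key(match.group(0))
        group = reserved_tokens[lookup[found]]
        # Choose among the other members of the group (inserted as written).
        alternatives = [p for p in group if key(p) != found]
        return rng.choice(alternatives) if alternatives else match.group(0)

    return pattern.sub(replace, text)


groups = [["FWD", "Fwd", "FW"], ["Sincerely", "Best Regards"]]
print(reserved_substitute("FWD: quarterly report. Sincerely, Alice", groups))
```

With case_sensitive=True, "sincerely" in lower case is left alone because only "Sincerely" is listed; passing case_sensitive=False makes it match.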
>>> import nlpaug.augmenter.word as naw
>>> aug = naw.ReservedAug(reserved_tokens=[["FWD", "Fwd", "FW"], ["Sincerely", "Best Regards"]])
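When generate_all_combinations=True, the augmenter returns every combination rather than a random sample. The size of that output is the product of the group sizes of the matched phrases, which the following standalone sketch makes explicit with `itertools.product`. The function `all_combinations` is hypothetical, matches case-sensitively, and includes the unchanged input in its result; whether nlpaug also includes the input is not stated here.

```python
import itertools
import re
from typing import List


def all_combinations(text: str, reserved_tokens: List[List[str]]) -> List[str]:
    """Hypothetical sketch: every sentence obtainable by replacing each
    reserved phrase in `text` with any member of its group (case-sensitive).
    The unchanged input is included in the result."""
    lookup = {p: i for i, group in enumerate(reserved_tokens) for p in group}
    phrases = sorted(lookup, key=len, reverse=True)
    if not phrases:
        return [text]
    # The capturing group makes re.split keep the matched phrases,
    # which then sit at the odd indices of `parts`.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, phrases)) + r")\b")
    parts = pattern.split(text)
    # Literal text has one option; each matched phrase has its whole group.
    options = [[part] if i % 2 == 0 else reserved_tokens[lookup[part]]
               for i, part in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*options)]


groups = [["FWD", "Fwd", "FW"], ["Sincerely", "Best Regards"]]
for sentence in all_combinations("FWD: quarterly report. Sincerely, Alice", groups):
    print(sentence)
```

This prints six sentences (3 × 2), so the output grows multiplicatively with each additional reserved phrase in the input.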
augment(data, n=1, num_thread=1)
Parameters:
  • data (object/list) – Data to augment. It can be a list of data (e.g. a list of strings or numpy arrays) or a single element (e.g. a string or numpy array). The numpy format only supports audio or spectrogram data; text data must be a string or a list of strings.
  • n (int) – Default is 1. Number of unique augmented outputs. Forced to 1 if the input is a list of data.
  • num_thread (int) – Number of threads for data augmentation. Use this option when running on CPU and n is larger than 1.
Returns:

Augmented data

>>> data = "FWD: quarterly report. Sincerely, Alice"
>>> augmented_data = aug.augment(data)
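The n parameter asks for unique outputs. One way to honour such a contract is to sample repeatedly and drop repeats, as in the hypothetical sketch below; the function `augment_unique`, its `max_tries` cap and the toy `augment_once` stand-in are assumptions for illustration and do not describe nlpaug's internals. In this sketch, fewer than n results come back when the underlying step cannot produce enough distinct variants.

```python
import itertools
from typing import Callable, List


def augment_unique(text: str, augment_once: Callable[[str], str],
                   n: int = 1, max_tries: int = 100) -> List[str]:
    """Hypothetical sketch: call augment_once until n distinct outputs are
    collected or max_tries calls have been made, whichever comes first."""
    results: List[str] = []
    for _ in range(max_tries):
        if len(results) >= n:
            break
        candidate = augment_once(text)
        if candidate not in results:  # keep first-seen order, drop repeats
            results.append(candidate)
    return results


# Toy stand-in for a single augmentation step: cycles through fixed variants.
variants = itertools.cycle(["Fwd: hi", "FW: hi", "Fwd: hi"])
print(augment_unique("FWD: hi", lambda s: next(variants), n=3))
# → ['Fwd: hi', 'FW: hi'] (only two distinct variants exist)
```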