Augmenter that applies a target word replacement operation to textual input.
ReservedAug(reserved_tokens, action='substitute', case_sensitive=True, name='Reserved_Aug', aug_min=1, aug_max=10, aug_p=0.3, tokenizer=None, reverse_tokenizer=None, verbose=0, generate_all_combinations=False)
Augmenter that applies target word replacement for augmentation. It can also be used to generate all possible combinations.
- aug_p (float) – Percentage of words that will be augmented.
- aug_min (int) – Minimum number of words that will be augmented.
- aug_max (int) – Maximum number of words that will be augmented. If None is passed, the number of augmentations is calculated from aug_p. If the result calculated from aug_p is smaller than aug_max, the calculated result is used. Otherwise, aug_max is used.
- reserved_tokens (list) – A list of swappable tokens (a list of lists). For example, “FWD”, “Fwd” and “FW” all refer to “forward” in email communication, while “Sincerely” and “Best Regards” are treated as having the same meaning. The input should be [["FWD", "Fwd", "FW"], ["Sincerely", "Best Regards"]].
- case_sensitive (bool) – Default is True. If True, a token is replaced by an alternative only when its case matches the reserved token exactly.
- generate_all_combinations (bool) – Default is False. If True, all sentence combinations that can be formed with reserved_tokens will be returned.
- tokenizer (func) – Customize tokenization process
- reverse_tokenizer (func) – Customize reverse of tokenization process
- name (str) – Name of this augmenter
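The interaction between aug_p, aug_min and aug_max described above can be pictured with a small standalone sketch. This is not the library's code: `sketch_aug_count` is a hypothetical helper, and rounding the proportional count up is an assumption made for illustration.

```python
import math


def sketch_aug_count(num_tokens, aug_p=0.3, aug_min=1, aug_max=10):
    """Hypothetical helper: estimate how many words would be augmented."""
    # Assumption: the proportional count is rounded up to a whole word.
    count = math.ceil(aug_p * num_tokens)
    # Raise the count to aug_min when the proportional count is too small.
    if aug_min is not None and count < aug_min:
        return aug_min
    # Cap the count at aug_max; None means there is no upper bound.
    if aug_max is not None and count > aug_max:
        return aug_max
    return count


print(sketch_aug_count(10, aug_p=0.5))                 # 5: proportional count is used
print(sketch_aug_count(2, aug_p=0.5, aug_min=2))       # 2: raised to aug_min
print(sketch_aug_count(100, aug_p=0.5))                # 10: capped at aug_max
print(sketch_aug_count(100, aug_p=0.5, aug_max=None))  # 50: no cap
```

With the default aug_max of 10, long inputs never have more than 10 words replaced, regardless of aug_p.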
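>>> import nlpaug.augmenter.word as naw
>>> reserved_tokens = [["FWD", "Fwd", "FW"], ["Sincerely", "Best Regards"]]
>>> aug = naw.ReservedAug(reserved_tokens=reserved_tokens)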
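To make the idea of swappable token groups and generate_all_combinations more concrete, here is a minimal pure-Python sketch. It does not use nlpaug and is not its implementation: `build_lookup` and `all_combinations` are hypothetical helpers, matching is exact (case-sensitive), the input is assumed to be already tokenized, and including the original sentence among the results is an assumption.

```python
from itertools import product

reserved_tokens = [["FWD", "Fwd", "FW"], ["Sincerely", "Best Regards"]]


def build_lookup(groups):
    """Map each reserved token to the full group it belongs to."""
    return {token: group for group in groups for token in group}


def all_combinations(tokens, groups):
    """Return every sentence obtainable by swapping reserved tokens."""
    lookup = build_lookup(groups)
    # A reserved token may become any member of its group;
    # every other token has exactly one choice: itself.
    choices = [lookup.get(token, [token]) for token in tokens]
    return [" ".join(combo) for combo in product(*choices)]


tokens = ["FWD", "report", "Sincerely"]
for sentence in all_combinations(tokens, reserved_tokens):
    print(sentence)
```

For this input the sketch yields 3 × 2 = 6 sentences, one for each pairing of a "forward" variant with a sign-off variant.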
augment(data, n=1, num_thread=1)
- data (object/list) – Data for augmentation. It can be a list of data (e.g. a list of strings or numpy arrays) or a single element (e.g. a string or numpy array).
- n (int) – Default is 1. Number of unique augmented outputs. Will be forced to 1 if the input is a list of data.
- num_thread (int) – Number of threads for data augmentation. Use this option when you are running on CPU and n is larger than 1.
>>> augmented_data = aug.augment(data)