nlpaug.augmenter.sentence.random

Augmenter that applies a sentence-level operation to textual input by randomly reordering sentences.

class nlpaug.augmenter.sentence.random.RandomSentAug(mode='neighbor', action='swap', name='RandomSent_Aug', aug_min=1, aug_max=10, aug_p=0.3, tokenizer=None, verbose=0)[source]

Bases: nlpaug.augmenter.sentence.sentence_augmenter.SentenceAugmenter

Augmenter that applies random behavior for augmentation.

Parameters:
  • mode (str) – Shuffle a sentence to the left, right, neighbor or random position. For left, the target sentence is swapped with the sentence to its left. For right, the target sentence is swapped with the sentence to its right. For neighbor, the target sentence is swapped with its left or right sentence, chosen randomly. For random, the target sentence is swapped with any sentence, chosen randomly.
  • aug_p (float) – Percentage of sentences that will be augmented.
  • aug_min (int) – Minimum number of sentences that will be augmented.
  • aug_max (int) – Maximum number of sentences that will be augmented. If None is passed, the number of augmentations is calculated from aug_p. Otherwise, the result calculated from aug_p is used if it is smaller than aug_max, and aug_max is used if it is not.
  • tokenizer (func) – Custom tokenization function.
  • name (str) – Name of this augmenter.
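
The interaction between aug_p, aug_min and aug_max described above can be sketched in plain Python. This is an illustrative sketch only, not the library's code: the helper name augmentation_count is hypothetical, and rounding the aug_p result up is an assumption.

```python
import math


def augmentation_count(num_sentences, aug_p=0.3, aug_min=1, aug_max=10):
    """Hypothetical helper: how many sentences would be augmented.

    Mirrors the parameter descriptions: start from aug_p, then clamp
    the result between aug_min and aug_max (either may be None).
    """
    # Assumption: the fractional result is rounded up.
    count = math.ceil(aug_p * num_sentences)
    if aug_min is not None and count < aug_min:
        return aug_min
    if aug_max is not None and count > aug_max:
        return aug_max
    return count


# With aug_max=None the count comes from aug_p alone.
uncapped = augmentation_count(100, aug_p=0.5, aug_max=None)
# With aug_max=10 the same input is capped at 10.
capped = augmentation_count(100, aug_p=0.5, aug_max=10)
```

Under these assumptions, a long document is never augmented in more than aug_max places, and a short one is still augmented in at least aug_min places.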
>>> import nlpaug.augmenter.sentence as nas
>>> aug = nas.RandomSentAug()
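
To make the four mode values concrete, the following self-contained sketch swaps one sentence in a list of already-split sentences. It does not call nlpaug: the function swap_sentence is hypothetical, and the choice to leave the list unchanged when no valid partner exists (for example, left on the first sentence) is an assumption, not documented behavior.

```python
import random


def swap_sentence(sentences, index, mode="neighbor", rng=random):
    """Hypothetical sketch: swap sentences[index] with a partner chosen by mode."""
    if mode == "left":
        candidates = [index - 1]
    elif mode == "right":
        candidates = [index + 1]
    elif mode == "neighbor":
        candidates = [index - 1, index + 1]
    elif mode == "random":
        candidates = [i for i in range(len(sentences)) if i != index]
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    # Drop partners that fall outside the list (assumed boundary handling).
    candidates = [i for i in candidates if 0 <= i < len(sentences)]
    result = list(sentences)  # never mutate the caller's list
    if not candidates:
        return result

    partner = rng.choice(candidates)
    result[index], result[partner] = result[partner], result[index]
    return result


sentences = ["First.", "Second.", "Third."]
left_swapped = swap_sentence(sentences, 1, mode="left")
right_swapped = swap_sentence(sentences, 1, mode="right")
```

Here left_swapped places "Second." before "First.", while right_swapped places "Third." before "Second."; neighbor and random pick the partner with rng.choice, so their output varies between runs.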
augment(data, n=1, num_thread=1)
Parameters:
  • data (object/list) – Data for augmentation. It can be a list of data (e.g. a list of strings or numpy arrays) or a single element (e.g. a string or a numpy array). The numpy format is only supported for audio or spectrogram data. For text data, only a string or a list of strings is supported.
  • n (int) – Default is 1. Number of unique augmented outputs. It is forced to 1 if the input is a list of data.
  • num_thread (int) – Number of threads for data augmentation. Use this option when you are using a CPU and n is larger than 1.
Returns:

Augmented data

>>> augmented_data = aug.augment(data)