nlpaug.augmenter.spectrogram.frequency_masking

class nlpaug.augmenter.spectrogram.frequency_masking.FrequencyMaskingAug(name='FrequencyMasking_Aug', zone=(0.2, 0.8), coverage=1.0, factor=(40, 80), verbose=0, silence=False, stateless=True)

Bases: nlpaug.augmenter.spectrogram.spectrogram_augmenter.SpectrogramAugmenter

Augmenter that masks a spectrogram along the frequency axis using random values.

Parameters:

zone (tuple) – Default value is (0.2, 0.8). Defines the zone in which augmentation may be applied. By default, no augmentation is applied to the first 20% or the last 20% of the audio.
coverage (float) – Default value is 1; the value should be between 0 and 1. Portion of the zone to augment. If 1 is assigned, the augment operation is applied to the whole target audio segment. For example, if the audio duration is 60 seconds and zone and coverage are (0.2, 0.8) and 0.7 respectively, 25.2 seconds ((0.8-0.2)*0.7*60) of audio will be augmented.
factor (tuple) – Default value is (40, 80); the values should not exceed the number of mel frequency channels. The factor is picked at random from within this range. The mask start position falls in [0, v - factor), where v is the number of mel frequency channels.
name (str) – Name of this augmenter.

>>> import nlpaug.augmenter.spectrogram as nas
>>> aug = nas.FrequencyMaskingAug()
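
The coverage arithmetic described for the zone and coverage parameters can be checked with a small sketch. The helper augmented_seconds below is hypothetical and is not part of nlpaug; it only reproduces the formula (zone_end - zone_start) * coverage * duration from the parameter description and makes no claim about how the library selects the segment internally.

```python
def augmented_seconds(duration, zone=(0.2, 0.8), coverage=1.0):
    """Hypothetical helper (not an nlpaug API): amount of audio that is augmented.

    Implements only the documented arithmetic:
    (zone_end - zone_start) * coverage * duration.
    """
    zone_start, zone_end = zone
    if not 0.0 <= zone_start <= zone_end <= 1.0:
        raise ValueError("zone must satisfy 0 <= start <= end <= 1")
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    return (zone_end - zone_start) * coverage * duration


# 60 seconds of audio, zone (0.2, 0.8), coverage 0.7
print(round(augmented_seconds(60, zone=(0.2, 0.8), coverage=0.7), 2))  # 25.2
```

With the default coverage of 1.0 the same call gives about 36 seconds, that is, the whole zone.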

augment(data, n=1, num_thread=1)

Parameters:

data (object/list) – Data for augmentation. It can be a list of data (e.g. a list of strings or numpy arrays) or a single element (e.g. a string or a numpy array). The numpy format supports only audio or spectrogram data. For text data, only a string or a list of strings is supported.
n (int) – Default is 1. Number of unique augmented outputs. Forced to 1 if the input is a list of data.
num_thread (int) – Number of threads for data augmentation. Use this option when you are using a CPU and n is larger than 1.

Returns:

Augmented data

>>> augmented_data = aug.augment(data)

substitute(data)

https://arxiv.org/pdf/1904.08779.pdf, https://arxiv.org/pdf/2001.01401.pdf

Frequency masking is applied so that f consecutive mel frequency channels [f0, f0 + f) are masked, where f is first chosen from a uniform distribution from 0 to the frequency mask parameter F, and f0 is then chosen from [0, v - f). v is the number of mel frequency channels.
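
The masking rule described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the nlpaug implementation: the function name frequency_mask is hypothetical, the spectrogram is assumed to have shape (v, time), F is treated as an inclusive upper bound for f, and masked channels are set to zero (the fill value is an assumption).

```python
import numpy as np


def frequency_mask(spec, F, rng=None, mask_value=0.0):
    """Hypothetical sketch of frequency masking (not the nlpaug code).

    spec: 2-D array of shape (v, time), v = number of mel frequency channels.
    F:    frequency mask parameter; assumed here to be an inclusive bound on f.
    Returns the masked copy together with the sampled f0 and f.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = spec.shape[0]
    if not 0 <= F <= v:
        raise ValueError("F must be between 0 and the number of channels")

    # f: number of consecutive channels to mask, uniform over 0..F
    f = int(rng.integers(0, F + 1))
    # f0: first masked channel, chosen from [0, v - f); the interval is
    # empty when f == v, in which case the mask starts at channel 0
    f0 = int(rng.integers(0, v - f)) if v - f > 0 else 0

    masked = spec.copy()
    masked[f0:f0 + f, :] = mask_value  # mask channels [f0, f0 + f)
    return masked, f0, f


# Usage: a dummy spectrogram with 128 mel channels and 50 time frames
spec = np.ones((128, 50))
masked, f0, f = frequency_mask(spec, F=40, rng=np.random.default_rng(0))
print(f"masked channels [{f0}, {f0 + f})")
```

Returning f0 and f alongside the masked copy makes the sampled band easy to inspect; the input array is left unchanged.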