Wav2Vec-based Audio Data Augmentation for Low-Resource Speech Recognition

Authors

  • P. Haritha, Department of Computer Science and Applications, The Gandhigram Rural Institute (Deemed to be University), Gandhigram, Tamil Nadu, India.
  • P. Shanmugavadivu, Department of Computer Science and Applications, The Gandhigram Rural Institute (Deemed to be University), Gandhigram, Tamil Nadu, India.

DOI:

https://doi.org/10.47392/IRJAEH.2026.0014

Keywords:

Audio Data Augmentation, Low-Resource Speech Recognition, Wav2Vec, Automatic Speech Recognition, Audio Processing

Abstract

Audio Data Augmentation (ADA) expands a small dataset into a much larger one. ADA can be applied to data in several forms, namely images (Mel spectrograms), audio, and text, depending on the application, such as gender identification, speech recognition, or text summarization. ADA plays a vital role in developing Automatic Speech Recognition (ASR) systems for low-resource languages, where the available experimental datasets are small. This article applies four ADA techniques to audio signals: noise addition, pitch shifting, speed increase or decrease, and reverberation. The proposed method comprises preprocessing, data augmentation, audio transcription using pre-trained, Self-Supervised Learning based Wav2Vec models, and post-processing to remove induced tags from the transcribed data. Transcription is performed after augmentation so that the quality of the augmented speech can be evaluated using Word Error Rate (WER). The proposed Audio Data Augmentation for Low-Resource Speech Recognition (ADA-LRSR) framework, integrated with Wav2Vec (Vakyansh), achieved an overall WER of 0.5231, which was better than that of the other Wav2Vec variants (Base and Large). The approach was evaluated on 39 manually recorded, preprocessed audio files, which yielded 312 audio files after augmentation. In addition, the ADA-LRSR framework identified noise addition and reverberation as the best augmentation techniques for preserving speech quality.
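
As an illustration of the evaluation step, WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system's hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition in Python, not the authors' implementation. The function names `strip_tags` and `word_error_rate` are hypothetical, and the assumption that induced tags look like angle-bracket tokens (for example `<s>` or `<unk>`) is ours; the abstract does not specify the tag format.

```python
import re


def strip_tags(text: str) -> str:
    """Remove angle-bracket tokens and collapse whitespace.

    Assumption: induced tags appear as tokens such as <s>, </s>, <unk>.
    The abstract does not state the actual tag format.
    """
    without_tags = re.sub(r"<[^>]+>", " ", text)
    return " ".join(without_tags.split())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    raw_output = "<s>the cat sit on mat</s>"
    cleaned = strip_tags(raw_output)
    print(cleaned)
    print(round(word_error_rate("the cat sat on the mat", cleaned), 4))
```

In this toy example the hypothesis contains one substitution and one deletion against a six-word reference, so the WER is 2/6. A lower WER indicates a closer match; how the paper aggregates per-file scores into its overall WER is not stated in the abstract.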

Published

2026-01-13

How to Cite

Wav2Vec-based Audio Data Augmentation for Low-Resource Speech Recognition. (2026). International Research Journal on Advanced Engineering Hub (IRJAEH), 4(01), 111-117. https://doi.org/10.47392/IRJAEH.2026.0014
