
Building Arabic Speech Recognition System Using HuBERT Model and Studying the Sources of Errors [Arabic]

2025-01-23 | Volume 3, Issue 1 | Research Articles | Rima Sbih | Assef Jafar | Ali Kazem

Abstract

This paper presents a speaker-independent, large-vocabulary continuous speech recognition system for Arabic, built on deep neural network models trained with self-supervised learning. The system uses the HuBERT model and achieves a word error rate (WER) of 19.3%. Experiments on different data sets show that the HuBERT-based system generalizes well across spoken dialects. We also carried out a statistical analysis of the Arabic-specific errors the system produces, which showed that an error-correcting language model is needed to improve accuracy. After an Arabic language model was added, the WER fell to 10.7%. Overall, the study demonstrates the potential of self-supervised speech recognition for Arabic and the importance of pairing it with a language model.
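For readers unfamiliar with the metric, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function names `word_error_rate` and `relative_reduction` are illustrative and not taken from the paper, and the paper's own scoring may differ, for example in how Arabic text is normalized before comparison. The second helper only restates the arithmetic on the two reported figures: going from 19.3% to 10.7% is a drop of 8.6 percentage points, or roughly 45% in relative terms.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate dropped, relative to its start value."""
    return (before - after) / before


if __name__ == "__main__":
    # Toy example: one substitution and one deletion over four words.
    print(word_error_rate("a b c d", "a x c"))           # 0.5
    # Figures reported in the abstract (in percent).
    print(round(relative_reduction(19.3, 10.7), 3))      # 0.446
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.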


Keywords: Speech Recognition, Deep Learning, Self-attention, Supervised Learning, Self-Supervised Learning.

ISSN (Online): 2959-8591

Article Information

  1. Submitted: 25/09/2024
  2. Accepted: 24/11/2024

Correspondence

  1. rima.sbih@hiast.edu.sy

Cited As

  1. Sbih R, Jafar A, Kazem A. Building Arabic Speech Recognition System Using HuBERT Model and Studying the Sources of Errors. Syrian Journal for Science and Innovation. 2025 Jan 23;3(1).
