A speech recognition model based on tri-phones for the Arabic language

Al-Diri, Bashir, Sharieh, Ahmad and Qutiashat, Munib (2007) A speech recognition model based on tri-phones for the Arabic language. Advances in modelling Series B: Signal processing and pattern recognition, 50 (2). pp. 49-64. ISSN 1240-4543

Full content URL: http://www.aldiri.info/Speech/SpeechDefault.aspx

Documents
__network.uni_staff_S2_jpartridge_A Speech Recognition Model Based on Tri-Phones for the Arabic Language17032007a.pdf - Whole Document (PDF, 582kB)

Item Type: Article
Item Status: Live Archive

Abstract

One way to maintain acceptable recognition results as the vocabulary grows is to use base units rather than words. This paper presents a large-vocabulary continuous speech recognition system for Arabic that is based on tri-phones. To train and test the system, a dictionary was prepared and a 39-dimensional Mel Frequency Cepstrum Coefficient (MFCC) feature vector was computed. The computation involves a Hamming window, Fourier transformation, Average Spectral Value (ASV), logarithm of ASV, normalized energy, and the first- and second-order time derivatives of the 13 coefficients. A combination of a Hidden Markov Model and a neural network approach was used to model the basic temporal nature of the speech signal. The recognizer was tested with 7841 tri-phones. With 13 coefficients the accuracy was 58%; with 39 coefficients it was 62%; and with Cepstrum Mean Normalization it reached 71%. Given the small amount of available data (only 620 sentences), these results are very encouraging.
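
As an illustration of how a 13-coefficient frame can grow into the 39-dimensional vector mentioned in the abstract, the sketch below appends first- and second-order time derivatives and then applies cepstral mean normalization. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`hamming_window`, `deltas`, `stack_39`, `cepstral_mean_normalize`), the two-frame central difference used for the derivatives, and the edge handling are all hypothetical choices made for this example, and the 13 static values per frame are assumed to be already computed.

```python
import math
from typing import List

Frame = List[float]  # one feature vector per analysis frame


def hamming_window(length: int) -> List[float]:
    """Standard Hamming window: 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def deltas(frames: List[Frame]) -> List[Frame]:
    """Time derivative as a central difference (assumed form):
    d[t] = (c[t+1] - c[t-1]) / 2, repeating the edge frames at the ends."""
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, last)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def stack_39(static: List[Frame]) -> List[Frame]:
    """Concatenate 13 static values with their first- and second-order
    derivatives, giving 39 values per frame."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


def cepstral_mean_normalize(frames: List[Frame]) -> List[Frame]:
    """Subtract the per-coefficient mean taken over the whole utterance."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in frames]


# Toy input: 5 frames of 13 made-up static coefficients (not real speech data).
static = [[float(t + k) for k in range(13)] for t in range(5)]
features = cepstral_mean_normalize(stack_39(static))
```

Each frame in `features` has 39 values. Normalizing over the whole utterance is one common way to reduce channel effects, which is in line with the accuracy gain the abstract reports for Cepstrum Mean Normalization.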

Keywords: A Speech Recognition, Automatic Speech Recognition, Tri-phones, Mel Frequency Cepstrum Coefficient, Hidden Markov Model, Neural Network
Subjects: G Mathematical and Computer Sciences > G700 Artificial Intelligence
G Mathematical and Computer Sciences > G710 Speech and Natural Language Processing
Divisions: College of Science > School of Computer Science
ID Code: 2080
Deposited On: 02 Dec 2009 11:14
