... application to Vietnamese
Seminar
by Ms. TRAN Thi Anh Xuan, PhD student under joint supervision (co-tutelle) between Institut MICA and GIPSA-lab - Date: Thursday, 30 August 2012, 3:00 pm - Venue: "seminar room", B1, 9th floor, Institut MICA, Hanoi University of Science and Technology

Speaker:
Ms. TRAN Thi Anh Xuan, PhD student under joint supervision (co-tutelle) between Institut MICA and GIPSA-lab, Grenoble, France

Date: Thursday, 30 August 2012, 3:00 pm
Venue: "seminar room", 9th floor, Building B1, Institut MICA, Hanoi University of Science and Technology, 1 Dai Co Viet, Hanoi
Interpreter/translator: the seminar will be presented in English

Abstract:
Automatic speech recognition (ASR) is typically split into two main processes: computing characteristics (usually spectral features) of the speech signal, and using a model of the phonetic and linguistic properties of the language. The characteristics of the signal are typically extracted by performing a spectral analysis every 10 to 20 ms, thus producing a series of characteristic vectors which are then run through a classification algorithm such as Hidden Markov Models or Neural Networks.
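As a rough illustration of the feature-extraction step described above, the following sketch computes one log-magnitude spectrum every 10 ms over 20 ms frames. It is not the speaker's front end: the function name `spectral_features`, the Hamming window, the 20 ms frame length, and the synthetic 440 Hz test tone are all assumptions chosen for illustration, and practical ASR systems usually apply further processing (for example, cepstral coefficients) before classification.

```python
import numpy as np


def spectral_features(signal, sample_rate, frame_ms=20.0, hop_ms=10.0):
    """Return one log-magnitude spectrum per analysis frame.

    frame_ms and hop_ms are illustrative defaults (assumptions), picked
    to match the "every 10 to 20 ms" analysis rate mentioned in the text.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    n_bins = frame_len // 2 + 1

    # Too short for even one full frame: return an empty feature matrix.
    if len(signal) < frame_len:
        return np.empty((0, n_bins))

    n_frames = 1 + (len(signal) - frame_len) // hop_len
    window = np.hamming(frame_len)  # tapers frame edges to reduce leakage

    features = np.empty((n_frames, n_bins))
    for i in range(n_frames):
        start = i * hop_len
        frame = signal[start:start + frame_len] * window
        magnitude = np.abs(np.fft.rfft(frame))
        # Small constant avoids log(0) on silent frames.
        features[i] = np.log(magnitude + 1e-10)
    return features


# Usage: one second of a synthetic 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)
features = spectral_features(tone, sample_rate)
print(features.shape)  # (number of frames, number of frequency bins)
```

Each row of `features` is one feature vector; the resulting sequence of rows is what a classifier such as a Hidden Markov Model or a neural network would consume.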
Most of the current limitations of ASR systems can be traced back to the very poor definition of the acoustic characteristics used for the classification. Indeed, for some 40 years, speech has been considered a sequence of quasi-stable signals (vowels) separated by transitions (consonants). It is also commonly accepted that vowels are acoustic “targets” that must be reached to make speech understandable.
ASR systems based on this imperfect modeling tend to work for only one gender (specialized models must be created for men and women), and fail completely to take children's voices into account.
In this thesis, we propose to develop a dynamic description of acoustic models in terms of “acoustic gestures” to improve recognition accuracy.