Dr. Thi-Ngoc-Diep DO

Speech Communication Department

Room 1004, B1 building
MICA International Research Institute (HUST - CNRS/UMI2954 - Grenoble INP)
Hanoi University of Science and Technology
No.1, Dai Co Viet rd., Hai Ba Trung dist.
Hanoi, Vietnam

Email: ngoc-diep.do at mica.edu.vn ; diep.dothingoc at hust.edu.vn
Tel: +84-4-38683087 ex. 115
Fax: +84-4-38683551

 

Short Academic Biography

I received my engineering degree in Information Technology from Hanoi University of Science and Technology (HUST) in 2005. In 2007, I received an MS degree in Information Processing and Communication from HUST. I received my Ph.D. degree in Computer Science from Université Grenoble Alpes (UGA), France, in 2011. I am currently a lecturer/researcher in the Speech Communication Department, MICA International Research Institute, Hanoi University of Science and Technology.

My CV

Current Research Interests

  • Natural Language Processing
  • Speech Translation
  • Multilingual Machine Translation
  • Machine Translation for Minority Languages
  • Visual-Textual Association Technology

Involved Projects: On-going

Duration | Title | Funding | Role
2017-2021 | Research and development of an automatic translation system from Vietnamese text to Muong speech, applied to other unwritten minority languages in Vietnam (ĐTĐLCN.20/17) | National project | Participant

Involved Projects: Finished

Duration | Title | Funding | Role
2015-2017 | Shoot My Mind | ASIAN-ITC (JFLI Japan, NII Japan, LIG France, ITC Cambodia, MICA Vietnam) | Participant
2014-2016 | Visually impaired people assistance using multimodal technologies (VLIR-UOS ZEIN2012RIP19) | Flemish Interuniversity Council (VLIR-UOS) | Participant
2012-2015 | Intelligent human-machine interaction system using Vietnamese voice control in noisy environments (KC.03.07/11-15) | Vietnamese National Project | Participant
2013-2014 | Advances in rapid prototyping for computerization of languages | PEPS HuMaIn CNRS – France (Lacito France, MICA) | Promoter in Vietnam, researcher
2013 | Vietnamese Voice Assistant for Vision-impaired User – VIVAVU 1st phase | Ministry of Information and Communication | Promoter, researcher
2009-2012 | PI-languages: Languages and Speech processing technology for under-resourced languages | ANR - France (LIG France, LIA France, MICA) | Participant

Reviewer/Scientific committee

Reviewer/Scientific committee for several national/international journals, conferences and workshops (in English): IALP 2013, 2014, ICCE 2014, KhmerLP 2015, TALLIP 2015, ICTA 2016, JST 2016, REV-ECIT 2018, MAPR 2019

International Conference Papers

  1. Pham T.T.T., Nguyen DD., Ta B.H.P., Nguyen TB., Do TND., Le TL. (2020) Person Search by Queried Description in Vietnamese Natural Language. In: Sitek P., Pietranik M., Krótkiewicz M., Srinilta C. (eds) Intelligent Information and Database Systems. ACIIDS 2020. Communications in Computer and Information Science, vol 1178. Springer, Singapore.
  2. Van-Thinh NGUYEN, Thi-Ngoc-Diep DO, Dang-Khoa MAC, Eric CASTELLI, Optimizing data transmission on mobile platform for speech translation system, First Regional Conference on Optical character recognition and Natural language processing technologies for ASEAN languages (ONA 2017), Phnom Penh - Cambodia, November 2017.
  3. Viet-Son Nguyen, Trung-Kien Dao, Viet-Tung Nguyen, Do-Dat Tran, Dang-Khoa Mac, DO Thi Ngoc Diep, NGUYEN Tuan Ninh, A Domestic Solution with Vietnamese Speech Interaction, Proceedings of the 9th regional conference on electrical and electronics engineering, pages 258-262, 2016.
  4. DAO Trung Kien, TRAN Thi Thanh Hai, LE Thi Lan, VU Hai, NGUYEN Viet Tung, MAC Dang Khoa, DO Thi Ngoc Diep, PHAM Thanh Thuy, Indoor navigation assistance system for visually impaired people using multimodal technologies, The 14th International Conference on Control, Automation, Robotics and Vision (ICARCV), IEEE, ISBN: 978-1-5090-4757-4, November 2016.
  5. Thi-Ngoc-Diep DO, Masao UTIYAMA, Eiichiro SUMITA, Machine Translation from Japanese and French to Vietnamese, the Difference among Language Families, In Proceedings of 2015 International Conference on Asian Language Processing, pages 17-20, Suzhou – China, 2015.
  6. Do Thi-Ngoc-Diep, Michaud Alexis, Castelli Eric. Towards the automatic processing of Yongning Na (Sino-Tibetan): developing a ‘light’ acoustic model of the target language and testing ‘heavyweight’ models from five national languages, In Proceedings of the 4th International Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU'14), ISBN 978-5-8088-0908-6, Publ: GUAP, pages 153-160, 2014.
  7. Thi Ngoc Diep DO, Duy Binh NGUYEN, Dang Khoa MAC, Do Dat TRAN. Machine Translation Approach for Vietnamese Diacritic Restoration. 2013 International Conference on Asian Language Processing (IALP 2013), ISBN: 9781479910960, Publ: Institute of Electrical and Electronics Engineers (IEEE), pages 103-106, 2013.
  8. Emmanuel FERREIRA, Pascal NOCERA, Maria GOUDI and Thi-Ngoc-Diep DO. YAST: A scalable ASR toolkit especially designed for under-resourced languages, 2012 International Conference on Asian Language Processing (IALP 2012), ISBN: 9781467361132, Publ: Institute of Electrical and Electronics Engineers (IEEE), pages 141-144, 2012.
  9. Thi-Ngoc-Diep Do, Eric Castelli, Laurent Besacier, Mining Parallel Data from Comparable Corpora via Triangulation, 2011 International Conference on Asian Language Processing (IALP 2011), ISBN: 9781457717338, Publ: Institute of Electrical and Electronics Engineers (IEEE), pages 185-188, 2011.
  10. Laurent Besacier, Haithem Afli, Do Thi Ngoc Diep, Hervé Blanchon, Marion Potet, LIG Statistical Machine Translation Systems for IWSLT 2010, In Proceedings of the International Workshop on Spoken Language Translation, pages 99-104, Paris – France, 2010.
  11. Thi-Ngoc-Diep Do, Laurent Besacier, Eric Castelli, Improved Vietnamese-French Parallel Corpus Mining Using English Language, In Proceedings of the International Workshop on Spoken Language Translation, pages 235-242, Paris – France, 2010.
  12. Thi-Ngoc-Diep Do, Laurent Besacier, Eric Castelli, Apprentissage non supervisé pour la traduction automatique : application à un couple de langues peu doté, Conférence sur le Traitement Automatique des Langues Naturelles, Montréal – Canada, 2010 (paper in French).
  13. Thi-Ngoc-Diep Do, Laurent Besacier, Eric Castelli, A Fully Unsupervised Approach for Mining Parallel Data from Comparable Corpora, In Proceedings of the 14th Annual Conference of the European Association for Machine Translation, Saint-Raphaël – France, 2010.
  14. Thi-Ngoc-Diep Do, Laurent Besacier, Eric Castelli, Unsupervised SMT for a low-resourced language pair, In Proceedings of the 2nd International Workshop on Spoken Languages Technologies for Under-resourced languages, pages 130-135, Penang – Malaysia, 2010.
  15. Thi-Ngoc-Diep Do, Viet-Bac Le, Brigitte Bigi, Laurent Besacier, Eric Castelli. Exploitation d’un corpus bilingue comparable pour la création d’un système de traduction probabiliste Vietnamien – Français, Conférence sur le Traitement Automatique des Langues Naturelles, Senlis – France, 2009 (paper in French).
  16. Thi-Ngoc-Diep Do, Viet-Bac Le, Brigitte Bigi, Laurent Besacier, Eric Castelli, Mining a comparable text corpus for a Vietnamese - French statistical machine translation system, In Proceedings of the 4th EACL Workshop on Statistical Machine Translation, pages 165–172, Athens - Greece, 2009.
  17. Viet-Bac Le, Laurent Besacier, Sopheap Seng, Brigitte Bigi, Thi-Ngoc-Diep Do, Recent Advances in Automatic Speech Recognition for Vietnamese, In Proceedings of the 1st International Workshop on Spoken Languages Technologies for Under-resourced languages, pages 47-52, Ha Noi - Vietnam, 2008.

Domestic Conference Papers

  1. Tran Thi-Thu-Thuy, Do Thi-Ngoc-Diep, Mac Dang-Khoa, Pham Van-Dong. "Cross-lingual phoneme recognition for familiar languages: Applying to Vietnamese and Muong languages." The 11th National Conference on Fundamental and Applied IT Research (FAIR 2018), Hanoi, Vietnam, August 2018.
  2. Do, Thi Ngoc Diep, Dang Khoa Mac and Nguyen Ngoc Binh. "Applying SCORM 2004 in development of E-learning system." The 2nd National Conference on Fundamental and Applied IT Research (FAIR 2005), Ho Chi Minh City, Vietnam, September 2005. (In Vietnamese)

Dissertation

  1. “Extraction of parallel corpus for machine translation from/to an under-resourced language,” PhD thesis, Grenoble University, France, December 2011. [Link] (Thesis in French)

Courses

  • Data Structures and Algorithms
  • Database
  • Programming techniques (in C/C++)
  • Object-oriented system analysis and design
  • Object-oriented Programming (in Java)
  • Machine translation (Master course)
  • Interaction through natural language (Master course)

Students interested in one of my research topics can contact me at the email address above about internships or graduation thesis advising.

Japanese-Vietnamese multi-level parallel text corpus from Wikipedia data resource

  • Comparable sentence pairs: 103,551 pairs [Link]
  • Parallel Title pairs: 146,146 pairs [Link]
  • Parallel Sentence pairs: 144,612 pairs [Link]
  • Parallel Fragment pairs: 148,224 pairs [Link]

[Last updated: 2020]