Pronunciation Proficiency Evaluation based on Discriminatively Refined Acoustic Models

Full Text (PDF, 351KB), PP.17-23


Author(s)

Ke Yan 1,* Shu Gong 1

1. USTC iFlytek Speech Laboratory, University of Science and Technology of China, Hefei, China

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2011.02.03

Received: 15 Jun. 2010 / Revised: 25 Sep. 2010 / Accepted: 1 Jan. 2011 / Published: 8 Mar. 2011

Index Terms

Computer assisted language learning, MPE, MWE, posterior probability, PSC, discriminative training

Abstract

Maximum Likelihood Estimation (MLE), the most common approach to acoustic model training, is generative and ignores information from other phones during the training stage. As a result, MLE-trained acoustic models are easily confused and discriminate poorly between similar phones. This paper applies the discriminative criteria of minimum phone error and minimum word error (MPE/MWE) to refine the acoustic models and address this problem. Experiments on a database of live Putonghua test recordings from 498 speakers indicate that: 1) the refined acoustic models are more discriminative than conventional MLE models; 2) even when training and test conditions are mismatched, the refined models still perform significantly better than MLE models in pronunciation proficiency evaluation. The final performance shows a relative improvement of approximately 4.5%.
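To make the contrast with MLE concrete, the following is a minimal sketch of the MPE criterion as it is commonly defined in the discriminative training literature: the expected phone accuracy of the competing hypotheses, weighted by their posterior probabilities. It is not code from the paper. The function name `mpe_objective`, the representation of competitors as a short list of (log score, phone accuracy) pairs, the scaling factor `kappa`, and the toy numbers are all assumptions made for illustration; practical systems accumulate these statistics over lattices rather than explicit lists.

```python
import math


def mpe_objective(hyps, kappa=1.0):
    """Expected phone accuracy over competing hypotheses for one utterance.

    hyps  : list of (log_score, phone_accuracy) pairs. log_score stands in
            for log p(O | s) + log P(s) of hypothesis s (hypothetical values).
    kappa : scaling factor applied to the log scores (an assumption here).
    """
    scaled = [kappa * log_score for log_score, _ in hyps]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled)
    weights = [math.exp(x - top) for x in scaled]
    total = sum(weights)
    # Posterior-weighted average of the per-hypothesis phone accuracy.
    return sum(w / total * acc for w, (_, acc) in zip(weights, hyps))


# Toy example: two equally likely hypotheses with accuracies 3 and 1.
print(mpe_objective([(0.0, 3.0), (0.0, 1.0)]))
```

Unlike the MLE objective, which depends only on the likelihood of the reference transcription, this quantity rises only when probability mass moves from low-accuracy competitors to high-accuracy ones, which is why it penalizes confusable phones directly.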

Cite This Paper

Ke Yan, Shu Gong, "Pronunciation Proficiency Evaluation based on Discriminatively Refined Acoustic Models", International Journal of Information Technology and Computer Science (IJITCS), vol. 3, no. 2, pp. 17-23, 2011. DOI: 10.5815/ijitcs.2011.02.03
