Segmental duration modelling in Turkish


ÖZTÜRK Ö., ÇİLOĞLU T.

9th International Conference on Text, Speech and Dialogue, TSD 2006, Brno, Czech Republic, 11-15 September 2006, pp. 669-676

  • Publication Type: Conference Paper / Full-Text Paper
  • Volume Number:
  • DOI: 10.1007/11846406_84
  • City of Publication: Brno
  • Country of Publication: Czech Republic
  • Pages: pp. 669-676
  • Affiliated with Dokuz Eylül University: Yes

Abstract

The naturalness of synthetic speech depends heavily on appropriate modelling of prosody. Three prosodic components are usually modelled: segmental duration, pitch contour and intensity. In this study, we present our work on modelling segmental duration in Turkish using machine-learning algorithms, especially Classification and Regression Trees. The models predict phone durations from attributes extracted from a speech corpus, such as the identities of the current, preceding and following phones, stress, part-of-speech, word length in number of syllables, and the position of the word in the utterance. The resulting models predict segment durations better than mean-duration approximations, reaching a correlation coefficient of approximately 0.77 and a root-mean-squared error of 20.4 ms. To improve prediction performance further, the attributes used to build the segmental duration models are optimized with the Sequential Forward Selection method. According to this selection, the optimum attribute set for phoneme duration modelling consists of phone identity, neighboring phone identities, lexical stress, syllable type, part-of-speech, phrase break information, and the location of the word in the phrase. © Springer-Verlag Berlin Heidelberg 2006.
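
The attribute search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The attribute names (`phone`, `stress`, `pos`), the toy durations, and the stopping rule are all assumptions made for illustration. To keep the sketch dependency-free, a table of mean durations per attribute combination stands in for the regression tree; it is not CART. The two evaluation measures are the ones the abstract reports: correlation coefficient and root-mean-squared error.

```python
import math
from collections import defaultdict


def rmse(y_true, y_pred):
    """Root-mean-squared error, in the same unit as the durations (ms)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(y_true, y_pred):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    vt = sum((t - mt) ** 2 for t in y_true)
    vp = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(vt * vp) if vt > 0 and vp > 0 else 0.0


def fit_mean_table(rows, durations, attrs):
    """Stand-in for a regression tree (NOT CART): mean duration per attribute combination."""
    sums = defaultdict(lambda: [0.0, 0])
    for row, d in zip(rows, durations):
        cell = sums[tuple(row[a] for a in attrs)]
        cell[0] += d
        cell[1] += 1
    fallback = sum(durations) / len(durations)  # used for unseen combinations
    table = {key: total / count for key, (total, count) in sums.items()}
    return lambda row: table.get(tuple(row[a] for a in attrs), fallback)


def sequential_forward_selection(train, y_train, dev, y_dev, candidates):
    """Greedily add the attribute that lowers held-out RMSE most; stop when none helps."""
    selected, remaining = [], list(candidates)
    mean = sum(y_train) / len(y_train)
    best = rmse(y_dev, [mean] * len(y_dev))  # baseline: mean-duration approximation
    while remaining:
        trials = []
        for attr in remaining:
            predict = fit_mean_table(train, y_train, selected + [attr])
            trials.append((rmse(y_dev, [predict(r) for r in dev]), attr))
        err, attr = min(trials)
        if err >= best:  # assumed stopping rule: no remaining attribute improves RMSE
            break
        selected.append(attr)
        remaining.remove(attr)
        best = err
    return selected, best


# Toy data, invented for illustration: duration depends on phone and stress only.
BASE_MS = {"a": 80.0, "t": 50.0}
rows = [{"phone": p, "stress": s, "pos": w}
        for p in ("a", "t") for s in (0, 1) for w in ("N", "V")]
durations = [BASE_MS[r["phone"]] + 20.0 * r["stress"] for r in rows]

# The same toy rows serve as training and held-out data here, for brevity only.
selected, err = sequential_forward_selection(rows, durations, rows, durations,
                                             ["phone", "stress", "pos"])
predict = fit_mean_table(rows, durations, selected)
corr = pearson(durations, [predict(r) for r in rows])
print(selected, err, corr)
```

On this toy data the search keeps `phone` and `stress` and leaves out `pos`, which carries no information about duration. With a real corpus, the evaluation would use a separate held-out set or cross-validation, and an actual regression-tree learner would replace the mean table.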