IEEE 12th Signal Processing and Communications Applications Conference, Kusadasi, Turkey, 28-30 April 2004, pp. 272-275
Text-to-Speech (TTS) synthesis can be regarded as the automatic conversion of sentences from their text form into speech waveforms by machines. The most crucial problem confronting TTS systems is the generation of natural-sounding speech. To obtain natural-sounding synthetic speech, prosodic attributes such as pitch frequency, duration, and intensity must be modelled appropriately. This paper summarizes efforts to build duration models for Turkish TTS systems using machine-learning algorithms.