Turkish word error detection using syllable bigram statistics


Gunel K., Asliyan R.

IEEE 14th Signal Processing and Communications Applications, Antalya, Türkiye, 16 - 19 April 2006, pp.631-632

  • Publication Type: Conference Paper / Full Text Paper
  • Volume Number:
  • DOI: 10.1109/siu.2006.1659786
  • City of Publication: Antalya
  • Country of Publication: Türkiye
  • Pages: pp.631-632
  • Affiliated with Dokuz Eylül University: Yes

Abstract

In this study, we designed and implemented a system that uses an n-gram statistical language model to support Optical Character Recognition, Speech Synthesis, and Speech Recognition systems. First, syllable bigram frequencies are extracted from Turkish corpora. Then, a test database containing both correctly and incorrectly written words is created. The probability of each word appearing in the given text is calculated, and the words are classified as correctly or incorrectly written. With the proposed approach, the system identifies about 86.13% of the incorrectly written words and about 88.32% of the correctly written words.
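
The pipeline described in the abstract (syllabify, count syllable bigrams, score each word, decide) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the syllabification procedure, any smoothing, or the decision threshold. The sketch assumes lowercase input, the common rule that each Turkish syllable has exactly one vowel and takes the single consonant immediately before it, unsmoothed maximum-likelihood bigram estimates, and a hypothetical threshold of 0. All function names and the `<s>` / `</s>` boundary markers are made up for the example.

```python
from collections import Counter

VOWELS = set("aeıioöuü")
START, END = "<s>", "</s>"  # hypothetical word-boundary markers


def syllabify(word):
    """Split a lowercase Turkish word into syllables (simplified rule)."""
    vowel_idx = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not vowel_idx:
        return [word]
    cuts = [0]
    for v in vowel_idx[1:]:
        # A consonant directly before a vowel opens that vowel's syllable.
        cuts.append(v - 1 if word[v - 1] not in VOWELS else v)
    cuts.append(len(word))
    return [word[a:b] for a, b in zip(cuts, cuts[1:])]


def train(words):
    """Count syllable bigrams and their left contexts over a word list."""
    bigrams, contexts = Counter(), Counter()
    for w in words:
        seq = [START] + syllabify(w) + [END]
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts


def word_probability(word, bigrams, contexts):
    """Product of unsmoothed bigram probabilities P(cur | prev)."""
    seq = [START] + syllabify(word) + [END]
    p = 1.0
    for prev, cur in zip(seq, seq[1:]):
        if contexts[prev] == 0:
            return 0.0  # unseen context: no estimate available
        p *= bigrams[(prev, cur)] / contexts[prev]
    return p


def is_misspelled(word, bigrams, contexts, threshold=0.0):
    """Flag a word whose probability does not exceed the assumed threshold."""
    return word_probability(word, bigrams, contexts) <= threshold


# Toy usage; a real system would train on a large Turkish corpus.
bigrams, contexts = train(["kitap", "kalem", "kitaplar", "okul"])
print(syllabify("kitaplar"))                      # ['ki', 'tap', 'lar']
print(is_misspelled("kitap", bigrams, contexts))  # False
print(is_misspelled("kitpa", bigrams, contexts))  # True
```

With unsmoothed estimates, any word containing an unseen syllable pair scores zero, so a practical system would likely need smoothing and a threshold tuned on held-out data; both are design choices the abstract leaves open.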