Turkish Spelling Error Detection and Correction by Using Word N-grams


Dalkılıç G., Çebi Y.

5th International Conference on Soft Computing, Computing with Words and Perceptions in System Analysis, Decision and Control, Famagusta, Cyprus, 2-4 September 2009, pp. 63-66

  • Publication Type: Conference Paper / Full-Text Paper
  • Volume Number:
  • City of Publication: Famagusta
  • Country of Publication: Cyprus
  • Pages: pp. 63-66
  • Affiliated with Dokuz Eylül University: Yes

Abstract

N-grams can be used for spell checking and correction. The first step is to extract the language-specific n-grams from a corpus. However, no corpus can be large enough to contain every possible word n-gram. Back-off smoothing is one technique for estimating the frequency of n-grams that do not occur in the corpus. Using the back-off technique together with the Minimum Edit Distance (MED) algorithm, a program was developed to detect spelling errors and suggest corrections in a sentence typed in Turkish. The results were compared with those of the Microsoft Word 2003 proofing tools and were found to be much better.
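
As a rough illustration of how the two ingredients named in the abstract can fit together, the following Python sketch ranks correction candidates by edit distance and then by a backed-off word bigram score. It is not the authors' program: the function names, the restriction to bigrams, the fixed back-off weight `alpha=0.4`, and the `max_distance=2` cut-off are all assumptions made for this example, and the back-off used here is a simplified, unnormalised variant rather than a properly discounted back-off model.

```python
from collections import Counter


def min_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit cost for insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def train(sentences):
    """Count word unigrams and bigrams in whitespace-tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def backoff_score(prev_word, word, unigrams, bigrams, alpha=0.4):
    """Bigram relative frequency if the bigram was seen; otherwise back off
    to the unigram relative frequency scaled by alpha (an assumed constant)."""
    if prev_word is not None and bigrams[(prev_word, word)] > 0:
        return bigrams[(prev_word, word)] / unigrams[prev_word]
    return alpha * unigrams[word] / sum(unigrams.values())


def suggest(prev_word, word, unigrams, bigrams, max_distance=2):
    """Return corrections for an unknown word, closest and most likely first.
    A word that occurs in the corpus is treated as correct (empty list)."""
    if word in unigrams:
        return []
    ranked = sorted(
        (min_edit_distance(word, cand),
         -backoff_score(prev_word, cand, unigrams, bigrams),
         cand)
        for cand in unigrams
    )
    return [cand for dist, _, cand in ranked if dist <= max_distance]
```

With a toy corpus such as `train(["bu kitap çok güzel", "bu kalem çok güzel", "o kitap yeni"])`, calling `suggest("bu", "kitab", unigrams, bigrams)` considers only vocabulary words within two edits of `kitab`. A realistic checker would need a far larger corpus and, for an agglutinative language like Turkish, some treatment of valid word forms that never appear in the corpus.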