On the cryptographic patterns and frequencies in Turkish language


DALKILIÇ M. E., DALKILIÇ G.

2nd International Conference on Advances in Information Systems, ADVIS 2002, İzmir, Türkiye, 23-25 October 2002, vol. 2457, pp. 144-153 (Full Text Paper)

  • Publication Type: Conference Paper / Full Text Paper
  • Volume Number: 2457
  • City of Publication: İzmir
  • Country of Publication: Türkiye
  • Pages: pp. 144-153
  • Affiliated with Dokuz Eylül University: Yes

Abstract

Although Turkish is a significant language with over 60 million native speakers, its cryptographic characteristics are relatively unknown. This paper presents and discusses language patterns and frequencies of Turkish that are relevant to information security, cryptography and plaintext recognition applications: the letter frequency profile, letter contact patterns, the most frequent digrams, trigrams and words, common word beginnings and endings, vowel/consonant patterns, and others. The data is collected from a large Turkish corpus, and its use is illustrated through the cryptanalysis of a monoalphabetic substitution cipher. A new vowel identification method is developed using a distinct pattern of Turkish: the (almost) non-existence of double consonants at word boundaries.
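
As a rough illustration of the kinds of statistics the abstract lists, the following Python sketch computes a letter frequency profile, digram counts, and the share of words that begin or end with two consonants. It is not the authors' method or data. The function names, the tokenization rule, and the reading of "double consonants at word boundaries" as two adjacent consonants at the start or end of a word are all assumptions made for this example.

```python
from collections import Counter

# The eight vowels of the Turkish alphabet.
TURKISH_VOWELS = set("aeıioöuü")


def turkish_lower(text):
    # Python's str.lower() maps "I" to "i"; Turkish casing needs
    # "I" -> "ı" and "İ" -> "i", so handle both before lowercasing.
    return text.replace("I", "ı").replace("İ", "i").lower()


def words_of(text):
    # Assumed tokenization: anything that is not a letter separates words.
    lowered = turkish_lower(text)
    cleaned = "".join(ch if ch.isalpha() else " " for ch in lowered)
    return cleaned.split()


def letter_frequencies(text):
    # Relative frequency of each letter over all letters in the text.
    letters = "".join(words_of(text))
    if not letters:
        return {}
    counts = Counter(letters)
    return {ch: n / len(letters) for ch, n in counts.items()}


def digram_counts(text):
    # Count adjacent letter pairs inside words (not across word gaps).
    counts = Counter()
    for word in words_of(text):
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts


def boundary_double_consonant_ratio(text):
    # Share of words (length >= 2) whose first two or last two letters
    # are both consonants. This reading of the pattern is an assumption.
    words = [w for w in words_of(text) if len(w) >= 2]
    if not words:
        return 0.0

    def is_consonant(ch):
        return ch not in TURKISH_VOWELS

    hits = sum(
        1
        for w in words
        if (is_consonant(w[0]) and is_consonant(w[1]))
        or (is_consonant(w[-1]) and is_consonant(w[-2]))
    )
    return hits / len(words)
```

Run over a large corpus, `letter_frequencies` and `digram_counts` would give tables comparable in kind to those the abstract describes. A low value from `boundary_double_consonant_ratio` is the sort of regularity that a cryptanalyst could use to separate cipher symbols standing for vowels from those standing for consonants.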