On the cryptographic patterns and frequencies in Turkish language


Dalkilic M., Dalkilic G.

ADVANCES IN INFORMATION SYSTEMS, vol.2457, pp.144-153, 2002 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 2457
  • Publication Date: 2002
  • Journal Name: ADVANCES IN INFORMATION SYSTEMS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED)
  • Page Numbers: pp.144-153
  • Dokuz Eylül University Affiliated: Yes

Abstract

Although Turkish is a significant language with over 60 million native speakers, its cryptographic characteristics are relatively unknown. This paper presents and discusses language patterns and frequencies of Turkish (such as the letter frequency profile, letter contact patterns, most frequent digrams, trigrams and words, common word beginnings and endings, vowel/consonant patterns, etc.) that are relevant to information security, cryptography and plaintext recognition applications. The data is collected from a large Turkish corpus, and its use is illustrated through the cryptanalysis of a mono-alphabetic substitution cipher. A new vowel identification method is developed using a distinctive pattern of Turkish: the (almost complete) non-existence of double consonants at word boundaries.
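
As a rough illustration of the kind of statistics the abstract lists, the sketch below counts letter n-grams (single letters, digrams, trigrams) and the number of words that begin or end with two consecutive consonants. It is not the authors' implementation: the function names (`turkish_lower`, `ngram_counts`, `boundary_consonant_pairs`) are hypothetical, and reading "double consonants at word boundaries" as "two adjacent consonants at the start or end of a word" is an assumption made for this example only.

```python
from collections import Counter

# The 8 vowels of the Turkish alphabet (lowercase).
VOWELS = set("aeıioöuü")


def turkish_lower(text):
    # str.lower() alone maps "I" to "i"; Turkish pairs "I" with "ı" and "İ" with "i",
    # so both capitals are replaced explicitly before lowercasing the rest.
    return text.replace("I", "ı").replace("İ", "i").lower()


def letters_of(word):
    # Keep alphabetic characters only (drops punctuation and digits).
    return "".join(ch for ch in word if ch.isalpha())


def ngram_counts(text, n):
    """Count letter n-grams within words (n=1 letters, n=2 digrams, n=3 trigrams)."""
    counts = Counter()
    for word in turkish_lower(text).split():
        letters = letters_of(word)
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    return counts


def boundary_consonant_pairs(text):
    """Return (starts, ends, total): how many words of length >= 2 start or end
    with two consecutive consonants, and how many such words were examined.
    Assumption: this is only one possible reading of the pattern in the abstract."""
    starts = ends = total = 0
    for word in turkish_lower(text).split():
        letters = letters_of(word)
        if len(letters) < 2:
            continue
        total += 1
        if letters[0] not in VOWELS and letters[1] not in VOWELS:
            starts += 1
        if letters[-1] not in VOWELS and letters[-2] not in VOWELS:
            ends += 1
    return starts, ends, total


if __name__ == "__main__":
    sample = "Bu kitap çok güzel bir kitap"
    print(ngram_counts(sample, 1).most_common(5))
    print(ngram_counts(sample, 2).most_common(5))
    print(boundary_consonant_pairs(sample))
```

On a real corpus, the relative frequencies from `ngram_counts` could be compared against ciphertext symbol frequencies when attacking a mono-alphabetic substitution cipher, and a low rate from `boundary_consonant_pairs` would suggest that one of the first two (or last two) symbols of most ciphertext words stands for a vowel. Both uses are suggestions for experimentation rather than a description of the paper's procedure.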