ADVANCES IN INFORMATION SYSTEMS, vol.2457, pp.144-153, 2002 (SCI-Expanded)
Although Turkish is a significant language with over 60 million native speakers, its cryptographic characteristics are relatively unknown. In this paper, some language patterns and frequencies of Turkish (such as the letter frequency profile, letter contact patterns, the most frequent digrams, trigrams and words, common word beginnings and endings, vowel/consonant patterns, etc.) relevant to information security, cryptography and plaintext recognition applications are presented and discussed. The data are collected from a large Turkish corpus, and their use is illustrated through cryptanalysis of a mono-alphabetic substitution cipher. A new vowel identification method is developed using a distinct pattern of Turkish: the (almost) non-existence of double consonants at word boundaries.
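As a rough illustration of the kind of statistics described above, the following minimal Python sketch computes a letter frequency profile and within-word digram counts for a Turkish text. It is not the paper's procedure: the function names (`turkish_lower`, `letter_frequencies`, `digram_counts`) are hypothetical, and the choice to count digrams only inside words and to drop all non-letter characters is an assumption made for the example.

```python
from collections import Counter

# The 29 letters of the modern Turkish alphabet (lowercase).
TURKISH_LETTERS = set("abcçdefgğhıijklmnoöprsştuüvyz")


def turkish_lower(text):
    """Lowercase text using Turkish casing rules for dotted/dotless I.

    Python's default str.lower() maps "I" to "i", but in Turkish
    "I" lowers to "ı" and "İ" lowers to "i", so map those first.
    """
    return text.replace("İ", "i").replace("I", "ı").lower()


def letter_frequencies(text):
    """Return the relative frequency of each Turkish letter in text."""
    letters = [ch for ch in turkish_lower(text) if ch in TURKISH_LETTERS]
    total = len(letters)
    if total == 0:
        return {}
    return {ch: n / total for ch, n in Counter(letters).items()}


def digram_counts(text):
    """Count adjacent letter pairs inside words (assumption: no pairs
    are formed across word boundaries)."""
    counts = Counter()
    for word in turkish_lower(text).split():
        # Keep only Turkish letters, dropping punctuation and digits.
        word = "".join(ch for ch in word if ch in TURKISH_LETTERS)
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts
```

Calling `digram_counts(corpus_text).most_common(10)` on a corpus string would list the ten most frequent digrams; comparing such a ranking against the same counts taken from a ciphertext is the usual starting point for attacking a mono-alphabetic substitution cipher.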