Online Suffix Tree Construction for Streaming Sequences


ÖZCAN G., ALPKOÇAK A.

13th International-Computer-Society-of-Iran-Computer Conference, Kish Isl, İran, 9 - 11 Mart 2008, cilt.6, ss.69-81 identifier identifier

  • Yayın Türü: Bildiri / Tam Metin Bildiri
  • Cilt numarası: 6
  • Doi Numarası: 10.1007/978-3-540-89985-3_9
  • Basıldığı Şehir: Kish Isl
  • Basıldığı Ülke: İran
  • Sayfa Sayıları: ss.69-81
  • Anahtar Kelimeler: Suffix trees, sequence databases, time series indexing, poor memory locality
  • Dokuz Eylül Üniversitesi Adresli: Evet

Özet

In this study, we present an online suffix tree construction approach where Multiple sequences are indexed by a single suffix tree. Due to the poor memory locality and high space consumption, online suffix tree construction on disk is a striving process. Even more, performance of the construction suffers when alphabet size is large. In order to overcome these difficulties, first, we present a space efficient node representation approach to be used in Ukkonen suffix tree construction algorithm. Next, we show that performance can be increased through incorporating semantic knowledge such as utilizing the frequently used letters of an alphabet. In particular, we estimate the frequently accessed nodes of the tree and introduce a sequence insertion strategy into the tree. As a result, we can speed up accessing to the frequently accessed nodes, Finally, we analyze the contribution of buffering strategies and page sizes on performance and perform detailed tests. We run a series of experimentation under various buffering strategies and page sizes. Experimental results showed that our approach outperforms existing ones.