Online Suffix Tree Construction for Streaming Sequences


ÖZCAN G., ALPKOÇAK A.

13th International-Computer-Society-of-Iran-Computer Conference, Kish Isl, Iran, 9 - 11 March 2008, vol.6, pp.69-81 identifier identifier

  • Publication Type: Conference Paper / Full Text
  • Volume: 6
  • Doi Number: 10.1007/978-3-540-89985-3_9
  • City: Kish Isl
  • Country: Iran
  • Page Numbers: pp.69-81
  • Keywords: Suffix trees, sequence databases, time series indexing, poor memory locality
  • Dokuz Eylül University Affiliated: Yes

Abstract

In this study, we present an online suffix tree construction approach where Multiple sequences are indexed by a single suffix tree. Due to the poor memory locality and high space consumption, online suffix tree construction on disk is a striving process. Even more, performance of the construction suffers when alphabet size is large. In order to overcome these difficulties, first, we present a space efficient node representation approach to be used in Ukkonen suffix tree construction algorithm. Next, we show that performance can be increased through incorporating semantic knowledge such as utilizing the frequently used letters of an alphabet. In particular, we estimate the frequently accessed nodes of the tree and introduce a sequence insertion strategy into the tree. As a result, we can speed up accessing to the frequently accessed nodes, Finally, we analyze the contribution of buffering strategies and page sizes on performance and perform detailed tests. We run a series of experimentation under various buffering strategies and page sizes. Experimental results showed that our approach outperforms existing ones.