IMAGECLEF: EXPERIMENTAL EVALUATION IN VISUAL INFORMATION RETRIEVAL, vol. 32, pp. 261-275, 2010 (SCI-Expanded)
In this chapter, we present an approach to handling multi-modality in image retrieval using the Vector Space Model (VSM), which is extensively used in text retrieval. We extend the model with visual terms, aiming to narrow the semantic gap by helping to map low-level visual features to high-level textual semantic concepts. Combining the textual and visual modalities in one space also makes it possible to query a textual database with visual content, or a visual database with textual content. In addition, to improve the performance of text retrieval we propose a novel expansion and re-ranking method, applied to both the documents and the query. When textual annotations of images are acquired automatically, they may contain too much information, and document expansion then adds further noise to the retrieval results; we therefore propose a re-ranking phase to discard such noisy terms. The approaches introduced in this chapter were evaluated in two sub-tasks of ImageCLEF2009. First, we tested the multi-modal part in ImageCLEFmed and obtained the best rank in mixed retrieval, which combines the textual and visual modalities. Second, we tested the expansion and re-ranking methods in ImageCLEFWiki, where our runs outperformed the others and took the best four positions in text-only retrieval. The results show that handling multi-modality in text retrieval with a VSM is promising, and that document expansion and re-ranking play an important role in text-based image retrieval.
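To make the combined text-and-visual vector space concrete, the sketch below (our illustration, not the chapter's implementation) assumes that visual content has already been quantized into discrete "visual words" (e.g., cluster IDs of low-level features). It namespaces these with a `vis:` prefix so they cannot collide with the textual vocabulary, weights both modalities with TF-IDF in a single space, and ranks by cosine similarity; the prefix convention and the weighting scheme are assumptions.

```python
import math
from collections import Counter

def combined_terms(text_terms, visual_words):
    """Merge textual terms and visual words into one term space,
    prefixing visual words to keep the vocabularies disjoint."""
    return list(text_terms) + [f"vis:{w}" for w in visual_words]

def tfidf_vectors(docs):
    """Compute sparse TF-IDF vectors for each document (a list of terms)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: (1 + math.log(c)) * math.log(n / df[t])
                     for t, c in tf.items()})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Example: a query carrying a visual word matches a document through
# that visual term, and textual terms match as usual -- both modalities
# live in the same space, so cross-modal querying falls out for free.
docs = [
    combined_terms(["chest", "xray", "pneumonia"], [17, 42]),
    combined_terms(["brain", "mri"], [42, 99]),
]
query = combined_terms(["pneumonia"], [17])
vecs = tfidf_vectors(docs + [query])   # weight query in the same space
query_vec, doc_vecs = vecs[-1], vecs[:-1]
for i, dv in enumerate(doc_vecs):
    print(i, round(cosine(query_vec, dv), 3))
```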
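The abstract does not spell out the expansion source or the criterion for noisy terms, so the following sketch instantiates them with standard choices: pseudo-relevance-feedback expansion (adding frequent terms from the top-ranked documents of a first retrieval pass) and an IDF threshold that discards low-information candidates before a second-pass re-ranking. The function names, thresholds, and the IDF-weighted overlap score of the second pass are all hypothetical.

```python
import math
from collections import Counter

def idf_table(docs):
    """Inverse document frequency for every term in the collection."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n / c) for t, c in df.items()}

def expand_and_rerank(query, docs, first_pass_scores,
                      top_k=2, max_new=5, min_idf=0.3):
    """Expand the query with terms from the top-ranked documents, drop
    low-IDF (likely noisy) candidates, then re-score the collection."""
    idf = idf_table(docs)
    ranked = sorted(range(len(docs)), key=lambda i: -first_pass_scores[i])
    pool = Counter()
    for i in ranked[:top_k]:
        pool.update(t for t in docs[i] if t not in query)
    extra = [t for t, _ in pool.most_common()
             if idf.get(t, 0.0) >= min_idf][:max_new]
    expanded = list(query) + extra
    # Second pass: IDF-weighted term overlap; any retrieval model
    # could be substituted here.
    def score(doc):
        return sum(idf.get(t, 0.0) for t in expanded if t in doc)
    return expanded, sorted(range(len(docs)), key=lambda i: -score(docs[i]))

# Usage with toy data and scores from an initial retrieval run.
docs = [["chest", "xray", "pneumonia", "lung"],
        ["lung", "ct", "nodule"],
        ["brain", "mri"]]
print(expand_and_rerank(["pneumonia"], docs, [2.0, 0.5, 0.0]))
```

The IDF filter plays the role of the re-ranking phase described above: terms that occur in most automatically acquired annotations carry little discriminative power and are treated as noise rather than evidence.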