Intersection of two index structures

From: Kai Großjohann <Kai.Grossjohann_at_CS.Uni-Dortmund.DE>
Date: Sat, 22 Sep 2001 00:26:30 +0200
Message-ID: <vafitec89k9.fsf_at_lucy.cs.uni-dortmund.de>



Maybe what I'm looking for is multidimensional index structures. But I'm not sure, so I'll try to describe in my own terms what I'm after.

The question is about storing XML data. Let us talk about the simple case where the XML data is just text.

From Information Retrieval we know that a good index for text is the inverted file. You split the documents into words, and for each word you create an inverted list: the list of the numbers of all documents containing that word.
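To make the inverted file concrete, here is a minimal Python sketch. It assumes whitespace tokenization and lowercasing, and the name `build_inverted_file` and the sample documents are made up for illustration; a real system would keep the lists on disk.

```python
from collections import defaultdict

def build_inverted_file(docs):
    """Map each word to the sorted list of document numbers containing it.

    `docs` is a hypothetical in-memory mapping {document number: text}.
    """
    index = defaultdict(set)
    for doc_no, text in docs.items():
        # Assumption: words are separated by whitespace, case is ignored.
        for word in text.lower().split():
            index[word].add(doc_no)
    # One inverted list per word, sorted by document number.
    return {word: sorted(nums) for word, nums in index.items()}

docs = {1: "the quick fox", 2: "the lazy dog", 3: "quick dog"}
inverted = build_inverted_file(docs)
```

Looking up a word is then a single dictionary access, for example `inverted["quick"]`.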

Good.

But for XML data, document numbers alone are not enough: you also need to store information about the node in the XML document tree that the word came from.
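One possible shape for such postings, as a sketch only: each posting is a pair of document number and the path of element names from the root to the node containing the word. The path-as-string encoding and the function name `index_xml` are assumptions for illustration, not an established format.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def index_xml(doc_no, xml_text, index):
    """Append (doc_no, path) postings for every word in the document."""
    root = ET.fromstring(xml_text)

    def walk(elem, parent_path):
        path = parent_path + "/" + elem.tag
        # Words directly inside this element, before its first child.
        for word in (elem.text or "").lower().split():
            index[word].append((doc_no, path))
        for child in elem:
            walk(child, path)
            # Text after a child (its tail) still belongs to this element.
            for word in (child.tail or "").lower().split():
                index[word].append((doc_no, path))

    walk(root, "")

index = defaultdict(list)
index_xml(
    1,
    "<book><title>XML retrieval</title>"
    "<chapter><title>Index structures</title></chapter></book>",
    index,
)
```

Here the word "index" ends up with the posting `(1, "/book/chapter/title")`, while "xml" gets `(1, "/book/title")`.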

And then you have queries which specify both the word to search for and some restrictions on the path from the root node in the XML document tree to the node containing the word.

And if you used the simple approach with inverted files, you would fetch the whole inverted list for each word in the query and then iterate over the whole list, checking each posting against the path condition.
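The simple approach can be sketched as follows. The sample postings, the function name `naive_search` and the use of a plain predicate for the path condition are all hypothetical; the point is only that every posting for the word is read, whether or not it matches.

```python
def naive_search(index, word, path_matches):
    """Fetch the whole inverted list for `word`, then filter by path.

    Returns the matching postings and the number of postings scanned.
    """
    postings = index.get(word, [])  # the entire list is retrieved
    hits = [p for p in postings if path_matches(p[1])]
    return hits, len(postings)

# Hypothetical postings: (document number, path to the containing node).
index = {
    "index": [
        (1, "/book/title"),
        (1, "/book/chapter/title"),
        (2, "/article/body/para"),
        (3, "/book/chapter/para"),
    ],
}

hits, scanned = naive_search(
    index, "index", lambda path: path.endswith("/chapter/title")
)
```

In this toy example four postings are scanned to find a single hit; with long inverted lists and selective path conditions the gap becomes much larger.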

Is there a way to avoid retrieving so many postings from the inverted list? Is there an index structure that helps here, so that I don't have to retrieve so many items from disk?

kai

-- 
Symbol's function definition is void: signature
Received on Sat Sep 22 2001 - 00:26:30 CEST
