Re: Word-Level Inverted File Structure

From: Pete Nayler <nayler_at_dingoblue.net.au>
Date: Thu, 12 Oct 2000 11:41:31 +0800
Message-ID: <39e53208$0$27119$7f31c96c_at_news01.syd.optusnet.com.au>


"Jan Hidders" <hidders_at_REMOVE.THIS.win.tue.nl> wrote in message news:8s1gs2$mcu$1_at_news.tue.nl...
> Pete Nayler wrote:
> >
> > "Jan Hidders" <hidders_at_REMOVE.THIS.win.tue.nl> wrote in message
> > news:8s19pp$knf$1_at_news.tue.nl...
> > > Pete Nayler wrote:
> > > > The structure I'm referring to is explained in Witten et al.,
> > > > "Managing Gigabytes", where each word in an inverted file is
> > > > referenced using:
> > > >
> > > > <2;(1;6,9),(4;8)>
> > > >
> > > > where the (bracketed) terms can be expressed as
> > > >
> > > > (x ; y1, y2, y3, ...)
> > > >
> > > > where x represents the document in which the word exists, and y
> > > > represents the word position in the document.
> > > >
> > > > The question is, what does the first term in the full structure
> > > > represent?
> > >
> > > I'm totally guessing here, but could it be the word for which the
> > > positions are indicated?
> >
> > Thanks for the reply, but in the book, it gives an example of indexing
> > using a series of documents, giving the word listing as follows:
> >
> > cold - <2;(1;6),(4;8)>
> > hot - <2;(3;2),(6;2)>
> > warm - <2;(1;3),(4;4)>
> > etc...
> >
> > As you can see, the first term is always "2", which precedes the
> > document and then the position. Puzzling...
 

> Ok. Let's try another guess: the number of documents that the word occurs
> in?

Hmmmm - you may have it there. But one question: what purpose would that serve? It wouldn't really help in relevance ranking or sorting. It seems bizarre to me to store it in a string that already contains more detailed information about the word.
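
For what it's worth, here is a quick sketch that checks that guess against the three entries quoted from the book. It is only an illustration: parse_entry is a name I made up, and the entry syntax it accepts is just my reading of the examples above, not anything the book specifies.

```python
import re

def parse_entry(entry):
    """Parse '<n;(d;p1,p2,...),(d;p1,...)>' into (n, {doc: [positions]}).

    Hypothetical helper: the syntax is inferred from the quoted examples.
    """
    body = entry.strip()
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError("entry must be wrapped in < >")
    # Split the leading term from the bracketed (document; positions) groups.
    head, _, rest = body[1:-1].partition(";")
    postings = {}
    for doc, positions in re.findall(r"\((\d+);([\d,]+)\)", rest):
        postings[int(doc)] = [int(p) for p in positions.split(",")]
    return int(head), postings

# The three word listings quoted earlier in the thread.
examples = {
    "cold": "<2;(1;6),(4;8)>",
    "hot": "<2;(3;2),(6;2)>",
    "warm": "<2;(1;3),(4;4)>",
}

for word, entry in examples.items():
    first, postings = parse_entry(entry)
    # Under the guess, the first term equals the number of documents listed.
    print(word, first, sorted(postings), first == len(postings))
```

All three entries pass the check, which is consistent with the guess, though three samples hardly prove it.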

Pete

Received on Thu Oct 12 2000 - 05:41:31 CEST
