Re: Boolean Query Algorithm

From: Cimode <cimode_at_hotmail.com>
Date: 13 Jul 2006 01:20:10 -0700
Message-ID: <1152778810.179857.174390_at_p79g2000cwp.googlegroups.com>


Check out and read the link have provided...You may download a trial version of the software at

http://www.armadillo.fr/english/armamailpro.html#doc

Try and see if it can at least guide your search...

Hope this helps...

Sherrie Laraurens wrote:
> what type of indexing would be used for the final filtering process?
>
> Marshall wrote:
> > Sherrie Laraurens wrote:
> > >
> > > I have a question relating to how search engines and (in fact
> > > anything else that supports boolean queries) manage to do such
> > > things so efficiently.
> > >
> > > my question involves a query were you would like to retrieve the all
> > > the documents in your corpus that have the words "Cat" and "Bat" in
> > > them and that they not only contain both words but that they must be
> > > consecutive for example "Bat Cat"
> > >
> > > I can think of a very crude way of doing this which involves hashing
> > > every word in a document into a hash table and storing an index for
> > > said document , then in the query stage to hash both words (hash
> > > join) get the intersection vector of the resulting vectors from the
> > > hashing process, then one by one examine each document from the
> > > intersection vector find the word "Bat" and see if the word "Cat" is
> > > the next word if it is place said document into the final result
> > > vector and once finished pass back to user.
> > >
> > > I believe this will work for AND and OR type queries but I can't
> > > imagine systems like google or yahoo using such a CRUDE method nor
> > > can i see them caching such results, just because the sheer amount
> > > of combinations things they would have to be caching.
> > >
> > > Does anyone here have any idea on how these things are done? any
> > > keywords I could use to google etc..
> >
> > I think you've descrbed it pretty accurately. Web search engines
> > are essentially ginormous inverted indexes. Note that the final
> > stage, searching specifically for BAT and CAT can also benefit
> > from indexing.
> >
> >
> > Marshall
Received on Thu Jul 13 2006 - 10:20:10 CEST

Original text of this message