Re: Boolean Query Algorithm

From: Cimode <cimode_at_hotmail.com>
Date: 12 Jul 2006 04:41:12 -0700
Message-ID: <1152704472.828437.166880_at_h48g2000cwc.googlegroups.com>


SQL DBMSs are limited here because they do not support the TEXT data type efficiently, which forces queries into many combinations of resource-consuming AND/OR conditions...

Keep in mind that what makes some technologies more or less efficient at document and text search is either the logical indexing scheme used or the physical implementation of the actual document storage.
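
As an illustration of what a logical indexing scheme can look like for the "Bat Cat" case, here is a minimal Python sketch of a positional inverted index. This is a common textbook technique, not a description of how any particular search engine or DBMS works; the names build_positional_index and phrase_query, the whitespace tokenizer, and the three sample documents are all assumptions made up for the example.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each lowercased word to {doc_id: [positions where it occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        # Assumption: naive whitespace tokenization, no stemming or punctuation handling.
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def phrase_query(index, first, second):
    """Return sorted ids of documents where `second` directly follows `first`."""
    first_postings = index.get(first.lower(), {})
    second_postings = index.get(second.lower(), {})
    hits = []
    # Boolean AND: only documents containing both words can match.
    for doc_id in first_postings.keys() & second_postings.keys():
        second_positions = set(second_postings[doc_id])
        # Adjacency check on stored positions; the document text is never re-read.
        if any(pos + 1 in second_positions for pos in first_postings[doc_id]):
            hits.append(doc_id)
    return sorted(hits)


# Hypothetical sample corpus, invented for this sketch.
docs = {
    1: "the bat cat sat on the mat",
    2: "a cat chased a bat",
    3: "bat and cat are both here",
}
index = build_positional_index(docs)
print(phrase_query(index, "Bat", "Cat"))  # prints [1]
```

Because the word positions are stored in the index, the adjacency test runs on the posting lists alone, instead of re-scanning each candidate document as in the hash-table approach described in the question.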

Hope this helps...

Sherrie Laraurens wrote:
> Hi all,
>
> I have a question relating to how search engines (and in fact
> anything else that supports boolean queries) manage to do such
> things so efficiently.
>
> My question involves a query where you would like to retrieve all
> the documents in your corpus that have the words "Cat" and "Bat" in
> them, and that not only contain both words but have them
> consecutively, for example "Bat Cat".
>
> I can think of a very crude way of doing this, which involves hashing
> every word in a document into a hash table and storing an index for
> said document. Then, in the query stage, hash both words (hash
> join) and get the intersection of the resulting vectors from the
> hashing process. Then, one by one, examine each document from the
> intersection vector, find the word "Bat" and see if the word "Cat" is
> the next word. If it is, place said document into the final result
> vector, and once finished pass that back to the user.
>
> I believe this will work for AND and OR type queries, but I can't
> imagine systems like Google or Yahoo using such a CRUDE method, nor
> can I see them caching such results, just because of the sheer number
> of combinations they would have to be caching.
>
> Does anyone here have any idea how these things are done? Any
> keywords I could use to google, etc.?
>
>
> Sherrie
Received on Wed Jul 12 2006 - 13:41:12 CEST
