Re: parsing multi-word queries

From: Anil Kumar Tappetla <aktappetla_at_hotmail.com>
Date: 21 Aug 2001 09:56:14 -0700
Message-ID: <4c4fce92.0108210856.4ecb80f7_at_posting.google.com>


Well, parsing a query judiciously would definitely improve relevance, since one can never know for certain what the user meant by framing the query the way they did. For example, one can deduce that two words that are often seen together are not really two separate query terms but a single one. Take "atomic bomb explosion": if the pairs "atomic bomb", "bomb explosion", or "atomic explosion" occur together often, then the results should probably come from a different context than the results for the independent terms "bomb", "explosion", or "atomic", or for any naive combination of these.

Search engine performance can also be improved by indexing documents properly. For example, if the index kept a distance measure between the key terms that occur in a document, then for a query with zero distance between a pair of terms (that is, the terms are adjacent), the search engine could give higher relevance to the documents that also have zero distance between those terms. This is not a big leap in search engine performance, but it is a useful step towards improving it. If anyone wants to patent this, go ahead and contact me.
:->
akt.
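
P.S. To make the distance idea above a bit more concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions, not how any real engine does it: the function names (build_positional_index, min_gap, score), the whitespace tokenizer, the sample documents, and the 1/(1 + gap) bonus are all made up for this example. The index stores word positions per document, and a document gets a larger bonus the closer two adjacent query terms sit in it, with the full bonus at zero distance.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split is enough for a sketch.
    return text.lower().split()


def build_positional_index(docs):
    """Map each term to {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def min_gap(index, doc_id, first, second):
    """Fewest words lying between `first` and a later `second` in the
    document; 0 means adjacent. Returns None if they never occur in
    that order."""
    gaps = [
        q - p - 1
        for p in index.get(first, {}).get(doc_id, [])
        for q in index.get(second, {}).get(doc_id, [])
        if q > p
    ]
    return min(gaps) if gaps else None


def score(index, doc_id, query_terms):
    """Hypothetical relevance score: one point per distinct query term
    present, plus a proximity bonus for each adjacent query pair."""
    base = sum(1 for t in set(query_terms) if doc_id in index.get(t, {}))
    bonus = 0.0
    for a, b in zip(query_terms, query_terms[1:]):
        gap = min_gap(index, doc_id, a, b)
        if gap is not None:
            bonus += 1.0 / (1 + gap)  # zero distance gives the full bonus
    return base + bonus


# Made-up sample documents, both containing all three query terms.
docs = {
    "d1": "the atomic bomb explosion was filmed",
    "d2": "an explosion at the atomic plant was not a bomb",
}
index = build_positional_index(docs)
query = tokenize("atomic bomb explosion")

for doc_id in docs:
    print(doc_id, score(index, doc_id, query))
```

Both sample documents contain all three words, but only d1 has them next to each other, so d1 scores higher. The pairwise position comparison in min_gap is quadratic and fine only for a toy; a real index would merge the sorted position lists instead.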

nospam_at_nospam.com (The Critic) wrote in message news:<3b81892d.11708938_at_news.freeserve.net>...
> In the writing of a general purpose (web) search engine, is it better
> to automatically parse queries into single terms and n-word phrases,
> or to get the user to identify phrases with the use of quotes?
>
> E.g. Input = *Japanese railways bullet train*
>
> Should it be parsed into Japanese, railways, bullet, train, Japanese
> railways, railways bullet, bullet train, Japanese railways bullet,
> railways bullet train, Japanese railways bullet train? Or should the
> user be left to identify "bullet train"?
>
> Is there any research into identifying candidate n-word phrases
> (without having to test every candidate against the index)?
>
> Despite the insidious efforts of Satan's forces,
> Truth will ultimately prevail victorious.
>
> The TRUTH is now online at:
> http://www.fortunecity.com/meltingpot/samoa/1382/index.html
Received on Tue Aug 21 2001 - 18:56:14 CEST
