Re: parsing multi-word queries
Date: 21 Aug 2001 09:56:14 -0700
Message-ID: <4c4fce92.0108210856.4ecb80f7_at_posting.google.com>
Well, parsing a query judiciously would definitely enhance relevance.
One can never know for certain what the user meant by framing the
query the way they did, but some things can be deduced. For example,
two words that are often seen together are probably not two separate
query terms but a single one. Take "atomic bomb explosion": if the
pairs "atomic bomb", "bomb explosion" or "atomic explosion" occur
together unusually often, then the query results should probably come
from a different context than the ones for the independent single
terms "bomb", "explosion" or "atomic", or any naive combination of
these.
However, search engine performance can also be improved by indexing
documents properly. For example, suppose the index stored a distance
measure between the key terms that occur in a document. Then, if a
pair of terms in the query had zero distance between them (that is,
they were adjacent), the search engine could rank all the documents
that also have zero distance between those terms higher. This is not
a big leap in search engine performance, but it is a useful step
towards improving it. If anyone wants to patent this, go ahead and
contact me.
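
One way to picture the distance measure is a positional index that
records where each word occurs in each document. The Python sketch
below is only an illustration under my own assumptions: the function
names, the toy documents, and the choice to define distance as the
number of words between the first term and a later occurrence of the
second (so adjacent words have distance zero) are not from any
particular engine.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each word to {doc_id: [positions where the word occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def min_gap(index, doc_id, first, second):
    """Fewest words between `first` and a later `second` in one document.

    Returns None when `second` never follows `first` in that document.
    """
    gaps = [
        q - p - 1
        for p in index.get(first, {}).get(doc_id, [])
        for q in index.get(second, {}).get(doc_id, [])
        if q > p
    ]
    return min(gaps) if gaps else None


def rank_by_proximity(index, first, second):
    """Return ids of documents with both terms in order, closest first."""
    candidates = set(index.get(first, {})) & set(index.get(second, {}))
    scored = []
    for doc_id in candidates:
        gap = min_gap(index, doc_id, first, second)
        if gap is not None:
            scored.append((gap, doc_id))
    return [doc_id for gap, doc_id in sorted(scored)]


# Hypothetical toy documents, only to exercise the functions.
docs = [
    "the bullet hit the train",
    "the bullet train left tokyo",
    "a train carrying a silver bullet",
]
index = build_positional_index(docs)
ranking = rank_by_proximity(index, "bullet", "train")
print(ranking)
```

Here document 1 comes first because "bullet" and "train" are adjacent
in it, document 0 follows with two words in between, and document 2
is left out because "train" never follows "bullet" there. A fuller
version would presumably fold the gap into the overall relevance
score rather than sort on it alone.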
nospam_at_nospam.com (The Critic) wrote in message news:<3b81892d.11708938_at_news.freeserve.net>...
:->
akt.
> In the writing of a general purpose (web) search engine, is it better
> to automatically parse queries into single terms and n-word phrases,
> or to get the user to identify phrases with the use of quotes?
>
> E.g. Input = *Japanese railways bullet train*
>
> Should it be parsed into Japanese, railways, bullet, train, Japanese
> railways, railways bullet, bullet train, Japanese railways bullet,
> railways bullet train, Japanese railways bullet train? Or should the
> user be left to identify "bullet train"?
>
> Is there any research into identifying candidate n-word phrases
> (without having to test every candidate against the index)?
>
> Despite the insidious efforts of Satan's forces,
> Truth will ultimately prevail victorious.
>
> The TRUTH is now online at:
> http://www.fortunecity.com/meltingpot/samoa/1382/index.html
Received on Tue Aug 21 2001 - 18:56:14 CEST