Re: parsing multi-word queries

From: Theo Peterbroers <peterbroers_at_floron.leidenuniv.nl>
Date: 22 Aug 2001 04:27:27 -0700
Message-ID: <39bb2c10.0108220327.27b37584_at_posting.google.com>


aktappetla_at_hotmail.com (Anil Kumar Tappetla) wrote in message news:<4c4fce92.0108210856.4ecb80f7_at_posting.google.com>...
> Well, parsing a query judiciously would definitely enhance relevance.
> One can never know exactly what the user meant by framing the query
> the way they did. For example, one can deduce that two words often
> seen together are not two dissimilar query terms but rather a single
> query. Take "atomic bomb explosion": if the word pairs "atomic bomb",
> "bomb explosion" or "atomic explosion" occur together more often,
> then the query results should probably come from a different context
> than the results for the single independent terms "bomb",
> "explosion" or "atomic", or any naive combination of these.
> However, search engine performance can be improved by properly
> indexing documents. For example, if the index held a distance measure
> between key terms that occur in a document, then when the query had
> zero distance between a pair of terms, the search engine could rank
> the documents that have zero distance between these terms higher.
> This is not a big leap in search engine performance, but it is a
> useful step towards improving it. If anyone wants to patent this, go
> ahead and contact me.
> :->
> akt.
>
> nospam_at_nospam.com (The Critic) wrote in message news:<3b81892d.11708938_at_news.freeserve.net>...
> > In the writing of a general purpose (web) search engine, is it better
> > to automatically parse queries into single terms and n-word phrases,
> > or to get the user to identify phrases with the use of quotes?
> >
> > E.g. Input = *Japanese railways bullet train*
> >
> > Should it be parsed into Japanese, railways, bullet, train, Japanese
> > railways, railways bullet, bullet train, Japanese railways bullet,
> > railways bullet train, Japanese railways bullet train? Or should the
> > user be left to identify "bullet train"?
> >
> > Is there any research into identifying candidate n-word phrases
> > (without having to test every candidate against the index)?
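
To make the size of the candidate set concrete, here is a minimal Python sketch of the "parse into every n-word phrase" option from the question. The function name candidate_phrases and the max_len parameter are my own, not from any particular engine. It only enumerates contiguous word runs and does not test them against an index.

```python
def candidate_phrases(query, max_len=None):
    """Return every contiguous run of 1..max_len words from the query.

    Hypothetical helper for illustration: plain whitespace tokenising,
    no stemming, no stop-word handling.
    """
    words = query.split()
    if max_len is None:
        max_len = len(words)
    phrases = []
    for n in range(1, max_len + 1):           # phrase length in words
        for i in range(len(words) - n + 1):   # start position of the run
            phrases.append(" ".join(words[i:i + n]))
    return phrases


if __name__ == "__main__":
    for phrase in candidate_phrases("Japanese railways bullet train"):
        print(phrase)
```

A query of k words gives k(k+1)/2 candidates, so the four-word example yields ten. Capping max_len at 2 or 3 keeps the number of candidates linear in the query length.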

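The distance measure akt describes can be sketched with a positional index, i.e. an index that stores word positions per document. This is only an illustration under my own assumptions: the names build_positional_index, min_gap and rank_by_gap are hypothetical, tokenising is plain whitespace splitting, and "distance" is taken to be the number of words between the two terms, so adjacent words have distance zero.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map term -> {doc_id: [word positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def min_gap(index, doc_id, first, second):
    """Fewest words between `first` and a later `second` in one document.

    Returns 0 when the terms are adjacent, None when the pair never
    occurs in that order.
    """
    a = index.get(first, {}).get(doc_id, [])
    b = index.get(second, {}).get(doc_id, [])
    gaps = [q - p - 1 for p in a for q in b if q > p]
    return min(gaps) if gaps else None


def rank_by_gap(index, docs, first, second):
    """Doc ids containing both terms in order, smallest gap first."""
    scored = []
    for doc_id in docs:
        gap = min_gap(index, doc_id, first, second)
        if gap is not None:
            scored.append((gap, doc_id))
    return [doc_id for _, doc_id in sorted(scored)]


if __name__ == "__main__":
    docs = {
        "d1": "the atomic bomb was tested",
        "d2": "atomic theory predates the bomb",
        "d3": "bomb disposal unit",
    }
    index = build_positional_index(docs)
    print(rank_by_gap(index, docs, "atomic", "bomb"))
```

A real engine would fold the gap into its relevance score rather than sort on it alone; sorting is used here just to keep the sketch short.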
Search for "vector retrieval".
A sample result from Google:
http://www.acm.org/pubs/articles/proceedings/ir/312624/p309-turpin/p309-turpin.pdf

Received on Wed Aug 22 2001 - 13:27:27 CEST