Re: parsing multi-word queries

From: Anil Kumar Tappetla <aktappetla_at_hotmail.com>
Date: 22 Aug 2001 11:56:05 -0700
Message-ID: <4c4fce92.0108221056.36fdc8e6_at_posting.google.com>


Vector retrieval works at a different level altogether: it only measures the similarity between documents and the user's query. How accurately the query vector is framed, however, determines how effective the retrieval system is. Ideally, to frame a proper query vector, one would ask the user exactly what he is looking for! Statistically, if the majority of users think alike when framing a query for the same kind of thing, that information can be used to construct a better query vector. What I was elaborating on in my previous message, though, is finding relations among query/index terms. These relations can be stored along with the document index and used when computing relevance. Any takers?
anil.
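
A minimal sketch of the point above, not anyone's actual engine: the similarity measure (plain cosine over term counts) stays the same, and only the way the vectors are framed changes. The sample documents, the function names and the choice of adjacent word pairs as the stored "relation" are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical two-document collection, for illustration only.
DOCS = {
    "d1": "the atomic bomb explosion was recorded",
    "d2": "a bomb scare followed an explosion at the atomic research lab",
}

def features(text, use_pairs=False):
    """Bag of words, optionally extended with adjacent word pairs."""
    words = text.lower().split()
    feats = Counter(words)
    if use_pairs:
        # Assumption: an adjacent pair stands in for a "term relation".
        feats.update(" ".join(pair) for pair in zip(words, words[1:]))
    return feats

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def scores(query, use_pairs=False):
    """Score every document against the query with the chosen framing."""
    q = features(query, use_pairs)
    return {doc_id: cosine(q, features(text, use_pairs))
            for doc_id, text in DOCS.items()}

plain = scores("atomic bomb explosion")
paired = scores("atomic bomb explosion", use_pairs=True)
print(plain)
print(paired)
```

Both documents contain all three query words, so the single-term vectors separate them only by length. Once the pairs are part of the vector, the document in which the words actually occur together pulls further ahead.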

peterbroers_at_floron.leidenuniv.nl (Theo Peterbroers) wrote in message news:<39bb2c10.0108220327.27b37584_at_posting.google.com>...
> aktappetla_at_hotmail.com (Anil Kumar Tappetla) wrote in message news:<4c4fce92.0108210856.4ecb80f7_at_posting.google.com>...
> > Well, parsing a query judiciously would definitely enhance relevance.
> > One can never know for sure what the user meant by framing the query
> > the way he did. For example, one can deduce that two words often seen
> > together are not two separate query terms but rather a single one.
> > Take "atomic bomb explosion": if the word pairs "atomic bomb", "bomb
> > explosion" or "atomic explosion" frequently occur together, then the
> > query results should probably come from a different context than the
> > results for the single independent terms "bomb", "explosion" or
> > "atomic", or any naive combination of these.
> > Search engine performance can also be improved by properly indexing
> > documents. For example, if there were a distance measure between the
> > key terms that occur in a document, then for a query with zero
> > distance between a pair of terms, the search engine could return all
> > documents that have zero distance between those terms with higher
> > relevance. This is not a big leap in search engine performance, but
> > it is a useful step towards improving it. If anyone wants to patent
> > this, go ahead and contact me.
> > :->
> > akt.
> >
> > nospam_at_nospam.com (The Critic) wrote in message news:<3b81892d.11708938_at_news.freeserve.net>...
> > > In the writing of a general purpose (web) search engine, is it better
> > > to automatically parse queries into single terms and n-word phrases,
> > > or to get the user to identify phrases with the use of quotes?
> > >
> > > E.g. Input = *Japanese railways bullet train*
> > >
> > > Should it be parsed into Japanese, railways, bullet, train, Japanese
> > > railways, railways bullet, bullet train, Japanese railways bullet,
> > > railways bullet train, Japanese railways bullet train? Or should the
> > > user be left to identify "bullet train"?
> > >
> > > Is there any research into identifying candidate n-word phrases
> > > (without having to test every candidate against the index)?
>
> Search for "vector retrieval"
> sample result with Google:
> http://www.acm.org/pubs/articles/proceedings/ir/312624/p309-turpin/p309-turpin.pdf
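
The two ideas quoted above can be sketched roughly as follows: enumerating every contiguous n-word candidate from the query, and keeping word positions in the index so that a distance between two terms can be read off per document. This is only an illustration under assumptions; the names `ngrams`, `build_index` and `min_gap` and the sample documents are hypothetical and not taken from any engine discussed in the thread.

```python
from collections import defaultdict

def ngrams(query):
    """All contiguous word sequences of the query, shortest first."""
    words = query.split()
    return [" ".join(words[i:i + n])
            for n in range(1, len(words) + 1)
            for i in range(len(words) - n + 1)]

def build_index(docs):
    """Positional index: word -> doc id -> list of word positions."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index

def min_gap(index, first, second, doc_id):
    """Fewest words between `first` and a later `second` in one document.

    Returns 0 when the words are adjacent in that order, and None when
    `second` never follows `first` in the document.
    """
    gaps = [q - p - 1
            for p in index[first].get(doc_id, [])
            for q in index[second].get(doc_id, [])
            if q > p]
    return min(gaps) if gaps else None

# Hypothetical documents, for illustration only.
docs = {
    "d1": "the bullet train links tokyo and osaka",
    "d2": "a train struck by a bullet",
    "d3": "bullet proof glass on the train",
}
index = build_index(docs)

print(ngrams("Japanese railways bullet train"))
for doc_id in docs:
    print(doc_id, min_gap(index, "bullet", "train", doc_id))
```

A query with n words yields n(n+1)/2 candidates, ten for the four-word example, which is why testing every one against the index gets expensive. With positions stored, a document where the gap is zero could be ranked above one that merely contains both words.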
Received on Wed Aug 22 2001 - 20:56:05 CEST
