Re: CONTEXT index and word importance
Date: Mon, 21 Apr 2008 19:46:37 -0700 (PDT)
On Apr 13, 9:12 am, "www.douglassdavis.com" <douglass_da..._at_earthlink.net> wrote:
> On Apr 11, 12:30 pm, BicycleRepairman <engel.ke..._at_gmail.com> wrote:
> > Upgrade to 11g and use the new user-defined scoring capability.
> > See http://download.oracle.com/docs/cd/B28359_01/text.111/b28304/cqoper.h...
> Thanks. I'm not sure if this does what I want though.
> What I would really like to do is mark some parts of a document as
> more relevant than others as soon as the document is indexed. For
> example, the title is more relevant than other text.
Sorry, I missed your reply. I see what you mean: the custom scoring really focuses on weighting the importance of words in a particular query, not where they are found in the document.

Unfortunately, trying to jigger the weights at index time is probably not what you really want to do either. Recall that the index is basically a list of docids for a given token. That list is not sorted by default, and optimizing the index doesn't do what you want either. You want to think of this in terms of weighting the words in a query rather than manipulating the index. I can think of a couple of possible paths.

One way to do that would be to have two CONTEXT indexes, i.e. one for the title and the other for the body text. You might then accept a user query for 'dog' and expand it to
contains (title, 'dog', 1)>0 or contains (bodytext, 'dog', 2)>0
and from there
select score(1)*3 + score(2) as "Weighted Text Score"
from ... order by "Weighted Text Score" desc
which would have the effect of scoring documents that have 'dog' in the title higher than those that have it only in the body, since a title hit is weighted three times as heavily as a body hit.
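
Put together, the two-index approach might look roughly like the sketch below. The table name docs, the id column and the index names are made up for illustration; only title and bodytext come from the example. The descending sort is an assumption that you want the best matches first.

```sql
-- Hypothetical schema: docs(id, title, bodytext). Index names are invented.
-- One CONTEXT index per column, so each CONTAINS can be scored separately.
create index docs_title_idx on docs (title)    indextype is ctxsys.context;
create index docs_body_idx  on docs (bodytext) indextype is ctxsys.context;

-- Labels 1 and 2 tie each CONTAINS to its SCORE.
-- A title hit is weighted three times as heavily as a body hit.
select id,
       title,
       score(1) * 3 + score(2) as "Weighted Text Score"
from   docs
where  contains(title, 'dog', 1) > 0
   or  contains(bodytext, 'dog', 2) > 0
order  by "Weighted Text Score" desc;
```

The multiplier 3 is arbitrary. Each SCORE ranges from 0 to 100, so the combined value here tops out at 400.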
If the number of potential columns is more than a couple, I'd build a concatenated index on title + body + [whatever else]. You might then accept a user query for 'dog' and expand it to
contains(rowtext, 'dog*3 within TITLE or dog*2 within BODY or dog within comments',1)>0
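
One way to get a single index with TITLE, BODY and COMMENTS sections is Oracle Text's MULTI_COLUMN_DATASTORE combined with a section group. Here is a minimal sketch, assuming a hypothetical table docs with columns title, body and comments, plus a column rowtext that the index is nominally created on. The preference, section group and index names are invented, and the block needs execute rights on CTX_DDL.

```sql
begin
  -- Concatenate the listed columns into one document per row. By default
  -- each column's text is wrapped in tags named after the column.
  ctx_ddl.create_preference('docs_mcds', 'MULTI_COLUMN_DATASTORE');
  ctx_ddl.set_attribute('docs_mcds', 'columns', 'title, body, comments');

  -- Map those tags to zone sections so that WITHIN can target them.
  ctx_ddl.create_section_group('docs_sg', 'BASIC_SECTION_GROUP');
  ctx_ddl.add_zone_section('docs_sg', 'title',    'title');
  ctx_ddl.add_zone_section('docs_sg', 'body',     'body');
  ctx_ddl.add_zone_section('docs_sg', 'comments', 'comments');
end;
/

-- The index is declared on rowtext (hypothetical column), but the text
-- that gets indexed comes from the datastore preference above.
create index docs_rowtext_idx on docs (rowtext)
  indextype is ctxsys.context
  parameters ('datastore docs_mcds section group docs_sg');
```

Keep in mind that, as far as I know, changes are picked up only when the indexed column (rowtext here) is updated, so an update to title, body or comments needs to touch that column as well before the index is synchronized.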