Re: Oracle text question - marking-up HTML documents

From: Vladimir M. Zakharychev <vladimir.zakharychev_at_gmail.com>
Date: Wed, 28 May 2008 07:24:48 -0700 (PDT)
Message-ID: <a1cb855b-ed14-46da-8946-7df9bbbdca37@c58g2000hsc.googlegroups.com>


On May 28, 5:25 pm, jeremy <jeremy0..._at_gmail.com> wrote:
> Hi
>
> This is a problem on 10gR2 (Windows 2000 server).
>
> We have a table with a CLOB containing HTML documents.
>
> We have created a text index on it:
>
> create index my_text_index
> on my_tab
> (
> html_document
> )
> indextype is ctxsys.context
> parallel (degree 1)
> parameters ('filter ctxsys.null_filter section group
> ctxsys.html_section_group');
>
> Text searches work as expected but when displaying the marked-up
> result document we are getting the wrong terms highlighted. For
> example using this:
>
> ctx_doc.markup(index_name=>'my_text_index',
> textkey =>to_char(id),
> text_query=>'duration',
> restab =>mklob,
> tagset => 'HTML_DEFAULT',
> starttag =>'<font size=+1 color="red"> <b>',
> endtag =>'</b></font>');
>
> Results in:
>
> ..
> ..
> <TD valign=top class=icams-field-prompt>What was the primary
> business<font size=+1 color="red"> <b> of your</b></font> employer?</TD>
>
> <TD valign=top class=field-text>99</TD>
> </TR>
> <TR>
> <TD valign=top class=field-prompt>Duration of employment (months)</TD>
> <TD valign=top class=field-text>3</TD>
> ..
> ..
> You will see that CTX_DOC.MARKUP has highlighted the words "of your"
> as opposed to the word "duration".
>
> If I simply change the call to CTX_DOC.MARKUP to use the additional
> parameter
> plaintext => true
> then the markup is correct (though of course without any formatting).
>
> Has anyone come across this behaviour before and what are we doing
> wrong?
>
> many thanks
> --
> jeremy

Did you try POLICY_MARKUP() on this document? POLICY_MARKUP() doesn't require a CONTEXT index, so it lets you test whether this is a markup glitch or a problem with your current Text index. If POLICY_MARKUP() marks up the document correctly, you may have to rebuild the index; see the Oracle Text Reference for details on policy-based filtering and markup. If POLICY_MARKUP() gets it wrong too and your database is not yet at 10.2.0.4, you may want to apply that patch set and check whether the behavior persists. If you already run 10.2.0.4 and have a reproducible test case for this markup behavior, your only option is to open a service request (SR) with Oracle Support and ask for a fix.
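A minimal sketch of such a test, assuming SQL*Plus and the table, column, query and tags from the original post. The policy name my_markup_policy and the key value 1 are hypothetical placeholders. The policy reuses the same filter and section group as the index, so if its output differs from CTX_DOC.MARKUP, the index itself is the more likely suspect. Please check the parameter lists against the Text Reference for your release before relying on this.

```sql
SET SERVEROUTPUT ON

-- Hypothetical policy name; same filter and section group as my_text_index.
BEGIN
  ctx_ddl.create_policy(
    policy_name   => 'my_markup_policy',
    filter        => 'ctxsys.null_filter',
    section_group => 'ctxsys.html_section_group');
END;
/

-- Mark up one stored document through the policy, bypassing my_text_index.
DECLARE
  l_doc    CLOB;
  l_result CLOB;  -- left NULL so CTX_DOC allocates a temporary CLOB
BEGIN
  SELECT html_document
    INTO l_doc
    FROM my_tab
   WHERE id = 1;  -- hypothetical key of the problem document

  ctx_doc.policy_markup(
    policy_name => 'my_markup_policy',
    document    => l_doc,
    text_query  => 'duration',
    restab      => l_result,
    tagset      => 'HTML_DEFAULT',
    starttag    => '<font size=+1 color="red"> <b>',
    endtag      => '</b></font>');

  -- Print the first 4000 characters; look at where the tags landed.
  dbms_output.put_line(dbms_lob.substr(l_result, 4000, 1));

  -- Release the temporary CLOB allocated by CTX_DOC.
  IF dbms_lob.istemporary(l_result) = 1 THEN
    dbms_lob.freetemporary(l_result);
  END IF;
END;
/
```

If the highlighting lands on "Duration" here, ALTER INDEX my_text_index REBUILD; is one way to rebuild the index, and the test policy can be dropped afterwards with ctx_ddl.drop_policy('my_markup_policy'). SELECT banner FROM v$version; shows which patch level the database is at.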

Hth,

   Vladimir M. Zakharychev
   N-Networks, makers of Dynamic PSP(tm)    http://www.dynamicpsp.com

Received on Wed May 28 2008 - 09:24:48 CDT