Re: Oracle text question - marking-up HTML documents

From: jeremy <jeremy0505_at_gmail.com>
Date: Wed, 28 May 2008 08:30:47 -0700 (PDT)
Message-ID: <debb77d5-7ee5-42b5-8cd2-2481b5c3e253@z66g2000hsc.googlegroups.com>


On May 28, 3:24 pm, "Vladimir M. Zakharychev" <vladimir.zakharyc..._at_gmail.com> wrote:
> On May 28, 5:25 pm, jeremy <jeremy0..._at_gmail.com> wrote:
>
>
>
> > Hi
>
> > This is a problem on 10gR2 (Windows 2000 server).
>
> > We have a table with a CLOB containing HTML documents.
>
> > We have created a text index on it:
>
> > create index my_text_index
> > on my_tab
> > (
> > html_document
> > )
> > indextype is ctxsys.context
> > parallel (degree 1)
> > parameters ('filter ctxsys.null_filter section group
> > ctxsys.html_section_group');
>
> > Text searches work as expected but when displaying the marked-up
> > result document we are getting the wrong terms highlighted. For
> > example using this:
>
> > ctx_doc.markup(index_name=>'my_text_index',
> > textkey =>to_char(id),
> > text_query=>'duration',
> > restab =>mklob,
> > tagset => 'HTML_DEFAULT',
> > starttag =>'<font size=+1 color="red"> <b>',
> > endtag =>'</b></font>');
>
> > Results in:
>
> > ..
> > ..
> > <TD valign=top class=icams-field-prompt>What was the primary
> > business<font size=+1 color="red"> <b> of your</b></font> employer?</TD>
>
> > <TD valign=top class=field-text>99</TD>
> > </TR>
> > <TR>
> > <TD valign=top class=field-prompt>Duration of employment (months)</TD>
> > <TD valign=top class=field-text>3</TD>
> > ..
> > ..
> > You will see that CTX_DOC.MARKUP has highlighted the words "of your"
> > as opposed to the word "duration".
>
> > If I simply change the call to CTX_DOC.MARKUP to use the additional
> > parameter
> > plaintext => true
> > then the markup is correct (though of course without any formatting).
>
> > Has anyone come across this behaviour before and what are we doing
> > wrong?
>
> > many thanks
> > --
> > jeremy
>
> Did you try POLICY_MARKUP() on this document? POLICY_MARKUP() doesn't
> require a CONTEXT index, so you can easily test whether this is a markup
> glitch or a problem with your current Text index. If POLICY_MARKUP()
> marks up the document correctly, you may have to rebuild the index.
> See the Text Reference book for details on policy-based filtering and
> markup. If POLICY_MARKUP() also marks it up incorrectly and your
> database is not at 10.2.0.4, you may want to patch it and check
> whether this behavior persists in the latest patch set. If you already
> run 10.2.0.4 and have a reproducible test case for this markup
> behavior, your only option is to open an SR with Oracle Support and
> request a fix.

Thanks for your comments Vladimir - at least it confirms that we are not doing something wrong.

The dev database is currently 10.2.0.1.0 so an upgrade is in order.
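
For reference, the POLICY_MARKUP() check suggested above might look
roughly like the sketch below. It is untested here and rests on a few
assumptions: the policy name my_markup_policy and the key value id = 1
are placeholders, and the filter and section group are simply copied
from the index definition so that the policy mirrors the index.

```sql
-- Create a policy that mirrors the index preferences.
-- 'my_markup_policy' is a hypothetical name, not an existing object.
begin
  ctx_ddl.create_policy(
    policy_name   => 'my_markup_policy',
    filter        => 'ctxsys.null_filter',
    section_group => 'ctxsys.html_section_group');
end;
/

set serveroutput on

declare
  doc   clob;
  mklob clob;
begin
  -- id = 1 is a placeholder; pick a document that shows the problem.
  select html_document into doc from my_tab where id = 1;

  -- Same query, tagset and tags as the CTX_DOC.MARKUP call,
  -- but no CONTEXT index is involved.
  ctx_doc.policy_markup(
    policy_name => 'my_markup_policy',
    document    => doc,
    text_query  => 'duration',
    restab      => mklob,
    tagset      => 'HTML_DEFAULT',
    starttag    => '<font size=+1 color="red"> <b>',
    endtag      => '</b></font>');

  -- Print only the first part of the result for a quick look.
  dbms_output.put_line(dbms_lob.substr(mklob, 2000, 1));

  -- The in-memory result is a temporary LOB and should be released.
  dbms_lob.freetemporary(mklob);
end;
/

-- Remove the test policy afterwards.
begin
  ctx_ddl.drop_policy('my_markup_policy');
end;
/
```

If the highlighted term comes out right here but wrong via
CTX_DOC.MARKUP, that would point at the index rather than the markup
code, which is the distinction Vladimir describes.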

thanks

--
jeremy
Received on Wed May 28 2008 - 10:30:47 CDT
