Oracle text question - marking-up HTML documents

From: jeremy <jeremy0505_at_gmail.com>
Date: Wed, 28 May 2008 06:25:32 -0700 (PDT)
Message-ID: <6485df0f-fee5-4785-b9e4-0d14dd4f569f@x35g2000hsb.googlegroups.com>


Hi

This is a problem on 10gR2 (Windows 2000 server).

We have a table with a CLOB containing HTML documents.

We have created a text index on it:

create index my_text_index
  on my_tab
  (
   html_document
  )
  indextype is ctxsys.context
  parallel (degree 1)
  parameters ('filter ctxsys.null_filter section group ctxsys.html_section_group');

Text searches work as expected but when displaying the marked-up result document we are getting the wrong terms highlighted. For example using this:

  ctx_doc.markup(index_name => 'my_text_index',
                 textkey    => to_char(id),
                 text_query => 'duration',
                 restab     => mklob,
                 tagset     => 'HTML_DEFAULT',
                 starttag   => '<font size=+1 color="red"> <b>',
                 endtag     => '</b></font>');

Results in:

..
..
<TD valign=top class=icams-field-prompt>What was the primary
business<font size=+1 color="red"> <b> of your</b></font> employer?</ TD>

<TD valign=top class=field-text>99</TD>
</TR>
<TR>
<TD valign=top class=field-prompt>Duration of employment (months)</TD>
<TD valign=top class=field-text>3</TD>

..
..
You will see that CTX_DOC.MARKUP has highlighted the words "of your" as opposed to the word "duration".

If I simply change the call to CTX_DOC.MARKUP to use the additional parameter
plaintext => true
then the markup is correct (though of course without any formatting).
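
To compare the two outputs side by side, something like the following could be run. It is only a sketch: the key value '1', the variable names and the 2000-character preview are illustrative assumptions, not our actual code, and the custom start/end tags are left out so the tagset defaults apply.

```sql
-- Sketch only: key value '1' and variable names are hypothetical.
set serveroutput on
declare
  l_html  clob;  -- markup result with HTML formatting kept
  l_plain clob;  -- markup result as plain text
begin
  -- HTML-formatted markup
  ctx_doc.markup(index_name => 'my_text_index',
                 textkey    => '1',
                 text_query => 'duration',
                 restab     => l_html,
                 plaintext  => false,
                 tagset     => 'HTML_DEFAULT');

  -- Plain-text markup of the same document and query
  ctx_doc.markup(index_name => 'my_text_index',
                 textkey    => '1',
                 text_query => 'duration',
                 restab     => l_plain,
                 plaintext  => true,
                 tagset     => 'HTML_DEFAULT');

  -- Print the first 2000 characters of each result for comparison
  dbms_output.put_line(dbms_lob.substr(l_html, 2000, 1));
  dbms_output.put_line(dbms_lob.substr(l_plain, 2000, 1));

  -- The in-memory results are temporary CLOBs, so release them
  dbms_lob.freetemporary(l_html);
  dbms_lob.freetemporary(l_plain);
end;
/
```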

Has anyone come across this behaviour before and what are we doing wrong?

many thanks

--
jeremy
Received on Wed May 28 2008 - 08:25:32 CDT