Oracle text question - marking-up HTML documents
Date: Wed, 28 May 2008 06:25:32 -0700 (PDT)
Message-ID: <6485df0f-fee5-4785-b9e4-0d14dd4f569f@x35g2000hsb.googlegroups.com>
Hi
This is a problem on 10gR2 (Windows 2000 server).
We have a table with a CLOB containing HTML documents.
We have created a text index on it:
create index my_text_index
  on my_tab (html_document)
  indextype is ctxsys.context
  parallel (degree 1)
  parameters ('filter ctxsys.null_filter section group ctxsys.html_section_group');
Text searches work as expected, but when we display the marked-up result document the wrong terms are highlighted. For example, using this:
ctx_doc.markup(
  index_name => 'my_text_index',
  textkey    => to_char(id),
  text_query => 'duration',
  restab     => mklob,
  tagset     => 'HTML_DEFAULT',
  starttag   => '<font size=+1 color="red"> <b>',
  endtag     => '</b></font>');
Results in:
..
..
<TD valign=top class=icams-field-prompt>What was the primary
business<font size=+1 color="red"> <b> of your</b></font> employer?</
TD>
<TD valign=top class=field-text>99</TD>
</TR>
<TR>
<TD valign=top class=field-prompt>Duration of employment (months)</TD>
<TD valign=top class=field-text>3</TD>
..
..
You will see that CTX_DOC.MARKUP has highlighted the words "of your"
as opposed to the word "duration".
If I simply change the call to CTX_DOC.MARKUP to use the additional
parameter
plaintext => false
then the markup is correct (though of course without any formatting).
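For anyone who wants to try the two variants side by side, here is a minimal sketch of the call as a standalone anonymous block. It is an illustration only, not our exact code: the key value '1' and the local variable use_plaintext are hypothetical, the simpler <b> tags stand in for our font tags, and it assumes my_tab has a primary key so that textkey can be passed as a string.

```sql
set serveroutput on

declare
  mklob         clob;              -- receives the marked-up document
  use_plaintext boolean := false;  -- hypothetical switch: flip to compare the two outputs
begin
  ctx_doc.markup(
    index_name => 'my_text_index',
    textkey    => '1',             -- hypothetical primary key value
    text_query => 'duration',
    restab     => mklob,
    plaintext  => use_plaintext,
    tagset     => 'HTML_DEFAULT',
    starttag   => '<b>',
    endtag     => '</b>');

  -- Print only the first 4000 characters to keep the output manageable.
  dbms_output.put_line(dbms_lob.substr(mklob, 4000, 1));

  -- The in-memory result is a temporary LOB, so release it when done.
  dbms_lob.freetemporary(mklob);
end;
/
```

Running the block once with each value of use_plaintext and comparing the positions of the <b> tags should show whether the highlight offsets shift only in one of the two modes.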
Has anyone come across this behaviour before, and if so, what are we doing wrong?
many thanks
-- jeremy
Received on Wed May 28 2008 - 08:25:32 CDT