Re: Complex CONTEXT index

From: Nigel Thomas
Date: Fri, 23 Jan 2009 16:13:32 +0000

Don't you need to translate the BLOB content into indexable text before you index it? Simply transliterating the raw bytes (as hex or otherwise) is no help; you need something that converts the encoded Word or PDF content into real words.

  • PDF to text: there are some solutions out there, e.g. PDFBox, an open-source Java toolkit (found via Google; no idea if it really works).
  • Word to text: you could try e.g. Apache POI (same reservations, and it looks like old Word formats may be poorly served). Obviously this will be much easier once you get to the Office Open XML file formats: you can just take the XML and dump the text, without markup, into your CLOB.
  • In both cases, you'd build a BLOB-to-CLOB converter using a Java stored proc.
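
For the Office Open XML case, the "take the XML and dump the text without markup" step might look roughly like the sketch below. It is only an illustration under some assumptions: the class and method names (DocxText, extractText, stripMarkup, sampleDocx) are made up for this example, it uses nothing beyond the standard Java library, it treats the .docx as a ZIP archive and reads only the word/document.xml part, and it strips tags with a crude regular expression rather than a real XML parser. The sampleDocx helper builds a stand-in archive for the demo, not a file Word would open.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class DocxText {

    /** Returns the text of word/document.xml from .docx bytes, or "" if that part is absent. */
    public static String extractText(byte[] docx) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docx))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals("word/document.xml")) {
                    continue;
                }
                // Reads stop at the end of the current ZIP entry.
                ByteArrayOutputStream xml = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = zip.read(buf)) > 0) {
                    xml.write(buf, 0, n);
                }
                return stripMarkup(new String(xml.toByteArray(), StandardCharsets.UTF_8));
            }
            return "";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Crude markup removal: one line per paragraph, all tags dropped, basic entities decoded. */
    public static String stripMarkup(String xml) {
        return xml.replaceAll("</w:p>", "\n")      // end of paragraph becomes a line break
                  .replaceAll("<[^>]+>", "")       // drop every remaining tag
                  .replace("&lt;", "<").replace("&gt;", ">")
                  .replace("&quot;", "\"").replace("&apos;", "'")
                  .replace("&amp;", "&")           // decode &amp; last to avoid double decoding
                  .trim();
    }

    /** Demo fixture only: a ZIP holding a minimal word/document.xml. Paragraphs must be XML-escaped. */
    public static byte[] sampleDocx(String... paragraphs) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        xml.append("<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\"><w:body>");
        for (String p : paragraphs) {
            xml.append("<w:p><w:r><w:t>").append(p).append("</w:t></w:r></w:p>");
        }
        xml.append("</w:body></w:document>");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] docx = sampleDocx("Complex CONTEXT index", "Tom &amp; Jerry");
        System.out.println(extractText(docx));
    }
}
```

Inside a Java stored proc you would presumably obtain the byte array from the BLOB (for instance via java.sql.Blob.getBytes) and write the returned string into the CLOB; that wiring is left out here, and very large documents would want streaming rather than a single in-memory array.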

Once you have indexed the text representation, you can of course discard it (or save some/all of it for preview purposes...)

Regards Nigel

2009/1/23 Bill Zakrzewski

> Listers -
> Oracle
> RH Linux
> I have a table (see below) that I would like to create a Context/Intermedia
> index on the title, short_desc, long_desc and the document (BLOB column). I
> have created a similar index on a different table that contained a CLOB by
> concatenating all of the fields into a single CLOB and creating the CONTEXT
> index using the pl/sql package/procedure (see below). I would like to do
> the same thing using the BLOB column, but not sure what values to use in the
> parameters for the DBMS_LOB.CONVERTTOCLOB procedure, specifically the
> BLOB_CSID and LANG_CONTEXT. My concern is the defaults will cause it to
> copy the data in binary format and not convert correctly, as the document
> may be a PDF or WORD Document or Excel Spreadsheet, etc. Thanks in advance
> for your help.

Received on Fri Jan 23 2009 - 10:13:32 CST
