Re: Oracle Text Indexes on WORD and pdf files

From: Vladimir M. Zakharychev <vladimir.zakharychev_at_gmail.com>
Date: Sun, 9 May 2010 10:42:30 -0700 (PDT)
Message-ID: <abab9c84-8756-4de5-9541-dcf773315237_at_i9g2000yqi.googlegroups.com>



On May 9, 7:14 am, zigzagdna <zigzag..._at_yahoo.com> wrote:
> I am on Oracle 11g on HP-UX 11i.
> Can one create Oracle Text indexes on Microsoft Word and PDF files? I
> can understand text indexes on text files, but how do text indexes
> work on “binary” files such as Word and PDF files? I have been told
> they work on these files as well; I am just curious how Oracle manages
> to parse such files.

Oracle Text uses external converters (filters) to parse such files and convert them to HTML or XML, and it relies heavily on extproc functionality to do so. Because file formats evolve over time, so do these external filters, so if you're on 11g you had better use the filters native to that version (actually, I can't see how you could use 9i filters with 11g anyway).
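
To make this a bit more concrete, here is a rough sketch of what indexing filtered documents might look like. It is a small Python helper that only builds the SQL text; it does not connect to a database. It assumes the documents sit in a binary column and that the system-defined CTXSYS.AUTO_FILTER preference of 11g is used. The table name docs, the columns id and content, and the index name docs_idx are made up for illustration, so check the statements against the Oracle Text documentation for your release before running them.

```python
def context_index_ddl(index_name: str, table: str, column: str) -> str:
    """Build a CREATE INDEX statement for an Oracle Text CONTEXT index.

    Assumption: the column holds binary documents (Word, PDF, ...) and the
    system-defined CTXSYS.AUTO_FILTER preference does the filtering.
    """
    return (
        f"CREATE INDEX {index_name} ON {table} ({column}) "
        "INDEXTYPE IS CTXSYS.CONTEXT "
        "PARAMETERS ('FILTER CTXSYS.AUTO_FILTER')"
    )


def contains_query(table: str, column: str, term: str) -> str:
    """Build a query that searches the indexed documents for a term.

    Assumption: the table has a hypothetical key column named "id".
    Single quotes in the term are doubled so the SQL literal stays valid.
    """
    escaped = term.replace("'", "''")
    return f"SELECT id FROM {table} WHERE CONTAINS({column}, '{escaped}') > 0"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(context_index_ddl("docs_idx", "docs", "content"))
    print(contains_query("docs", "content", "filter"))
```

The point of the sketch is that the filtering step is selected in the index parameters, so queries against Word or PDF content look the same as queries against plain text.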

Regards,

   Vladimir M. Zakharychev
   N-Networks, makers of Dynamic PSP(tm)    http://www.dynamicpsp.com

Received on Sun May 09 2010 - 12:42:30 CDT
