Re: Oracle Text / Office 2007 question / 10gR2

From: Vladimir M. Zakharychev <vladimir.zakharychev_at_gmail.com>
Date: Sun, 16 Mar 2008 08:00:13 -0700 (PDT)
Message-ID: <29720085-8102-4f40-8ceb-17a9716acd2c@i7g2000prf.googlegroups.com>


On Mar 15, 12:50 pm, jeremy <jeremy0..._at_gmail.com> wrote:
> Hi
>
> As I understand it Oracle Text in 10gR2 is not able to text index .docx
> files generated by MS Office 2007. As the use of this format is only
> going to increase (and we have to allow for this type of file) have any
> of you come across this problem and did you devise a workaround for it?
>
> Our application accepts CVs from candidates each of which has to be
> indexed.
>
> We are running on RHEL4 / 10gR2.
>
> cheers
>
> --
> jeremy

Oracle Text currently uses Verity KeyView document filters, which Oracle licensed ca. 9.2.0.4 to replace the (now discontinued) Inso Corp. filters. Verity has probably already added support for the 2007 formats, but integrating that support into all supported Oracle releases and patchsets can definitely take a while, even if the filters are used unchanged. For some reason, Oracle has never created one-offs for the Text filtering components and has always delivered new filter versions in patchsets.

If you can afford a Windows-based Oracle instance with the COM Automation option, you can install Office 2007 there and use COM Automation to invoke Office apps from that instance. You can then create a db link to the Windows instance, create a function there that takes a BLOB as input and returns a BLOB with the converted document as output, and call this function via the db link from your other instances. The function would write the source document to a file, use COM Automation to instantiate an Office application and load the document, save it in the desired format (one the existing Text filters can handle), and then read the result back into the output BLOB. Shouldn't be too complex to implement.
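To make the round trip concrete, here is a rough sketch of the same steps (bytes in, temp file, conversion, bytes out). It is written in Python rather than as a PL/SQL function using the COM Automation option, so treat it as an illustration of the flow and not as the implementation described above. The names convert_blob, word_com_convert and copy_convert are made up for this example. The Word step assumes a Windows host with Word and the pywin32 package installed, and assumes the old .doc format (FileFormat 0) is an acceptable target; it is not exercised here.

```python
import os
import shutil
import tempfile
from typing import Callable


def convert_blob(source: bytes,
                 convert_file: Callable[[str, str], None],
                 src_suffix: str = ".docx",
                 dst_suffix: str = ".doc") -> bytes:
    """Write source to a temp file, run convert_file(src, dst), return dst bytes."""
    workdir = tempfile.mkdtemp(prefix="officeconv_")
    src_path = os.path.join(workdir, "input" + src_suffix)
    dst_path = os.path.join(workdir, "output" + dst_suffix)
    try:
        with open(src_path, "wb") as f:
            f.write(source)                 # source document -> file
        convert_file(src_path, dst_path)    # conversion step (COM in real use)
        with open(dst_path, "rb") as f:
            return f.read()                 # converted file -> bytes
    finally:
        # Remove the working directory even if the conversion failed.
        shutil.rmtree(workdir, ignore_errors=True)


def word_com_convert(src_path: str, dst_path: str) -> None:
    """Assumed COM step: needs Windows, Word, and pywin32. Not run below."""
    import win32com.client  # imported lazily so the rest runs anywhere

    wd_format_document = 0  # Word 97-2003 .doc (assumed target format)
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(src_path)
        try:
            doc.SaveAs(dst_path, FileFormat=wd_format_document)
        finally:
            doc.Close(False)  # close without saving changes to the source
    finally:
        word.Quit()


def copy_convert(src_path: str, dst_path: str) -> None:
    """Stand-in converter for trying the plumbing without Office."""
    shutil.copyfile(src_path, dst_path)


if __name__ == "__main__":
    converted = convert_blob(b"dummy CV contents", copy_convert)
    print(len(converted), "bytes came back")
```

On the Windows box you would pass word_com_convert instead of copy_convert. Keeping the conversion step as a separate callable means the file handling can be checked without Office installed.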

Regards,

   Vladimir M. Zakharychev
   N-Networks, makers of Dynamic PSP(tm)    http://www.dynamicpsp.com

Received on Sun Mar 16 2008 - 10:00:13 CDT
