Re: Complex CONTEXT index

From: Rich Jesse <rjoralist_at_society.servebeer.com>
Date: Fri, 23 Jan 2009 11:16:37 -0600 (CST)
Message-ID: <b67c29d1de93e9c22bc0fa123f2240c6.squirrel_at_society.servebeer.com>



> Don't you need to translate the BLOB content into indexable text before you
> index it? A simple transliteration of hex values is no help; you need
> something that would convert the enclosed encoded Word or PDF into real
> words.

This is exactly what Oracle's Ultra Search does. I've used it in the past (10gR1) to power an intranet search site that crawled intranet sites as well as file shares. It indexes very well, extracting text from every popular format, including binary MS documents, PDFs, and drawings, as well as the headers of image and video files. But it wasn't the most stable: I needed to bounce it regularly, at least monthly, IIRC.

Perhaps it's better with 10gR2 or 11g, if it's still available.

HTH! GL! Rich

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Jan 23 2009 - 11:16:37 CST
