Re: Indexing pdf/doc file contents AND other text data
Date: Fri, 17 Oct 2008 04:06:06 -0700 (PDT)
Message-ID: <56cd566e-fbe0-4e09-8949-cf2b8bd87cbe@m32g2000hsf.googlegroups.com>
On Oct 17, 4:19 am, "Vladimir M. Zakharychev"
<vladimir.zakharyc..._at_gmail.com> wrote:
>
> If all this information is stored in the same table,
> MULTI_COLUMN_DATASTORE may do for you. Text will concatenate all
> specified columns into a synthetic document and index it
> automatically. It does not support joins though (it can call
> functions, so you may be able to employ implicit joins, but it's not
> really very efficient;) so if you have several tables you want to
> merge into the index, you'll need to resort to USER_DATASTORE and
> write your own document synthesizer procedure. Look up these datastore
> types in the docs for the details, and I recall posting an example of
> USER_DATASTORE solution a few years ago into this very group.
Thanks. I am now thinking of not storing the document in the database at all.
Instead, I would create a single USER_DATASTORE whose procedure retrieves the document from disk, extracts the text, and then adds the other necessary information.
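To make the idea concrete, here is a rough sketch in Python of the logic the synthesizer would follow. It is only an illustration: the real USER_DATASTORE procedure has to be a stored procedure inside the database. The names extract_text and synthesize_document and the tag names are made up for this example. Wrapping each extra field in its own tag is my assumption, and a real PDF/DOC extractor would have to replace the plain-text placeholder.

```python
from pathlib import Path
from xml.sax.saxutils import escape


def extract_text(path):
    """Hypothetical placeholder extractor: reads the file as UTF-8 text.

    A real PDF/DOC text extractor would be plugged in here instead.
    """
    return Path(path).read_text(encoding="utf-8", errors="replace")


def synthesize_document(path, metadata):
    """Build one synthetic document for indexing.

    Each metadata field is wrapped in a tag named after the field
    (an assumed layout), followed by the extracted text in <body>.
    """
    parts = [
        f"<{name}>{escape(value)}</{name}>"
        for name, value in metadata.items()
    ]
    parts.append(f"<body>{escape(extract_text(path))}</body>")
    return "\n".join(parts)
```

Calling synthesize_document("report.txt", {"title": "Report"}) would return the title wrapped in its tag, followed by the file text in the body tag, as a single string for the indexer to consume.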
> Also, how do you maintain the index? Since the content is external,
> Oracle can't actually control the changes made to it externally - how
> do you deal with this? I mean, when someone changes the file, Oracle
> has no way figuring out it was changed because it only stores a
> pointer to this file and this pointer didn't change. Any non-
> transactional change breaks the index, because it still holds [wrong]
> data for previous version of the changed document.
This is similar to the way Oracle doesn't know when the data behind a USER_DATASTORE changes. Whenever a user updates a document, I would just write the current timestamp to the indexed column, so that the row is picked up for reindexing.
Received on Fri Oct 17 2008 - 06:06:06 CDT