Re: Oracle Text: Indexing UTF8 or UTF16

From: Frank van Bortel <frank.van.bortel_at_gmail.com>
Date: Thu, 19 May 2005 10:00:46 +0200
Message-ID: <d6hgr1$sf3$2_at_news6.zwoll1.ov.home.nl>


Server Applications wrote:
> Hello
>
> I am trying to build a system where I can full-text index documents with
> UTF8 or UTF16 data using Oracle Text. I am doing the filtering in a
> third-party component outside the database, so I don't need filtering in
> Oracle, only indexing.
> If I put file references to the filtered files in the database and index
> these (using FILE_DATASTORE), everything works fine. But I would rather
> put the filtered data in the database and index it from there (using the
> PROCEDURE_FILTER). This gives me some problems when the data is actually
> Unicode data.
> The interface for the procedure in the PROCEDURE_FILTER does not allow the
> data to be output as NCLOB or NVARCHAR2, only as CLOB or VARCHAR2. Indexing
> the data directly in the table (using e.g. a NULL_FILTER or CHARSET_FILTER)
> has the same problem. If I try to index a column of type NCLOB or
> NVARCHAR2, the index creation gives me an error telling me that it is an
> invalid column type.
>
> I have tried to create a database with the UTF8 character set, expecting
> that a CLOB column could then contain the UTF8 data, and that the
> indexing would recognize the Unicode characters in the data. This does
> not give any errors, but none of the Unicode strings in the data end up
> in the index; only the English strings (or ASCII, i.e. strings whose
> characters all fit in one byte) are in the index afterwards.
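[The in-table CHARSET_FILTER variant being described would look roughly like this; the table, column, and preference names are illustrative assumptions:]

```sql
-- Filtered text stored directly in the database, assumed in a CLOB column.
CREATE TABLE docs (id NUMBER PRIMARY KEY, text CLOB);

BEGIN
  -- CHARSET_FILTER converts the stored text from the named character set
  -- to the database character set before indexing.
  CTX_DDL.CREATE_PREFERENCE('my_cs_filter', 'CHARSET_FILTER');
  CTX_DDL.SET_ATTRIBUTE('my_cs_filter', 'charset', 'UTF8');
END;
/

CREATE INDEX docs_idx ON docs (text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('filter my_cs_filter');
```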
>
> Is it not possible to index data directly in a column (using either
> CHARSET_FILTER, NULL_FILTER or PROCEDURE_FILTER) that is in UTF8 or UTF16
> format?
>
>
> Thanks in advance for any comments.
>
> /David
>
>
This newsgroup is dead - repost in comp.databases.oracle.server.

-- 
Regards,
Frank van Bortel