Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 


Re: Oracle Text: Indexing UTF8 or UTF16

From: Frank van Bortel <frank.van.bortel_at_gmail.com>
Date: Thu, 19 May 2005 12:49:42 +0200
Message-ID: <d6hqnp$thv$1@news1.zwoll1.ov.home.nl>


Server Applications wrote:
> Hello
>
> I am trying to build a system where I can full-text index documents with
> UTF8 or UTF16 data using Oracle Text. I am doing the filtering in a
> third-party component outside the database, so I don't need filtering in
> Oracle, but only indexing.
> If I put file references to the filtered files in the database and index
> these (using FILE_DATASTORE), everything works fine. But I would rather put the
> filtered data in the database and index it from there (using the
> PROCEDURE_FILTER). This gives me problems when the data is actually
> Unicode data.
> The interface for the procedure in the PROCEDURE_FILTER does not allow the
> data to be output as NCLOB or NVARCHAR, only as CLOB or VARCHAR. Indexing
> the data directly in the table (using e.g. a NULL_FILTER or CHARSET_FILTER)
> has the same effect. If I try to index a column of type NCLOB or
> NVARCHAR, index creation fails with an error saying it is an
> invalid column type.
>
> I have tried to create a database with the UTF8 character set, expecting
> that the CLOB column type then could contain the UTF8 data, and that the
> indexing would then recognize the Unicode characters in the data. This does
> not give any errors, but none of the Unicode strings in the data end up
> in the index. Only the English strings (i.e. ASCII strings, whose
> characters each fit in a single byte) are in the index afterwards.
>
> Is it not possible to index data directly in a column (using either
> CHARSET_FILTER, NULL_FILTER or PROCEDURE_FILTER) that is in UTF8 or UTF16
> format?
>
>
> Thanks in advance for any comments.
>
> /David
>
>

What language did you install for Oracle Text? The default is (US) English. You probably want to install multiple languages.

If I understand your post correctly, the data loaded is *not* UTF-8 or UTF-16; so this is actually not an Oracle Text problem, but a character set problem (UTF-8 versus a single-byte, 8-bit character set).
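To make the character-set point concrete, here is a small Python sketch (not Oracle code). It shows how the same text looks as UTF-8 versus a single-byte character set, and how a lossy conversion can leave only the ASCII words. Assumptions: Python's `latin-1` codec stands in for a single-byte Oracle character set such as WE8ISO8859P1, `errors="replace"` stands in for the replacement character a lossy conversion inserts, and the helper names `describe_encoding` and `to_single_byte` are hypothetical.

```python
# Sketch only: Python codecs stand in for Oracle character sets.
# "latin-1" approximates a single-byte set such as WE8ISO8859P1;
# "utf-8" approximates a Unicode database character set.

def describe_encoding(raw: bytes) -> str:
    """Heuristically classify a byte string (assumes no BOM and no UTF-16)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8 (likely a single-byte character set)"
    if len(text) == len(raw):
        # Every character took exactly one byte: plain ASCII.
        return "pure ASCII (same bytes in UTF-8 and single-byte sets)"
    return "UTF-8 with multibyte characters"


def to_single_byte(text: str, codec: str = "latin-1") -> bytes:
    """Convert to a single-byte set; unmappable characters become '?'."""
    return text.encode(codec, errors="replace")


samples = ["Oracle Text", "café", "日本語"]
for s in samples:
    utf8 = s.encode("utf-8")
    single = to_single_byte(s)
    print(f"{s!r}: {len(s)} chars, {len(utf8)} UTF-8 bytes -> {describe_encoding(utf8)}")
    print(f"    latin-1 bytes {single!r} -> {describe_encoding(single)}")
```

Note that `日本語` survives as 9 bytes of UTF-8 but becomes `b'???'` in the single-byte set. If data were converted that way on load, only the ASCII words would reach the indexer, which would be consistent with the symptom described in the question.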

Please post your Oracle version(s).

-- 
Regards,
Frank van Bortel
Received on Thu May 19 2005 - 05:49:42 CDT

