Re: NCLOB fields and CLOB.createTemporary does not work

From: John Hendrikx <hjohn_at_please.remove.this.xs4all.nl>
Date: Thu, 28 Apr 2005 15:30:59 +0200
Message-ID: <4270e597$0$157$e4fe514c@news.xs4all.nl>

>>>>I'm using Oracle 9i, using the latest thin JDBC Java driver.  I'm trying
>>>>to insert an NCLOB into the database.  The recommended way to do this is
>>>>to use CLOB.createTemporary() as you can see in the code below.  This
>>>>code works fine for CLOB fields, but fails with an 'ORA-12704: character
>>>>set mismatch' when the field is NCLOB.
>>>
>>>I am sure you will not catch an ugly disease from posting
>>>the character sets in question?
>>
>>
>>Java uses its standard UCS-2 character set for all strings, and our data
>>is in that format.
>>
>>The databases we've tested on are:
>>
>>Oracle 9i, with a database using the WE8MSWIN1252 charset.  We found this
>>setup is useless, as all international characters will be discarded
>>during the translation process, despite using NVARCHAR2/NCLOB fields.
>>

>
> Still, that is not a client setting.

True. I don't think we can influence the client setting in the Java JDBC driver; it simply uses the database character set. That is why WE8MSWIN1252 is a problem: all communication between Java and the database is converted to that charset, resulting in the loss of international characters.
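
For what it's worth, the only driver-side knob I've come across is the form-of-use extension on the Oracle prepared statement. A minimal sketch, assuming the thin driver's oracle.jdbc.OraclePreparedStatement.setFormOfUse() is available (the table, connect string and credentials are made up, and I have not verified this helps with the streaming case):

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.PreparedStatement;

  import oracle.jdbc.OraclePreparedStatement;

  public class NCharFormExample {
      public static void main(String[] args) throws Exception {
          Connection conn = DriverManager.getConnection(
                  "jdbc:oracle:thin:@host:1521:orcl", "scott", "tiger");

          // Hypothetical table: DOCS(ID NUMBER, BODY NVARCHAR2(2000)).
          PreparedStatement ps = conn.prepareStatement(
                  "INSERT INTO docs (id, body) VALUES (?, ?)");
          ps.setInt(1, 1);

          // Tell the driver this bind uses the national character set,
          // so it should not be squeezed through the database charset.
          ((OraclePreparedStatement) ps)
                  .setFormOfUse(2, OraclePreparedStatement.FORM_NCHAR);
          ps.setString(2, "\u0416\u03A9 mixed Cyrillic/Greek text");

          ps.executeUpdate();
          ps.close();
          conn.close();
      }
  }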

> The MSWIN1252 means nothing; NCLOB/NVARCHAR fields would be either
> UTF8, or AL16UTF16 (default).

It's the latter. The national fields use AL16UTF16 (I checked the screenshots we made of the entire install process for the second Oracle 9i installation).

So to clarify:

Oracle 9i installation #1: db = WE8MSWIN1252; national = AL16UTF16
Oracle 9i installation #2: db = AL32UTF8; national = AL16UTF16
Oracle 10g installation:   db = AL32UTF8; national = AL16UTF16
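
(If anyone wants to double-check their own setup, both values can be read from NLS_DATABASE_PARAMETERS. A quick sketch via the same thin driver; the connect string and credentials are obviously placeholders:)

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class CharsetCheck {
      public static void main(String[] args) throws Exception {
          Connection conn = DriverManager.getConnection(
                  "jdbc:oracle:thin:@host:1521:orcl", "scott", "tiger");
          Statement st = conn.createStatement();

          // NLS_CHARACTERSET is the database charset,
          // NLS_NCHAR_CHARACTERSET the national charset.
          ResultSet rs = st.executeQuery(
                  "SELECT parameter, value FROM nls_database_parameters"
                  + " WHERE parameter IN"
                  + " ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET')");
          while (rs.next()) {
              System.out.println(rs.getString(1) + " = " + rs.getString(2));
          }
          rs.close();
          st.close();
          conn.close();
      }
  }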

On Oracle 9i #1, we cannot get international characters stored at all via the Java JDBC client (using NVARCHAR2/NCLOB fields) because everything seems to be converted to the database charset prior to transmission.

On Oracle 9i #2, international characters are retained, but there are major problems when the NCLOB fields are very large (over 100K characters is not uncommon for us). This is more than likely not a problem in the Oracle 9i database itself, and I never claimed it was; the Oracle 9i thin JDBC driver is the probable cause.

On Oracle 10g, everything works fine.

> Besides, I do not understand why using NVARCHAR/NCLOB fields, when
> the standard counterparts would do, too. The reference to WE8MSWIN1252
> makes me believe you do not need to store multibyte characters, like
> Japanese, Chinese or Korean.

We need to store data in various West and East European languages (Cyrillic and Greek among them) in a single database, where no single 8-bit charset would suffice. The WE8MSWIN1252 charset is just the default we used on one of the installs, thinking it would not affect what we could store in NVARCHAR2/NCLOB fields. Unfortunately, it does, although admittedly that seems to be a problem of the Java Oracle driver.

>>Oracle 9i, with a database using the AL32UTF8 charset.  This works,
>>international characters are retained in both NVARCHAR2 and NCLOB
>>fields.  NCLOB fields however only work with small sizes.  Above 32K
>>we're forced to use the CLOB.createTemporary() construction (instead of
>>setString() from PreparedStatement) but that results in the problem
>>detailed in my initial post.

>
> Strings in Oracle go up to 32k, and that only within PL/SQL. String
> storage is 4000 bytes (!) max, or 4G, if you use LOBs.

Yes, I was referring to the fact that LOB type fields can be treated as strings as long as they are small enough (i.e., you can use the string functions to get/set them). Anything larger, however, requires you to work with temporary clobs (which use a stream internally). The problem is that the latter results in the character set mismatch error.
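
To make that concrete, here is roughly the shape of the two code paths, reconstructed from memory (the SQL and table name are illustrative, not our actual schema). The second path is the one that raises ORA-12704 against an NCLOB column on our 9i setups:

  import java.io.Writer;
  import java.sql.Connection;
  import java.sql.PreparedStatement;

  import oracle.sql.CLOB;

  public class ClobPaths {
      // Small values: the plain string API works and the driver
      // handles any conversion itself.
      static void insertSmall(Connection conn, String text) throws Exception {
          PreparedStatement ps = conn.prepareStatement(
                  "INSERT INTO docs (id, body) VALUES (1, ?)");
          ps.setString(1, text);  // fine up to a few thousand characters
          ps.executeUpdate();
          ps.close();
      }

      // Large values: stage the text in a temporary LOB and bind that.
      static void insertLarge(Connection conn, String text) throws Exception {
          CLOB clob = CLOB.createTemporary(conn, true, CLOB.DURATION_SESSION);
          Writer w = clob.getCharacterOutputStream();
          w.write(text);
          w.close();

          PreparedStatement ps = conn.prepareStatement(
                  "INSERT INTO docs (id, body) VALUES (2, ?)");
          ps.setClob(1, clob);  // ORA-12704 here when 'body' is NCLOB
          ps.executeUpdate();
          ps.close();

          CLOB.freeTemporary(clob);
      }
  }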

>>Yes, but I do not see how it is relevant what Java supports, when the
>>Oracle database is the one that is unable to convert the standard Java
>>character set to one of its own.

>
> I could just as easily state you do not seem to grasp the concept of
> character sets within oracle, and what character set is active in
> what case.
> As I am not very much into java, I will not.

I'll admit that Oracle is somewhat surprising in that respect. When initially adding Unicode support I assumed that using NVARCHAR2/NCLOB fields would be sufficient; it turned out we needed to change the database charset to prevent international characters from being lost during transmission -- but like I said, this seems to be a driver problem. The other databases we support had little trouble with the characters we needed for our Java application.

> Here's what I think happens:
> - you install, and create a database, using 9i with the WE8MSWIN1252
> character set.

Yes, one of the databases we installed was using that setting; the other two weren't.

> Without knowing the details, I am guessing from here on.
> - the environment setting for the client processes (e.g. sqlplus) are
> set to WE8MSWIN1252. This is done by the installer.

We do not use any clients other than the Java JDBC thin driver. I haven't been able to find any setting in this driver that is related to character sets.

> - consequence is: any process connecting will have to be converted TO
> WE8MSWIN1252. A character set, *NOT* supported by Java.

Yes, but keep in mind that this conversion process works correctly as long as the data in the NCLOB fields is small enough; it only fails when switching to the streaming method for large amounts of text.

Also, I believe the iso-8859-1 charset (or iso-8859-15 when the Euro symbol is needed) is nearly equivalent to WE8MSWIN1252 (windows-1252 is iso-8859-1 plus extra printable characters in the 0x80-0x9F range), and all of these are supported by Java's conversion functions (Java still stores all data in UCS-2, but it can convert between many character sets and hand you the raw bytes).
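
(A quick illustration of what I mean; Java converts to and from all three encodings even though strings stay UCS-2 internally. Note the Euro sign survives windows-1252 and iso-8859-15 but not iso-8859-1:)

  public class CharsetDemo {
      public static void main(String[] args) throws Exception {
          String s = "caf\u00E9 \u20AC";  // 'cafe' with accent, plus Euro

          byte[] cp1252 = s.getBytes("windows-1252");
          byte[] latin9 = s.getBytes("ISO-8859-15");
          byte[] latin1 = s.getBytes("ISO-8859-1");  // Euro becomes '?'

          System.out.println(new String(cp1252, "windows-1252"));  // cafe(accented) Euro
          System.out.println(new String(latin9, "ISO-8859-15"));   // cafe(accented) Euro
          System.out.println(new String(latin1, "ISO-8859-1"));    // cafe(accented) ?
      }
  }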

> - If the above scenario is correct, that would also explain why the
> experiments with AL32UTF8 in either 9i, or 10g, work out OK.

I do not understand that conclusion. The experiment only works on 10g when using NCLOB fields, and works on all databases when using CLOB fields.

> Still, you have to understand, NVARCHAR, NCLOB and NCHAR fields
> obey the character set as defined in the 'create database ... national
> character set xxxxxx' command. All the other character fields will obey
> the 'character set yyyyy' part of the 'create database ... character
> set yyyyy national character set xxxxxx' command.

Yes, I understood that is how it should work, but apparently not from a Java perspective.

> And I would use the clob handling routines when working with clobs, and
> the string handling routines for handling varchars.

This is an internal driver decision. It uses the string handling routines for data below a certain size (probably for performance reasons) and the CLOB-specific streaming functions above it.

> Maybe an experienced java/oracle programmer can comment on that?

:-)

Thanks for your insights so far.

--
John Hendrikx
Received on Thu Apr 28 2005 - 08:30:59 CDT
