Re: WE8ISO8859P1 convert to AL32UTF8 unicode character set question

From: Laurenz Albe <invite_at_spam.to.invalid>
Date: Tue, 7 Apr 2009 10:34:49 +0200
Message-ID: <1239093312.967203_at_proxy.dienste.wien.at>



lsllcm wrote:
> I have one question about CSSCAN when I do conversion from
> WE8ISO8859P1 to AL32UTF8 unicode character set.
>
> create table aaa (c1 varchar2(10));
> insert into aaa values ('sys..Med');
>
> After csscan the data, get the application exception data as below
>
> ------------------------------------------------
> [Application data individual exceptions]
> User : JACKY
> Table : AAA
> Column: C1
> Type : VARCHAR2(10)
> Number of Exceptions : 1
> Max Post Conversion Data Size: 9
> ROWID Exception Type Size Cell Data(first 30 bytes)
> ------------------ ------------------ -----------------------------------
>AAAYtgAAEAAAEKPAAA lossy conversion sys..Med
> ------------------ ------------------ -----------------------------------
>

[...]
>
> Below is dump of data
>
> 1. dump in AL32UTF8 character set
> select dump(convert(c1,'AL32UTF8')) from aaa
> DUMP(CONVERT(C1,'AL32UTF8'))
> ----------------------------------------------------
> Typ=1 Len=9:
> 115,121,115,46,194,133,77,101,100
> 2. dump in WE8ISO8859P1 character set
> select dump(c1) from aaa
> DUMP(C1)
> ----------------------------------------------------
> Typ=1 Len=8:
> 115,121,115,46,133,77,101,100

> Is report of cssacn not correct?

The report of csscan is correct.

The problem is the fifth character. In the insert statement above it looks like an ellipsis (three dots), and it ends up as 133 or hex 85 in your database, so I assume that you mean UNICODE hex 2026, the horizontal ellipsis, which is hex 85 in the Windows character set WE8MSWIN1252.

The problem is that your database character set is not WE8MSWIN1252, where everything would be fine, but WE8ISO8859P1, where hex 85 does not exist.

So csscan warns you that if you convert this to AL32UTF8, hex 85, which is an illegal value that has no meaning in WE8ISO8859P1, will be converted to the default replacement character ¿

Your mistake is probably a wrong client character set. Your NLS_LANG contains WE8ISO8859P1, and your client operating system is Windows, correct?
That would mean that you are suffering from an Oracle "bug", namely that character input is not checked for validity when client character set and server character set are the same.
That way you end up with currupt data in your database.

Yours,
Laurenz Albe Received on Tue Apr 07 2009 - 03:34:49 CDT

Original text of this message