Re: WE8ISO8859P1 convert to AL32UTF8 unicode character set question

From: lsllcm <lsllcm_at_gmail.com>
Date: Tue, 7 Apr 2009 19:19:26 -0700 (PDT)
Message-ID: <ea899390-2e2c-4769-a18b-a50822b0105e_at_v12g2000vbb.googlegroups.com>



On Apr 7, 4:34 pm, "Laurenz Albe" <inv..._at_spam.to.invalid> wrote:
> lsllcm wrote:
> > I have one question about CSSCAN when I do conversion from
> > WE8ISO8859P1 to AL32UTF8 unicode character set.
>
> > create table aaa (c1 varchar2(10));
> > insert into aaa values ('sys..Med');
>
> > After csscan the data, get the application exception data as below
>
> > ------------------------------------------------
> > [Application data individual exceptions]
> > User  : JACKY
> > Table : AAA
> > Column: C1
> > Type  : VARCHAR2(10)
> > Number of Exceptions         : 1
> > Max Post Conversion Data Size: 9
> > ROWID              Exception Type      Size Cell Data(first 30 bytes)
> > ------------------ ------------------ -----------------------------------
> >AAAYtgAAEAAAEKPAAA lossy conversion         sys..Med
> > ------------------ ------------------ -----------------------------------
>
> [...]
>
> > Below is dump of data
>
> > 1. dump in AL32UTF8 character set
> > select dump(convert(c1,'AL32UTF8')) from aaa
> > DUMP(CONVERT(C1,'AL32UTF8'))
> > ----------------------------------------------------
> > Typ=1 Len=9:
> > 115,121,115,46,194,133,77,101,100
> > 2. dump in WE8ISO8859P1 character set
> > select dump(c1) from aaa
> > DUMP(C1)
> > ----------------------------------------------------
> > Typ=1 Len=8:
> > 115,121,115,46,133,77,101,100
> > Is report of cssacn not correct?
>
> The report of csscan is correct.
>
> The problem is the fifth character. In the insert statement above it looks
> like an ellipsis (three dots), and it ends up as 133 or hex 85 in your
> database, so I assume that you mean UNICODE hex 2026, the horizontal
> ellipsis, which is hex 85 in the Windows character set WE8MSWIN1252.
>
> The problem is that your database character set is not WE8MSWIN1252,
> where everything would be fine, but WE8ISO8859P1, where hex 85
> does not exist.
>
> So csscan warns you that if you convert this to AL32UTF8, hex 85,
> which is an illegal value that has no meaning in WE8ISO8859P1,
> will be converted to the default replacement character ¿
>
> Your mistake is probably a wrong client character set.
> Your NLS_LANG contains WE8ISO8859P1, and your client operating
> system is Windows, correct?
> That would mean that you are suffering from an Oracle "bug", namely
> that character input is not checked for validity when client character set
> and server character set are the same.
> That way you end up with currupt data in your database.
>
> Yours,
> Laurenz Albe- Hide quoted text -
>
> - Show quoted text -

Thanks for your comments:

Below is my test case

  1. in WE8ISO8859P1 characterset db on linux platform, 1.1 create table aaa (c1 varchar2(10)); insert into aaa values ('sys.…Med');

1.2 select dump(c1) from aaa
DUMP(C1)



Typ=1 Len=8:
115,121,115,46,133,77,101,100

1.3 exp userid=jacky/jacky file=aaa.dmp tables=aaa

2. in AL32UTF characterset db on another linux platform
2.1 set NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1
2.2 imp userid=system file=aaa.dmp fromuser=jacky touser=jacky
ignore=y
2.3 select dump(convert(c1,'AL32UTF8')) from aaa DUMP(CONVERT(C1,'AL32UTF8'))

Typ=1 Len=9:
115,121,115,46,194,133,77,101,100

After converted, the char with dump number "133" is converted to char with dump number "194,133". It is not same as the char default replacement character ¿.

I am not sure if "133" in WE8ISO8859P1 can map to "194,133" in AL32UTF8.
OR
if "133" in WE8MSWIN1252 can map to "194,133" in AL32UTF8.

But from result in UI, they are same.

Could you help review it again? Received on Tue Apr 07 2009 - 21:19:26 CDT

Original text of this message