Re: WE8ISO8859P1 convert to AL32UTF8 unicode character set question

From: Michael Austin <maustin_at_firstdbasource.com>
Date: Thu, 09 Apr 2009 09:48:20 -0500
Message-ID: <SOnDl.16246$D32.3911_at_flpi146.ffdc.sbc.com>



lsllcm wrote:
> On Apr 9, 5:44 pm, "Laurenz Albe" <inv..._at_spam.to.invalid> wrote:
>> lsllcm wrote:
>>> After testing, csscan also reports an application data exception when
>>> preparing to change the character set from WE8ISO8859P1 to WE8MSWIN1252,
>>> so we cannot use csalter.plb to change the database character set.
>>> Database character set          WE8ISO8859P1
>>> FROMCHAR                        WE8ISO8859P1
>>> TOCHAR                          WE8MSWIN1252
>>> Scan NCHAR data?                NO
>>> Array fetch buffer size         1024000
>>> Number of processes             1
>>> Capture convertible data?       NO
>>> ------------------------------------------------------------------------------
>>> [Data Dictionary individual exceptions]
>>> [Application data individual exceptions]
>>> User  : JACKY
>>> Table : AAA
>>> Column: C1
>>> Type  : VARCHAR2(1000)
>>> Number of Exceptions         : 1
>>> Max Post Conversion Data Size: 9
>>> ROWID              Exception Type      Size Cell Data(first 30 bytes)
>>> ------------------ ------------------ -----------------------------------
>>> AAALaRAAEAAAAAQAAA lossy conversion         sys.…Med
>>> ------------------ ------------------ -----------------------------------
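The "lossy conversion" exception above can be reproduced outside the database. This Java sketch rests on an assumption: the offending byte in "sys.…Med" is the 0x85 (signed -123) that lsllcm prints later in the thread. Read as WE8ISO8859P1 (the FROMCHAR), that byte is the C1 control character U+0085. Windows-1252 has no code for that character. Read as WE8MSWIN1252 instead, the same byte is U+2026 (horizontal ellipsis), which converts cleanly. The class name `LossyCheck` and the method `convertible` are hypothetical. Only standard `java.nio.charset` calls are used. Java's "windows-1252" charset only approximates Oracle's WE8MSWIN1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class LossyCheck {
    // Returns true if every character of s has a code in the target character set.
    static boolean convertible(String s, Charset target) {
        CharsetEncoder enc = target.newEncoder();
        return enc.canEncode(s);
    }

    public static void main(String[] args) {
        Charset win1252 = Charset.forName("windows-1252");

        // Assumption: the offending byte is 0x85. Read as ISO-8859-1 (FROMCHAR),
        // it is the C1 control character U+0085.
        String asIso = new String(new byte[] {(byte) 0x85}, StandardCharsets.ISO_8859_1);

        // windows-1252 (TOCHAR) uses 0x85 for U+2026 and has no code for U+0085,
        // so this conversion cannot be lossless.
        System.out.println("U+0085 -> windows-1252 lossless? " + convertible(asIso, win1252));

        // Reading the same byte as windows-1252 gives U+2026, which converts
        // cleanly (the FROMCHAR=TOCHAR=WE8MSWIN1252 case).
        String asWin = new String(new byte[] {(byte) 0x85}, win1252);
        System.out.println("U+2026 -> windows-1252 lossless? " + convertible(asWin, win1252));
    }
}
```

Under these assumptions, the first check reports `false` and the second reports `true`.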
>> There's more to changing the database character set; Metalink Note 555823.1
>> covers it in detail.
>>
>> First, you have to be certain that all the non-ASCII characters in the database
>> really are WE8MSWIN1252 characters.
>>
>> Then you would have to run csscan with FROMCHAR=WE8MSWIN1252 TOCHAR=WE8MSWIN1252.
>> There should be no errors.
>>
>> Then you can convert the database.
>>
>> Yours,
>> Laurenz Albe
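One rough way to find windows-1252 data stored in a WE8ISO8859P1 database is to look for bytes from 0x80 to 0x9F. ISO-8859-1 assigns only C1 control codes to that range. Windows-1252 places all of its extra printable characters there (euro sign, curly quotes, ellipsis, and so on) and leaves five positions undefined. A hit is therefore a hint, not proof. This sketch does not replace the csscan run with FROMCHAR=TOCHAR=WE8MSWIN1252 described above. The class and method names are hypothetical. The sample bytes are the ones lsllcm posts later in the thread.

```java
import java.util.ArrayList;
import java.util.List;

class Cp1252Hint {
    // Returns the positions of bytes in 0x80..0x9F. These are C1 control codes in
    // ISO-8859-1 but mostly printable characters in windows-1252, so finding
    // them in text columns suggests the data came from a windows-1252 client.
    static List<Integer> c1Positions(byte[] data) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            int v = data[i] & 0xFF; // Java bytes are signed; get the 0..255 value
            if (v >= 0x80 && v <= 0x9F) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Byte values as printed from rs.getBytes("c1") in the WE8ISO8859P1 database.
        byte[] c1 = {125, 115, 121, 115, 46, -123, 77, 101, 100, 0};
        System.out.println("0x80-0x9F bytes at positions: " + c1Positions(c1));
    }
}
```

For the sample bytes, only position 5 (the -123, i.e. 0x85) is reported.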

>
> Yes, Java used UCS-2 initially and added UTF-16 supplementary
> character support in J2SE 5.0. I referred to this document:
> http://en.wikipedia.org/wiki/UTF-16
>
> The command below does work:
>
> csscan \"sys as sysdba\" full=y fromchar=WE8MSWIN1252 tochar=WE8MSWIN1252
>
> With that scan result, the database character set can be altered.
>
> But it is a little more complex than that.
>
> I used Java to read from both the WE8ISO8859P1 and the WE8MSWIN1252 databases.
>
> 1. In the WE8ISO8859P1 database, I printed the rs.getBytes("c1") array.
> The result is below. No conversion seems to be done, so the data should
> already be Unicode.
> 125
> 115
> 121
> 115
> 46
> -123   (signed Java byte: 256 - 123 = 133, i.e. 0x85)
> 77
> 101
> 100
> 0
>
> 2. In the WE8MSWIN1252 database, I printed the rs.getBytes("c1") array.
> The result is below. It looks like the data was converted when it was read.
> 125
> 115
> 121
> 115
> 46
> -65    (signed Java byte: 256 - 65 = 191, i.e. 0xBF)
> 77
> 101
> 100
> 0
>
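The two dumps differ only in the sixth byte. Java's byte type is signed, so -123 and -65 are the unsigned values 133 (0x85) and 191 (0xBF). This sketch decodes each of those bytes in both character sets. The names `ByteDump`, `unsigned`, `decode` and `describe` are hypothetical, and only standard `java.nio.charset` calls are used. The byte 0x85 is the control character U+0085 in ISO-8859-1 but U+2026 (horizontal ellipsis) in windows-1252. The byte 0xBF is U+00BF (inverted question mark) in both. One reading is that the 0xBF in the second database is a replacement character substituted during the lossy conversion. That is an interpretation; the post does not confirm it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteDump {
    // Java's byte type is signed (-128..127); masking with 0xFF recovers the
    // unsigned 0..255 value that is stored in the column.
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    // Decode a single byte in the given character set and return the resulting char.
    static char decode(byte b, Charset cs) {
        return new String(new byte[] {b}, cs).charAt(0);
    }

    // Print the signed and unsigned value of b and the code point each charset gives it.
    static void describe(String label, byte b) {
        Charset win1252 = Charset.forName("windows-1252");
        System.out.printf("%s: signed %d = unsigned %d (0x%02X), ISO-8859-1 U+%04X, windows-1252 U+%04X%n",
                label, b, unsigned(b), unsigned(b),
                (int) decode(b, StandardCharsets.ISO_8859_1),
                (int) decode(b, win1252));
    }

    public static void main(String[] args) {
        // Byte values copied from the two rs.getBytes("c1") dumps in the post.
        byte[] fromIsoDb = {125, 115, 121, 115, 46, -123, 77, 101, 100, 0};
        byte[] fromWinDb = {125, 115, 121, 115, 46, -65, 77, 101, 100, 0};

        // Only index 5 differs between the two dumps.
        describe("WE8ISO8859P1 db", fromIsoDb[5]);
        describe("WE8MSWIN1252 db", fromWinDb[5]);
    }
}
```

It prints one line per byte. The first line shows 0x85 as U+0085 versus U+2026. The second shows 0xBF as U+00BF in both character sets.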
> If we convert based on option 1, the data is wrong both before and after
> the conversion, but at least the UI looks the same in both cases.
> If we convert based on option 2, the data is wrong before the conversion
> but correct afterwards; however, the UI will look different.
>
> To be consistent, I chose option 1. At least no data is lost in the UI,
> either before or after the conversion.
>
> If a client cannot accept that, they should have the data corrected as
> early as possible.
>
> This is not really a technical question; it is a matter of choice.
>
> Thanks for your help; it was very helpful.

I liked it when there was only one character set, ASCII, to deal with... :)

Received on Thu Apr 09 2009 - 09:48:20 CDT
