Re: WE8ISO8859P1 convert to AL32UTF8 unicode character set question
From: Michael Austin <maustin_at_firstdbasource.com>
Date: Thu, 09 Apr 2009 09:48:20 -0500
Message-ID: <SOnDl.16246$D32.3911_at_flpi146.ffdc.sbc.com>
lsllcm wrote:
> On Apr 9, 5:44 pm, "Laurenz Albe" <inv..._at_spam.to.invalid> wrote:
>> lsllcm wrote:
>>> After testing, csscan also reports an application data exception when
>>> preparing to change the character set from WE8ISO8859P1 to WE8MSWIN1252,
>>> and we cannot use csalter.plb to change the database character set.
>>>
>>> Database character set         WE8ISO8859P1
>>> FROMCHAR                       WE8ISO8859P1
>>> TOCHAR                         WE8MSWIN1252
>>> Scan NCHAR data?               NO
>>> Array fetch buffer size        1024000
>>> Number of processes            1
>>> Capture convertible data?      NO
>>> ------------------------------------------------------------------------------
>>> [Data Dictionary individual exceptions]
>>> [Application data individual exceptions]
>>> User  : JACKY
>>> Table : AAA
>>> Column: C1
>>> Type  : VARCHAR2(1000)
>>> Number of Exceptions         : 1
>>> Max Post Conversion Data Size: 9
>>> ROWID              Exception Type     Size Cell Data(first 30 bytes)
>>> ------------------ ------------------ -----------------------------------
>>> AAALaRAAEAAAAAQAAA lossy conversion   sys.…Med
>>> ------------------ ------------------ -----------------------------------
>> There's more to changing the database character set; Metalink Note 555823.1
>> covers it in detail.
>>
>> First, you have to be certain that all the non-ASCII characters in the database
>> are WE8MSWIN1252.
>>
>> Then you would have to run csscan with FROMCHAR=WE8MSWIN1252 TOCHAR=WE8MSWIN1252.
>> There should be no errors.
>>
>> Then you can convert the database.
>>
>> Yours,
>> Laurenz Albe
>
> Yes, Java used UCS-2 initially and added UTF-16 supplementary
> character support in J2SE 5.0. I referred to this document:
> http://en.wikipedia.org/wiki/UTF-16
>
> The command below does work:
>
> csscan \"sys as sysdba\" full=y fromchar=WE8MSWIN1252
> tochar=WE8MSWIN1252
>
> With that, the database character set can be altered.
>
> But it is a little more complex.
>
> I used Java to read both the WE8ISO8859P1 and WE8MSWIN1252 databases.
>
> 1. In the WE8ISO8859P1 db, I printed the rs.getBytes("c1") array. The
> result is below; the bytes seem to be returned as they are stored,
> without any conversion.
> 125
> 115
> 121
> 115
> 46
> -123 ===========>> the same as 256-123 = 133
> 77
> 101
> 100
> 0
>
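
The negative value in the listing is just how Java prints a byte: the type is signed, so anything above 127 shows up as a negative number. A minimal sketch of the arithmetic, with a made-up class name and helper (SignedByteDemo, toUnsigned) that are not from the original post:

```java
class SignedByteDemo {
    // Map a signed Java byte (-128..127) to its unsigned value (0..255).
    // Masking with 0xFF is equivalent to adding 256 to negative values.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        // A few of the values printed above, as Java shows them.
        byte[] sample = {46, -123, 77, 101, 100};
        for (byte b : sample) {
            int u = toUnsigned(b);
            System.out.printf("%4d -> %3d (0x%02X)%n", b, u, u);
        }
    }
}
```

Byte.toUnsignedInt(b) in Java 8 and later does the same thing, so -123 is byte 133 (0x85).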
> 2. In the WE8MSWIN1252 db, I printed the rs.getBytes("c1") array. The
> result is below; it looks like the data is converted after it is read.
> 125
> 115
> 121
> 115
> 46
> -65 ===========>> the same as 256-65 = 191
> 77
> 101
> 100
> 0
>
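
To see why the two listings differ in that one byte, it may help to decode 133 and 191 under both encodings. This is only a sketch: it assumes the Java charsets "ISO-8859-1" and "windows-1252" match Oracle's WE8ISO8859P1 and WE8MSWIN1252, and the class and method names (CharsetDecodeDemo, decodeByte) are invented for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDecodeDemo {
    // Decode one byte (given as 0..255) under a charset and return
    // the Unicode code point it maps to.
    static int decodeByte(int unsignedValue, Charset cs) {
        String s = new String(new byte[] {(byte) unsignedValue}, cs);
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset cp1252 = Charset.forName("windows-1252");
        // 133 is the byte seen in item 1, 191 the byte seen in item 2.
        for (int v : new int[] {133, 191}) {
            System.out.printf("byte %d: ISO-8859-1 -> U+%04X, windows-1252 -> U+%04X%n",
                    v, decodeByte(v, latin1), decodeByte(v, cp1252));
        }
    }
}
```

Byte 133 is a control character (U+0085) in ISO-8859-1 but the ellipsis (U+2026) in windows-1252, which fits the "sys.…Med" cell in the csscan report. Byte 191 is the inverted question mark (U+00BF) in both. That 191 is a substitute for a character that could not be mapped is my guess, not something the output above proves.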
> If we convert using item 1, the data is wrong both before and after,
> but the UI stays the same even though the data is wrong.
> If we convert using item 2, the data is wrong before the conversion and
> correct after it, but the UI will look different.
>
> To be consistent, I chose item 1. At least the data is not lost from
> the UI, either before or after.
>
> If the client cannot accept that, he or she should correct the data as
> early as possible.
>
> This is not a technical question; it is a question of choice.
>
> Thanks for your help, it was helpful.
I liked it when there was only one character set (ASCII) to deal with... :)

Received on Thu Apr 09 2009 - 09:48:20 CDT