Re: WE8ISO8859P1 convert to AL32UTF8 unicode character set question

From: lsllcm <lsllcm_at_gmail.com>
Date: Thu, 9 Apr 2009 04:55:13 -0700 (PDT)
Message-ID: <54fc7701-07e9-4956-8ed4-e786465a6fc7_at_r36g2000vbr.googlegroups.com>

On Apr 9, 5:44 pm, "Laurenz Albe" <inv..._at_spam.to.invalid> wrote:
> lsllcm wrote:
> > After tested, csscan also report application data exception when
> > prepare to change character set from WE8ISO8859P1 to WE8MSWIN1252, and
> > we cannot use csalter.plb to change the database character set.
>
> > Database character set
> > WE8ISO8859P1
> > FROMCHAR
> > WE8ISO8859P1
> > TOCHAR
> > WE8MSWIN1252
> > Scan NCHAR data?
> > NO
> > Array fetch buffer size
> > 1024000
> > Number of processes
> > 1
> > Capture convertible data?
> > NO
> > ------------------------------------------------------------------------------
>
> > [Data Dictionary individual exceptions]
>
> > [Application data individual exceptions]
>
> > User  : JACKY
> > Table : AAA
> > Column: C1
> > Type  : VARCHAR2(1000)
> > Number of Exceptions         : 1
> > Max Post Conversion Data Size: 9
>
> > ROWID              Exception Type     Size Cell Data(first 30 bytes)
> > ------------------ ------------------ -----------------------------------
> > AAALaRAAEAAAAAQAAA lossy conversion        sys.�Med
> > ------------------ ------------------ -----------------------------------
>
> There's more to changing the database character set, Metalink Note 555823.1
> covers it in detail.
>
> First, you have to be certain that all the non-ASCII characters in the database
> are WE8MSWIN1252.
>
> Then you would have to run csscan with FROMCHAR=WE8MSWIN1252 TOCHAR=WE8MSWIN1252.
> There should be no errors.
>
> Then you can convert the database.
>
> Yours,
> Laurenz Albe

Yes, Java initially used UCS-2, and support for UTF-16 supplementary characters was added in J2SE 5.0. I referred to this document: http://en.wikipedia.org/wiki/UTF-16
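
To make the supplementary character point concrete, here is a minimal sketch. The class and method names are made up for illustration. It only shows that a code point above U+FFFF occupies two Java chars (a surrogate pair) while still counting as one code point.

```java
// Illustrative sketch; SupplementarySketch and its methods are hypothetical names.
class SupplementarySketch {
    // Number of UTF-16 code units (Java chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "A";                                        // U+0041, inside the BMP
        String supp = new String(Character.toChars(0x1F600));    // outside the BMP
        System.out.println("BMP:           units=" + codeUnits(bmp) + " points=" + codePoints(bmp));
        System.out.println("Supplementary: units=" + codeUnits(supp) + " points=" + codePoints(supp));
    }
}
```

Code that indexes strings by char (as UCS-2 era code did) sees the two halves of the pair separately, which is why the code point methods were added.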

The command below does work:

csscan \"sys as sysdba\" full=y fromchar=WE8MSWIN1252 tochar=WE8MSWIN1252

After it, the database character set can be altered.

But things are a little more complex.

I used Java to read both the WE8ISO8859P1 and the WE8MSWIN1252 databases.

1. In the WE8ISO8859P1 db, I printed the rs.getBytes("c1") array. The result is below, so it should already be Unicode, and no conversion is done.
125
115
121
115
46
-123 ===========>> same as 256-123 = 133
77
101
100
0

2. In the WE8MSWIN1252 db, I printed the rs.getBytes("c1") array. The result is below; it looks like the value is converted when it is read.
125
115
121
115
46
-65 ===========>> same as 256-65 = 191
77
101
100
0
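
The two dumps above can be reproduced without a database. This is a rough sketch, not the code I actually ran: the class and method names are hypothetical, the byte values are copied from the dumps (without the trailing 0), and it assumes the JVM provides the windows-1252 charset. It shows how a signed Java byte maps to 0..255 and how byte 133 decodes differently under ISO-8859-1 and windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch; ByteDecodeSketch and its methods are hypothetical names.
class ByteDecodeSketch {
    // Java bytes are signed; masking with 0xFF gives the 0..255 value (-123 -> 133).
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Decode raw bytes with the given single-byte charset.
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        // Bytes from item 1 (WE8ISO8859P1 db), trailing 0 omitted.
        byte[] item1 = {125, 115, 121, 115, 46, (byte) 133, 77, 101, 100};
        // Assumption: this JVM ships the windows-1252 charset.
        Charset cp1252 = Charset.forName("windows-1252");

        System.out.println("unsigned(-123) = " + toUnsigned((byte) -123));
        System.out.println("unsigned(-65)  = " + toUnsigned((byte) -65));

        // Byte 133 (0x85) is a C1 control code in ISO-8859-1
        // but the horizontal ellipsis U+2026 in windows-1252.
        char asLatin1 = decode(item1, StandardCharsets.ISO_8859_1).charAt(5);
        char asCp1252 = decode(item1, cp1252).charAt(5);
        System.out.println("0x85 as ISO-8859-1   = U+" + String.format("%04X", (int) asLatin1));
        System.out.println("0x85 as windows-1252 = U+" + String.format("%04X", (int) asCp1252));
    }
}
```

Byte 191 (0xBF) from item 2 is the inverted question mark in both character sets, so once the value has been stored or returned as 191, the original byte 133 can no longer be recovered from it.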

If we convert based on item 1, the data is wrong both before and after, but the UI shows the same thing in both cases.
If we convert based on item 2, the data is wrong before but correct after the conversion; however, the UI will then look different.

To be consistent, I chose item 1. At least the data shown in the UI is not lost, either before or after the conversion.

If the client cannot accept that, he/she should have corrected it at a very early stage.

This is not a technical question; it is a question of choice.

Thanks for your help, it was helpful.

Received on Thu Apr 09 2009 - 06:55:13 CDT
