RE: Replacement of US7ASCII character set in 11g?

From: Bobak, Mark <Mark.Bobak_at_proquest.com>
Date: Mon, 11 Jan 2010 14:49:53 -0500
Message-ID: <6AFC12B9BFCDEA45B7274C534738067F36AF4FDF_at_AAPQMAILBX02V.proque.st>

Hi Janine,

Problem is that "LOSSY" may mean that the data is already corrupted in the source database, so, Oracle has no idea what to convert it to. In the case of US7ASCII, character values range from 0-127, correct? So, if you have any values 128 or greater, Oracle has no idea what character that value should map to. It would depend on what *assumption* the client code was operating under when the data was inserted. So, it doesn't matter if AL32UTF8 is the "mother of all character sets". Even if AL32UTF8 *does* have the character you need, it doesn't matter, cause Oracle doesn't know what that character is in the source. If it can't determine that, it can't map it to the correct character in the destination character set.

See Doc ID 444701.1 "CSSCAN Output Explained", for more info, particularly "B.4) LOSSY data".

Hope that helps,

-Mark

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Janine Sisk Sent: Monday, January 11, 2010 2:37 PM
To: oracle-l L
Subject: Re: Replacement of US7ASCII character set in 11g?

Thanks to everyone who replied....

I ran CSSCAN on the original 8.1.7 database and, of course, ran into trouble right away. The conversion from US7ASCII to WE8MSWIN1252 is lossy in a number of places. This does not surprise me terribly; Jared mentioned that you can put "invalid" data into a database of type US7ASCII and I'm pretty sure that all of the programmers who have worked on this site over the years have just assumed that if the database didn't choke on it, then it was ok.

What concerns me is that CSSCAN reports that converting to UTF8 will have the exact same lossy conversions. The two error files are literally identical except for the value of TOCHAR. I thought that UTF8 was the mother of all character sets, so where do I go from here?

As an example, one of the first errors reported looks like this when I do a SELECT in sqlplus:

Lic. en medios de comunicaciÃ?3n

I will be digging further into this, with The Google and all, but if anyone has any light to shed, please do!

janine

On Jan 7, 2010, at 3:07 PM, David Mann wrote:

When I had clients worried about character set conversions I usually ran Character Set Scanner utility CSSCAN on a copy of the database to check for differences. Here is the reference in the 10g documentation, assuming it is still available in 11g but don't have a link handy.

http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch12scanner.htm

--

Dave Mann
www.brainio.us<http://www.brainio.us>
www.ba6.us<http://www.ba6.us> - Database Stuff - http://www.ba6.us/rss.xml

--

http://www.freelists.org/webpage/oracle-l Received on Mon Jan 11 2010 - 13:49:53 CST

This message: [ Message body ]
Next message: Jared Still: "Re: Database comparisons"
Previous message: Tony Adolph: "11gR2 RAC on VMWare ESXi V4.0"
In reply to: Janine Sisk: "Re: Replacement of US7ASCII character set in 11g?"
Next in thread: Janine Sisk: "Re: Replacement of US7ASCII character set in 11g?"
Reply: Janine Sisk: "Re: Replacement of US7ASCII character set in 11g?"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message