Re: Replacement of US7ASCII character set in 11g?

From: Janine Sisk <janine_at_furfly.net>
Date: Mon, 11 Jan 2010 14:14:49 -0800
Message-Id: <2C9EB49F-7205-4057-81C2-E9AF60ECD5C5_at_furfly.net>



Mark, thanks for that! You gave me the idea to run CSSCAN and tell it to convert the data from WE8MSWIN1252 to UTF8. Voila, no more errors. At least now I know what Oracle thought it was doing when this data was inserted.

So (I think) what I need to do is get the data out of 8.1.7, but have Oracle treat it as WE8MSWIN1252 as-is, without converting it. If it tries to convert it, I'll probably lose all those 128 and above characters.

I am going to try experimenting with this, but if anyone has any educated guesses I'd love to hear them.

Also, sadly I no longer have Metalink/MOS access; I am doing this as an outside contractor to one tiny piece of a huge organization, and I am not allowed to use their CSI. I used to have my own, but over the years my smaller clients have all converted to Postgres; our database usage is fairly minimal and doesn't justify the licensing expense. So, no more Metalink for me. :(

janine

On Jan 11, 2010, at 11:49 AM, Bobak, Mark wrote:

> Hi Janine,
>
> Problem is that “LOSSY” may mean that the data is already corrupted
> in the source database, so, Oracle has no idea what to convert it
> to. In the case of US7ASCII, character values range from 0-127,
> correct? So, if you have any values 128 or greater, Oracle has no
> idea what character that value should map to. It would depend on
> what *assumption* the client code was operating under when the data
> was inserted. So, it doesn’t matter if AL32UTF8 is the “mother of
> all character sets”. Even if AL32UTF8 *does* have the character you
> need, it doesn’t matter, cause Oracle doesn’t know what that
> character is in the source. If it can’t determine that, it can’t
> map it to the correct character in the destination character set.
>
> See Doc ID 444701.1 “CSSCAN Output Explained”, for more info,
> particularly “B.4) LOSSY data”.
>
> Hope that helps,
>
> -Mark
>
> From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org
> ] On Behalf Of Janine Sisk
> Sent: Monday, January 11, 2010 2:37 PM
> To: oracle-l L
> Subject: Re: Replacement of US7ASCII character set in 11g?
>
> Thanks to everyone who replied....
>
> I ran CSSCAN on the original 8.1.7 database and, of course, ran into
> trouble right away. The conversion from US7ASCII to WE8MSWIN1252 is
> lossy in a number of places. This does not surprise me terribly;
> Jared mentioned that you can put "invalid" data into a database of
> type US7ASCII and I'm pretty sure that all of the programmers who
> have worked on this site over the years have just assumed that if
> the database didn't choke on it, then it was ok.
>
> What concerns me is that CSSCAN reports that converting to UTF8 will
> have the exact same lossy conversions. The two error files are
> literally identical except for the value of TOCHAR. I thought that
> UTF8 was the mother of all character sets, so where do I go from here?
>
> As an example, one of the first errors reported looks like this when
> I do a SELECT in sqlplus:
>
> Lic. en medios de comunicaci&Atilde;?3n
>
> I will be digging further into this, with The Google and all, but if
> anyone has any light to shed, please do!
>
> janine
>
> On Jan 7, 2010, at 3:07 PM, David Mann wrote:
>
>
> When I had clients worried about character set conversions I usually
> ran Character Set Scanner utility CSSCAN on a copy of the database
> to check for differences. Here is the reference in the 10g
> documentation, assuming it is still available in 11g but don't have
> a link handy.
>
> http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch12scanner.htm
>
> --
> Dave Mann
> www.brainio.us
> www.ba6.us - Database Stuff - http://www.ba6.us/rss.xml
>

---
Janine Sisk
President/CEO of furfly, LLC
503-693-6407





--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jan 11 2010 - 16:14:49 CST

Original text of this message