RE: Characterset Conversion

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Wed, 22 Oct 2014 17:24:02 -0400
Message-ID: <085d01cfee3e$81561880$84024980$_at_rsiz.com>

Excellent points. So it IS odd, but you’ve cogently explained exactly how it can get that way. I remember this pass-thru optimization and dimly remember that you can mis-configure a client.

Thanks Mark

From: MARK BRINSMEAD [mailto:mark.brinsmead_at_gmail.com] Sent: Wednesday, October 22, 2014 4:47 PM To: mwf_at_rsiz.com
Cc: srcdco_at_rit.edu; ORACLE-L
Subject: Re: Characterset Conversion

It can be lossy. If the data stored is not actually US7ASCII.

Consider the case where the database is configured for US7ASCII, and all clients set NLS something like NLS_LANG=america.american.us7ascii.

Once the database and (all of) the applications attest that they are using the same character set, NLS pass any and all data in both directions without attempting to validate or to perform characterset conversion.

Now consider that the applications -- running on Windows, for illustration -- are actually using a characterset like WE8ISO8859P3. Because they are lying to the database, and swearing that the data is really US7ASCI, the data can flow in both directions unchanged, and everybody is happy.

Everybody is happy, that is, right until somebody decides they need to CHANGE the database characterset, and suddenly discovers that a 30 TB database that is supposed to be full of US7ASCII data is actually, in fact, full of something else entirely.

And now, here we are, having this conversation. :-)

This is exactly the sort of thing that makes a "casual" approach to characterset conversions risky. (That's an observation, not an accusation.) The data may not really be what you think it is. I have even seen cases where people have encoded 64-bit binary values (IEEE floating point numbers or something similar) in VARCHAR2(8) columns. In situations where the data is not really what you (or the database) thinks it is, the conversion process can be fraught with peril.

On Wed, Oct 22, 2014 at 4:28 PM, Mark W. Farnham <mwf_at_rsiz.com> wrote:

I find it a bit odd that US7ASCII to AL32UTF8 is not loss-less.

What characters is it saying you will lose?

mwf

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Scott Canaan Sent: Wednesday, October 22, 2014 4:23 PM To: oracle-l_at_freelists.org
Subject: Characterset Conversion

I have a database that has to be converted from US7ASCII to AL32UTF8 before the application can be upgraded. I tried using the CSSCAN tool and it told me that I can’t do it that way because data would be lost. I’m trying to figure out how to run the dmu tool, but the instructions that I have from the vendor are all for Windows servers and we run in Red Hat 6, so it isn’t working. The database version is 11.2.0.3. Has anyone fought through this (as it appears to not be a simple task)? If so, can you give me some pointers on how to make it work? Or is there a simpler way to do this?

Thank you,

Scott Canaan ’88 (srcdco_at_rit.edu)

(585) 475-7886 <tel:%28585%29%20475-7886> – work

“Life is like a sewer, what you get out of it depends on what you put into it.” – Tom Lehrer

--
http://www.freelists.org/webpage/oracle-l

Received on Wed Oct 22 2014 - 23:24:02 CEST

This message: [ Message body ]
Next message: William Muriithi: "Re: Symantec ODM or ASM"
Previous message: Chris Taylor: "Re: Characterset Conversion"
Maybe in reply to: Scott Canaan: "Characterset Conversion"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message