Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Mailing Lists -> Oracle-L -> RE: Database Character set related question

RE: Database Character set related question

From: Justin Cave (DDBC) <jcave_at_ddbcinc.com>
Date: Fri, 28 Oct 2005 13:52:26 -0600
Message-ID: <8769EA99D658784EA3CA652F2C93C63EC322@EXCHANGE.ddbc.local>


Having 8-bit values in CHAR/ VARCHAR2 columns in a database whose character set us US7ASCII is not supported by Oracle and can cause significant problems, including irretrievably corrupt data.  

If the client's NLS_LANG and the server's character set match, Oracle currently bypasses the character set conversion step at the network level (it has been threatened that this will change in the future) and trusts that the data being sent is, in fact, the declared character set. If every application that accesses the data claims that the data is US7ASCII but treats it internally as some other character set, it can appear that things are working correctly. If any application or client machine deviates from that falsehood, very bad things happen. If you have two different applications that use different character sets internally (i.e. one is using UTF8 and the other is using Windows-1252), you can have a situation where half of your data is encoded using UTF-8 and the other half is encoded in Windows-1252 and Oracle has no way of knowing what values are encoded in what character set. Untangling data that has been corrupted this way is a challenge to say the least.  

You cannot use export & import to move data to a new machine with a different character set. If you do, Oracle will do a character set conversion at some point and corrupt your data, since it will assume that the data is valid US7ASCII and the conversion will fail for values
> 128.
 

The first step is to figure out what character set the data in the database really is. You can do this either by looking at the application(s) that connect to the database and seeing what character set they use internally or by examining the data. The quick & dirty approach is to extract the data (or at least representative subsets) to a flat file (ensuring that you do not do a character set conversion) and try to open the file in Word, which will hopefully prompt you to choose a character set to use to display the data. Play around until you find one that works (probably Windows-1252, ISO-8859-1, ISO-8859-15, or UTF-8). Note that many of these character sets are very similar, so you have to be careful about jumping to conclusions. If there are multiple character sets represented, you've got a major problem.  

If you follow the general instructions for character set conversion (i.e. ALTER DATABASE CHARACTER SET), Oracle will allow you to change the character set to just about anything, but that won't change the data that is being stored and will almost certainly cause massive amounts of things to break, particularly if you pick the wrong character set. If you pick the right character set (the character set that the data is really encoded in), you'll just need to adjust NLS_LANG settings on all the client machines. If you pick the wrong character set, conversion for the code points > 127 will universally fail regardless of client settings.  

Justin Cave

Distributed Database Consulting, Inc.  


From: oracle-l-bounce_at_freelists.org
[mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Mandal, Ashoke Sent: Thursday, October 27, 2005 12:42 PM To: oracle-l_at_freelists.org
Subject: Database Character set related question  

Greetings All,

The database CHARACTER SET is US7ASCII. The data in the database contains 8 bits (> 127 ASCII ) values. We cannot display these values, instead we see "?" character via the sqlplus session. We want to upgrade this database to 9.2.0.4, but before we do that , we want to make sure that we don't lose any characters that are currently stored in the database.
 We are attempting to run CSSCAN to verify. We want to use UTF8 in the new 9i
database. How do we determine what the character set was used to put the data into the database.

It's definitely not US7ASCII!

Is there a object that we can look at to see what another session is currently using? It is difficult for us to get access to the user 's workstation to see there configuration.

Thanks,

Ashoke

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Oct 28 2005 - 14:55:10 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US