RE: character set confusion

From: Bobak, Mark
Date: Tue, 17 Jul 2007 12:40:00 -0400

Hi Robyn,  

Playing a bit of catch up on Oracle-L.  

I'm no expert on this subject, but, here's what I (think I) know:

Converting from US7ASCII to UTF8 should not be a problem, because the latter is a superset of the former. Having a source database in UTF8 and a destination database in US7ASCII may be a problem: if the UTF8 database stores characters that are not defined in US7ASCII, the destination cannot represent them. It seems to me you could convert the destination database to UTF8 first. Then, when the source database is converted to UTF8 (from US7ASCII?), there's no issue, since a UTF8 destination can hold anything a US7ASCII source sends it.
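To make the superset argument concrete, here is a minimal sketch in Python. It is only an analogy: Python's "ascii" and "utf-8" codecs stand in for Oracle's US7ASCII and UTF8 character sets, and the sample strings and the helper name fits_in_ascii are made up for illustration. Oracle's own handling of unmappable characters may differ in detail.

```python
# Illustration only: Python's "ascii" and "utf-8" codecs stand in for
# Oracle's US7ASCII and UTF8 character sets.

def fits_in_ascii(text: str) -> bool:
    """Return True if every character in text is 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

plain = "Ann Arbor"          # hypothetical sample value, pure ASCII
accented = "Z\u00fcrich"     # hypothetical sample value, contains u-umlaut

# ASCII text produces identical bytes in both encodings, which is why
# going from the subset to the superset is safe.
same_bytes = plain.encode("ascii") == plain.encode("utf-8")

# Going the other way is lossy: the unmappable character is replaced.
lossy = accented.encode("ascii", errors="replace").decode("ascii")

print(same_bytes)
print(fits_in_ascii(accented))
print(lossy)
```

The point of the sketch is the asymmetry: anything that fits in the smaller set survives the trip to the larger one unchanged, but not the reverse.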

To confirm what can and can't be stored in various character sets, Oracle provides a tool called csscan. It may be worth investigating. Here's the link to the 10.2 csscan docs: er.htm  
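As a rough idea of the kind of check involved, the sketch below scans some in-memory rows for values that a target encoding cannot hold. This is not csscan and does not use any Oracle API: the function names, the row layout, and the sample data are all hypothetical, and Python's "ascii" codec is again only a stand-in for US7ASCII.

```python
# Toy scan, not csscan: reports values that cannot be encoded in the target.
# All names and data here are hypothetical.

def _encodable(ch: str, encoding: str) -> bool:
    """Return True if a single character can be encoded in the given codec."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

def find_unconvertible(rows, target_encoding="ascii"):
    """Return (row_id, column, offending_chars) for each problem value."""
    problems = []
    for row_id, columns in rows:
        for name, value in columns.items():
            bad = sorted({ch for ch in value if not _encodable(ch, target_encoding)})
            if bad:
                problems.append((row_id, name, bad))
    return problems

# Hypothetical rows as (id, {column: value}) pairs.
sample_rows = [
    (1, {"city": "Ann Arbor", "state": "MI"}),
    (2, {"city": "Z\u00fcrich", "state": "ZH"}),
]

report = find_unconvertible(sample_rows)
for row_id, column, chars in report:
    print(row_id, column, chars)
```

An empty report for the real data would suggest the narrower target can hold it today, though it says nothing about data that arrives later.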

Hope that helps,  



Mark J. Bobak
Senior Database Administrator, System & Product Technologies ProQuest
789 E. Eisenhower Parkway, P.O. Box 1346 Ann Arbor MI 48106-1346
734.997.4059 or 800.521.0600 x 4059

ProQuest...Start here.  

On Behalf Of Robyn
Sent: Friday, July 13, 2007 7:50 PM
To: oracle-l
Subject: character set confusion  

Hello all,

What are the limitations of materialized views across character sets? We will be upgrading the source database for many, many materialized views to Oracle in a few months. We will also be converting the database to UTF8 although that will probably occur a few months later. The target database is, at the moment, and US7ASCII. It too will be upgraded eventually but I need to determine if there is a reason to perform the upgrade simultaneously with the upgrade and/or the UTF8 conversion. Both databases have been around for many years; about a third of the objects in question still use the SNAP$ convention.

It seems logical to me that there would be the potential for the target to be unable to hold some of the data stored in the UTF8 source database, but every test I've run has worked. I did manage to hit the bug with the big endian/little endian issue but once that patch was in, no problems. I've opened a case with Oracle, but their answer was brief and not very reassuring. Supposedly, if I upgrade both databases to 10g, I won't have to worry about any differences in character sets. Somehow, that's not making sense to me and no logic was offered with the answer.

So is there some kind of conversion that occurs in the materialized view process? Or would I eventually hit some bit of data that could not be stored in the target database if it remains US7ASCII? Would it make more sense to convert both to UTF8? I've got time to plan for this and I'd like to do it right, short of having to convert to a completely new form of replication overnight.

Suggestions appreciated, including any test cases that might conclusively prove the possibility of failure. I'd rather find out now than at 3:00 am on Feb 23rd 2009.

tia ... Robyn

-- Received on Tue Jul 17 2007 - 11:40:00 CDT