

RE: character set confusion

From: Powell, Mark D
Date: Tue, 17 Jul 2007 14:10:57 -0400

I thought Oracle's UTF8 character set should be considered obsolete, since it is not guaranteed to keep up with the evolving Unicode standard, and that AL32UTF8 is its replacement.

On Behalf Of Bobak, Mark

	Sent: Tuesday, July 17, 2007 12:40 PM
	To:; oracle-l
	Subject: RE: character set confusion

	Hi Robyn,


	Playing a bit of catch up on Oracle-L.


	I'm no expert on this subject, but here's what I (think I) know:

        Converting from US7ASCII to UTF8 should not be a problem, because the latter is a superset of the former. Having a source database in UTF8 and destination database in US7ASCII may be a problem. If the UTF8 database stores characters that are not defined in US7ASCII, that's not going to be good. It seems to me, you could convert the destination database to UTF8 first, and that shouldn't be a problem. Then, when the source database is converted to UTF8 (from US7ASCII?), there's no issue. Since UTF8 is a superset of US7ASCII, having the destination at UTF8 before the source should not pose any problem.          
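The superset argument above can be illustrated with a minimal sketch. It uses Python's built-in `ascii` and `utf-8` codecs as stand-ins for the database character sets, which is an assumption: Oracle's own conversion layer may substitute a different replacement character or behave differently in detail. The sample strings and the `fits_in_ascii` helper are hypothetical.

```python
# Illustration only: Python codecs stand in for the database character sets.
ASCII_TEXT = "Ann Arbor MI 48106"   # hypothetical sample, 7-bit characters only
ACCENTED_TEXT = "Zürich café"       # hypothetical sample with non-ASCII characters


def fits_in_ascii(text: str) -> bool:
    """Return True if every character is representable in 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# ASCII -> UTF-8: the byte sequences are identical, so nothing is lost.
assert ASCII_TEXT.encode("ascii") == ASCII_TEXT.encode("utf-8")

# UTF-8 -> ASCII: characters outside the 7-bit range have no representation.
# With errors="replace", Python substitutes "?" for each such character.
lossy = ACCENTED_TEXT.encode("ascii", errors="replace").decode("ascii")

print(fits_in_ascii(ASCII_TEXT))     # True
print(fits_in_ascii(ACCENTED_TEXT))  # False
print(lossy)                         # Z?rich caf?
```

The point is only the asymmetry: the widening direction is byte-for-byte safe, while the narrowing direction silently degrades any character the target cannot hold.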

        To confirm what can and can't be stored in various character sets, Oracle provides a tool called csscan. It may be worth investigating. Here's the link to the 10.2 csscan docs:  er.htm          
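As a rough idea of what such a scan checks, here is a small sketch that flags values containing characters a target character set cannot hold. It is not csscan and does not reproduce its options or report format; the function names, the sample rows, and the use of Python's `ascii` codec as a stand-in for US7ASCII are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def _encodable(ch: str, codec: str) -> bool:
    """Return True if a single character can be encoded in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def find_unconvertible(
    rows: Iterable[Tuple[int, str]], target_codec: str = "ascii"
) -> List[Tuple[int, str]]:
    """Return (row_id, offending_chars) for values the target codec cannot store."""
    problems = []
    for row_id, value in rows:
        # Collect each distinct offending character once, in code point order.
        bad = sorted({ch for ch in value if not _encodable(ch, target_codec)})
        if bad:
            problems.append((row_id, "".join(bad)))
    return problems


# Hypothetical sample data standing in for a column in the source database.
SAMPLE_ROWS = [(1, "plain text"), (2, "naïve résumé"), (3, "price: 5€")]

report = find_unconvertible(SAMPLE_ROWS)
for row_id, chars in report:
    print(f"row {row_id}: cannot store {chars!r}")
```

Run against a representative extract of the source data, a check along these lines gives an early indication of whether any rows would be damaged by a narrower target character set.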

        Hope that helps,          


	Mark J. Bobak
	Senior Database Administrator, System & Product Technologies
	789 E. Eisenhower Parkway, P.O. Box 1346
	Ann Arbor MI 48106-1346
	734.997.4059 or 800.521.0600 x 4059
	ProQuest...Start here. 



On Behalf Of Robyn
Sent: Friday, July 13, 2007 7:50 PM
To: oracle-l
Subject: character set confusion

        Hello all,

        What are the limitations of materialized views across character sets? We will be upgrading the source database for many, many materialized views to Oracle in a few months. We will also be converting the database to UTF8 although that will probably occur a few months later. The target database is, at the moment, and US7ASCII. It too will be upgraded eventually but I need to determine if there is a reason to perform the upgrade simultaneously with the upgrade and/or the UTF8 conversion. Both databases have been around for many years; about a third of the objects in question still use the SNAP$ convention.

        It seems logical to me that there would be the potential for the target to be unable to hold some of the data stored in the UTF8 source database, but every test I've run has worked. I did manage to hit the bug with the big endian/little endian issue but once that patch was in, no problems. I've opened a case with Oracle, but their answer was brief and not very reassuring. Supposedly, if I upgrade both databases to 10g, I won't have to worry about any differences in character sets. Somehow, that's not making sense to me and no logic was offered with the answer.         

        So is there some kind of conversion that occurs in the materialized view process? Or would I eventually hit some bit of data that could not be stored in the target database if it remains US7ASCII? Would it make more sense to convert both to UTF8? I've got time to plan for this and I'd like to do it right, short of having to convert to a completely new form of replication overnight.

        Suggestions appreciated, including any test cases that might conclusively prove the possibility of failure. I'd rather find out now than at 3:00 am on Feb 23rd 2009.         

        tia ... Robyn                  

Received on Tue Jul 17 2007 - 13:10:57 CDT
