charset problem update

From: Janine Sisk <janine_at_furfly.net>
Date: Wed, 20 Jul 2005 01:35:19 -0700
Message-Id: <7dbda697bf0c9c5796e96f06914bb4e9@furfly.net>

Thanks to everyone who e-mailed me with suggestions and info from metalink and other sources. It was all good stuff, but ultimately not the right stuff. I tried various scenarios:

export from old database with NLS_LANG set to US7ASCII, import to new database with NLS_LANG set to USASCII, let imp do the conversion
export from old database with NLS_LANG set to WE8ISO8859P1, import to new database with NLS_LANG set to WE8ISO8859P1, let exp do the conversion

but nothing worked. I ultimately found a white paper from Oracle (this is the HTML version of their PDF, from google):

http://66.102.7.104/search?q=cache:1shB3SfIDNkJ:www.oracle.com/ technology/tech/globalization/pdf/
TWP_Character_Set_Migration_Best_Practices_10gR2.pdf+oracle+export+us7as cii++8-bit&hl=en&start=1&client=firefox-a

which talks about what a big job this is. As far as I can tell, the problem is that Oracle lets you store 8-bit characters with a 7-bit character setting, but when you do any sort of conversion it forgets about that and drops the 8th bit.

I then tried a different approach. I set up a second database on the 8i system, still in US7ASCII. I loaded my data into it, and verified that this time it loaded and displayed correctly. Then I tried changing the character set of the database. I thought this might perhaps use different conversion code and maybe it was done correctly, but no joy here either. I first tried WE8ISO8859P1, then I tried WE8MSWIN1252, which is specifically for those funky characters from MS Word, and I even tried UTF8. The first two gave the same result as my imp/exp trials, which is corruption of all the 8-bit characters, and the last corrupted the data completely so all I got was a row of question marks and nothing readable at all. And yes, I was creating the database over from scratch in US7ASCII and reloading the data each time (it's been a long night.... :)

I need to read that conversion doc from Oracle in more detail, but what I got from skimming over it was basically that one should use tools to remove "problem" data before doing the conversion. In this case it would be the same data that's getting corrupted. I'm not getting the impression that this data can be saved....

One thing just occurred to me. If I set up a second database on the 9i system in US7ASCII and loaded the dump from the 8i system, and connected to it from our application with NLS_LANG set to US7ASCII, I would probably get good data. The problem here is that I have no way to set up a connection to the target database at the same time with a different NLS_LANG value; they will both use the same environment variable and I know of no way to override it.. Drat, I thought I was on to something for a second there! :)

If I'm missing, overlooking or otherwise misunderstanding something, please let me know. At this point the alternatives are looking a bit bleak.....

janine

--
http://www.freelists.org/webpage/oracle-l

Received on Wed Jul 20 2005 - 03:37:21 CDT