Character set conversion (re Metalink notes 260192.1 and 225938.1) - Why?
Date: Wed, 22 Oct 2008 09:10:09 -0500
Message-ID: <7b8774110810220710n3741c028ocecbd515c68aba8a@mail.gmail.com>
Good day list,
I filed this question with Oracle Support, but they have been slow in responding.
Metalink note 260192.1 (*Changing WE8ISO8859P1/ WE8ISO8859P15 or
WE8MSWIN1252 to (AL32)UTF8*) says:
If there is any "Lossy" data then those rows contain code points that are
not currently stored correctly and they should be cleared up before you can
continue with the steps in this note. Please see the following note for
clearing up any "Lossy" data: Note
225938.1<http://metalink/plsql/showdoc?db=NOT&id=225938.1&blackframe=1>Database
Character Set Healthcheck
The thought is that in order to resolve lossy data when converting from a WE8ISO8859P1 (aka ISO) character set to a (AL32)UTF8 character set, one first has to convert to WE8MSWIN1252 (aka, WIN) (the superset of WE8ISO8859P1). Why?
I know the reasons that Oracle gives via the many Metalink notes (341676.1,264294.1,260192.1,252352.1,225938.1,158577.1,123670.1,225912.1). They boil down to the fact that the ISO set does not contain as many codepoints as the WIN set. While this is true, it is also true that the WIN set has *all* the codepoints that the ISO set does (hence the definition of a superset).
So in essence, when one converts the character set from ISO to WIN, the codepoints never change. If you store an "unsupported" character, say, the Euro (128) or left double quote (147) in an ISO database, the ascii value will remain the same (128 and 147 respectively) when you "convert".
But why? If I export my data from a WE8ISO8859P1 database with NLS_LANG = WE8MSWIN1252 and then turn around and import that into a AL32UTF8 database, all my lossy data is automagically converted. And so it should be. So why does Oracle documentation tell me that I have to clean up the lossy data first, when a much easier and foolproof method is available? What am I missing?
Simple example
Source: OSEE 10.2.0.2 (EXPRT)
Target: OSEE 10.2.0.3 (PUMP)
Source:
$> export NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
SQL> select VALUE from nls_database_parameters where PARAMETER = 'NLS_CHARACTERSET'; VALUE
WE8ISO8859P1 SQL> create table oracle.lossy_test (a char(1));
Table created.
SQL> insert into oracle.lossy_test values (chr(128));
1 row created.
$> expdp directory=DPUMP dumpfile=EXPRT_lossy_test.dmp
logfile=EXPRT_lossy_test.log metrics=y TABLES=ORACLE.LOSSY_TEST
Target:
$> export NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
SQL > select VALUE from nls_database_parameters where PARAMETER = 'NLS_CHARACTERSET'; VALUE
AL32UTF8 SQL > show parameter seman
NAME TYPE VALUE ------------------------------------ -------- ------------------------------ nls_length_semantics string CHAR
SQL > create table oracle.lossy_test (a char(1 CHAR));
Table created.
PUMP_SQL > select * from oracle.lossy_test;
no rows selected
$> impdp directory=DPUMP dumpfile=EXPRT_lossy_test.dmp
logfile=PUMP_lossy_test.log metrics=y TABLES=ORACLE.LOSSY_TEST
TABLE_EXISTS_ACTION=TRUNCATE
With a windows client (ie, sqlplusw), I can see that the Euro shows up just
fine in both databases.
-- Charles Schultz -- http://www.freelists.org/webpage/oracle-lReceived on Wed Oct 22 2008 - 09:10:09 CDT