Character set conversion (re Metalink notes 260192.1 and 225938.1) - Why?

From: Charles Schultz <sacrophyte_at_gmail.com>
Date: Wed, 22 Oct 2008 09:10:09 -0500
Message-ID: <7b8774110810220710n3741c028ocecbd515c68aba8a@mail.gmail.com>


Good day list,

I filed this question with Oracle Support, but they have been slow in responding.

Metalink note 260192.1 (*Changing WE8ISO8859P1/ WE8ISO8859P15 or WE8MSWIN1252 to (AL32)UTF8*) says:
If there is any "Lossy" data then those rows contain code points that are not currently stored correctly and they should be cleared up before you can continue with the steps in this note. Please see the following note for clearing up any "Lossy" data: Note
225938.1<http://metalink/plsql/showdoc?db=NOT&id=225938.1&blackframe=1>Database Character Set Healthcheck

The thought is that in order to resolve lossy data when converting from a WE8ISO8859P1 (aka ISO) character set to a (AL32)UTF8 character set, one first has to convert to WE8MSWIN1252 (aka, WIN) (the superset of WE8ISO8859P1). Why?

I know the reasons that Oracle gives via the many Metalink notes (341676.1,264294.1,260192.1,252352.1,225938.1,158577.1,123670.1,225912.1). They boil down to the fact that the ISO set does not contain as many codepoints as the WIN set. While this is true, it is also true that the WIN set has *all* the codepoints that the ISO set does (hence the definition of a superset).

So in essence, when one converts the character set from ISO to WIN, the codepoints never change. If you store an "unsupported" character, say, the Euro (128) or left double quote (147) in an ISO database, the ascii value will remain the same (128 and 147 respectively) when you "convert".

But why? If I export my data from a WE8ISO8859P1 database with NLS_LANG = WE8MSWIN1252 and then turn around and import that into a AL32UTF8 database, all my lossy data is automagically converted. And so it should be. So why does Oracle documentation tell me that I have to clean up the lossy data first, when a much easier and foolproof method is available? What am I missing?

Simple example

Source: OSEE 10.2.0.2 (EXPRT)
Target: OSEE 10.2.0.3 (PUMP)

Source:
$> export NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252

SQL> select VALUE from nls_database_parameters where PARAMETER = 'NLS_CHARACTERSET'; VALUE



WE8ISO8859P1 SQL> create table oracle.lossy_test (a char(1));

Table created.

SQL> insert into oracle.lossy_test values (chr(128));

1 row created.

$> expdp directory=DPUMP dumpfile=EXPRT_lossy_test.dmp
logfile=EXPRT_lossy_test.log metrics=y TABLES=ORACLE.LOSSY_TEST

Target:
$> export NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252

SQL > select VALUE from nls_database_parameters where PARAMETER = 'NLS_CHARACTERSET'; VALUE



AL32UTF8 SQL > show parameter seman
NAME                                 TYPE     VALUE
------------------------------------ -------- ------------------------------
nls_length_semantics                 string   CHAR

SQL > create table oracle.lossy_test (a char(1 CHAR));

Table created.

PUMP_SQL > select * from oracle.lossy_test;

no rows selected

$> impdp directory=DPUMP dumpfile=EXPRT_lossy_test.dmp
logfile=PUMP_lossy_test.log metrics=y TABLES=ORACLE.LOSSY_TEST TABLE_EXISTS_ACTION=TRUNCATE With a windows client (ie, sqlplusw), I can see that the Euro shows up just fine in both databases.

-- 
Charles Schultz

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Oct 22 2008 - 09:10:09 CDT

Original text of this message