Re: NLS_LANG

From: Laurenz Albe <invite_at_spam.to.invalid>
Date: 24 Jul 2008 09:14:40 GMT
Message-ID: <1216890876.195061@proxy.dienste.wien.at>


"Álvaro G. Vicario" <alvaroNOSPAMTHANKS_at_demogracia.com> wrote:
>>> SPANISH_SPAIN.WE8MSWIN1252 (the Windows codepage)
>>> SPANISH_SPAIN.WE8ISO8859P1 (ISO-Latin-1, used in first site)
>>> SPANISH_SPAIN.WE8ISO8859P15 (ISO-Latin-9, used in second site)
>>>
>>> From the "Oracle Database Globalization Support Guide" I understand
>>> that NLS_LANG makes Oracle perform a charset conversion. Since both
>>> sites use different charsets and I can convert in my PHP app when
>>> necessary, I think it'd be a good idea that Oracle does not make any
>>> conversion. But I don't know how to find out the charset that database
>>> tables are using natively. How could I find out? Does all this make any
>>> sense?

>> 
>> You could find out by querying NLS_DATABASE_PARAMETERS.
> 
> A query to that table shows that NLS_CHARACTERSET is WE8ISO8859P1 so 
> I'll work on that base.
> 
>> However, for Windows the default is MSWIN1252.
>> WE8ISO8859P1 does not contain the euro sign.
> 
> You are right (I've seen apps that use the currency symbol instead). 
> Luckily it's not really an issue in web sites since you can always use 
> the &euro; entity--I just need to know what charset the data is actually 
> using.

If you really plan to use WE8ISO8859P1 on the Windows client, please write your application so that it makes dead sure that only LATIN-1 characters are entered into the database.

Oracle has a special "feature" in that it will not check your input for validity if client and server character sets are the same. This means that your client application can enter arbitrary junk into the database that will be happily stored as is.

Let me give you an example.
If the PHP application decides to store an Euro sign in the database and it happens to run in the Windows codepage, it will try to store a byte 0x80. Oracle will happily store that byte, even though this byte does not make any sense in LATIN-1. You will not notice that until you try to access the database with a different client character set, e.g. from Java. Then Oracle will return garbage instead of the Euro signs, and there is no way to fix that.

But even if you plan to use a different character set on the Windows client (which I personally would do, namely WE8MSWIN1252), you will not get error messages if invalid characters are stored. Oracle will check the characters, but if it detects - say - a Euro sign, it will not throw an error but clandestinely convert the character to a question mark.

The best solution in my opinion would be to recreate the database using the character set AL32UTF8 and on the client either use WE8MSWIN1252 or AL32UTF8 (if your PHP code speaks UTF-8).

Yours,
Laurenz Albe Received on Thu Jul 24 2008 - 04:14:40 CDT

Original text of this message