Oracle FAQ Your Portal to the Oracle Knowledge Grid


Re: mod_plsql / dad / character set problem

From: Laurenz Albe <invite_at_spam.to.invalid>
Date: 18 Jul 2006 08:12:09 GMT
Message-ID: <1153210327.438997@proxy.dienste.wien.at>


Frank van Bortel <frank.van.bortel_at_gmail.com> wrote:

>> This is a bug in Oracle, because 0x80 has no meaning in WE8ISO8859P1.
>> Basically the database allows you to store an invalid character.

>
> Beg to differ - if it's a bug, it's a bug in Windows.
> Or rather: it's a bug called politicians. European legislation
> was so dreadfully slow defining code points, Microsoft decided
> on their own (hey - haven't we seen them do that more often?)
> to use 0x080. I'd say Win2k introduction had to do with this, but
> am just guessing here. Oracle waited and waited and got bad press
> for being so slow in their implementation of the Euro symbol, but
> at least Oracle uses the official code point.
>
> It is not by coincidence, the character set is called we8MSWIN...

I am afraid that I will have to contradict you.

Unfortunately I will also have to exculpate Microsoft, much as I resent it:

I was talking about ISO 8859-1, not Windows 1252. In Windows 1252 the character 0x80 is well defined: it is the Euro sign. So if your database is WE8MSWIN1252 and you store a 0x80 in it, everything is hunky-dory.

The problem with Windows 1252 is not that it is poorly defined. The biggest problem is that Microsoft included certain exotic characters (typographic quotation marks and dashes, for example, in the range 0x80 to 0x9F) that ISO 8859-1 and many other single-byte character sets do not have. Microsoft's word processor uses these characters aggressively, with the consequence that a text file produced on a Windows system will most likely contain characters that cause problems on other systems.
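To illustrate, here is a minimal Python sketch. It assumes that Python's `cp1252` and `latin-1` codecs are fair stand-ins for Windows 1252 and ISO 8859-1; the sample string and the helper `encodable` are made up for the example and have nothing to do with Oracle itself.

```python
# Text of the kind a Windows word processor produces: curly quotes and an
# en dash. The sample string is invented for illustration.
sample = "\u201cSmart quotes\u201d \u2013 and dashes"


def encodable(text: str, codec: str) -> bool:
    """Return True if every character in text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(encodable(sample, "cp1252"))   # Windows 1252 has these characters
print(encodable(sample, "latin-1"))  # ISO 8859-1 has no code point for them
```

The first call should succeed and the second should fail, because the quotes and the dash sit at 0x93, 0x94 and 0x96 in Windows 1252, positions that carry no printable character in ISO 8859-1.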

But that is beside the point here.

In ISO 8859-1 the character 0x80 is undefined: the position is left empty, so it is an illegal character.
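The difference is easy to see outside the database. The following is only a sketch using Python's codecs as stand-ins for the two character sets. One caveat: Python's `latin-1` codec does not reject 0x80, it maps the byte to the control code U+0080, which is still not a printable character.

```python
import unicodedata

raw = b"\x80"  # the single byte under discussion

as_win1252 = raw.decode("cp1252")   # interpreted as Windows 1252
as_latin1 = raw.decode("latin-1")   # interpreted as ISO 8859-1

# In Windows 1252 the byte is a real, named character.
print(repr(as_win1252), unicodedata.name(as_win1252))    # '€' EURO SIGN

# In Python's ISO 8859-1 mapping it is a control code (category Cc).
print(repr(as_latin1), unicodedata.category(as_latin1))  # '\x80' Cc
```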

The problem, which I was so bold as to call a bug, is that when Oracle uses a single-byte character set and does NOT do character conversion (because the client and server character sets are identical), it will NOT check whether a character is legal. It will simply accept any byte as a valid character.

You will not detect this problem until you try to retrieve that character at a later time, when the client and server character sets differ. THEN Oracle will attempt a conversion, hit the invalid character, and deliver garbage.
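The mechanism can be simulated in a few lines of Python. This is a sketch under stated assumptions, not a description of Oracle internals: `store_without_conversion` and `fetch` are hypothetical helpers, Python's `latin-1` codec plays the role of an ISO 8859-1 database, and the exact garbage a real client receives may look different.

```python
def store_without_conversion(client_bytes: bytes) -> bytes:
    """Hypothetical server: client and database character sets match,
    so the bytes are stored as-is, with no validation."""
    return client_bytes


def fetch(stored: bytes, db_charset: str, client_charset: str) -> bytes:
    """Hypothetical retrieval: convert only when the character sets differ."""
    if db_charset == client_charset:
        return stored  # pass-through, nothing is checked
    return stored.decode(db_charset).encode(client_charset, errors="replace")


# A Windows client sends a Euro sign, encoded as the single byte 0x80.
stored = store_without_conversion("€".encode("cp1252"))

# Same character set on both sides: the byte comes back unchanged,
# so the Windows client still displays a Euro sign.
same = fetch(stored, "latin-1", "latin-1")

# A UTF-8 client forces a conversion from the declared ISO 8859-1.
converted = fetch(stored, "latin-1", "utf-8")

print(same)                 # b'\x80'
print(converted)            # b'\xc2\x80', a control code, not a Euro sign
print("€".encode("utf-8"))  # b'\xe2\x82\xac', what the client should have received
```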

This is not the first time that I have encountered this very problem, and I think that Oracle should fix this as it can cause major headaches later on.

IF you have consistently been storing Windows 1252 characters in an ISO 8859-1 database, the solution would be to just change the database character set without changing the data (if possible).
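The idea of relabelling the data instead of converting it can be sketched in Python as follows. The sample bytes and the helper names are hypothetical, and Python's `cp1252` codec is used as an approximation of WE8MSWIN1252; the two may not agree on every unassigned position (Python rejects 0x81, 0x8D, 0x8F, 0x90 and 0x9D), so this is no substitute for checking the real data with Oracle's own tools.

```python
def valid_in(stored: bytes, charset: str) -> bool:
    """Return True if every stored byte is defined in charset."""
    try:
        stored.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


def relabel(stored: bytes, actual_charset: str = "cp1252") -> str:
    """Leave the bytes untouched and only change how they are interpreted."""
    return stored.decode(actual_charset)


# Invented sample: bytes written by a Windows client into an
# "ISO 8859-1" column without any conversion.
stored = b"Price: 10 \x80"

if valid_in(stored, "cp1252"):
    print(relabel(stored))  # Price: 10 €

# A byte that Python's cp1252 codec leaves undefined would block the relabelling.
print(valid_in(b"\x81", "cp1252"))  # False
```

The point of the check is the "if possible" above: relabelling is only safe when every stored byte really is a valid character in the new character set.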

Yours,
Laurenz Albe

Received on Tue Jul 18 2006 - 03:12:09 CDT
