From: Laurenz Albe <>
Date: Mon, 8 Jun 2009 10:05:49 +0200
Message-ID: <>

I feel like replying to this, but let's not get religious about it - I know that these things are to some extent a matter of taste.

Frank van Bortel wrote:
>> A client pretending to use the same character set as the database,
>> can store quite some rubbish.
> No they cannot. And as long as path to equals path from,
> no one is going to see any changes.
> If you want to store the character 'A', and you do it by
> translating into something you do not recognize, and get it back, and it
> is still represented as 'A', how can that be wrong?!?

That largely depends on what your definition of rubbish is. My personal answer is "data that are not correct are rubbish". My definition of "correct" in this context will be given below.

To your question of "how can it be wrong": My feeling is that if somebody happily sits on a time bomb and currently has no problems, I should still call that "wrong". Or to be more picturesque, I could call it shit suspended above the fan.

New requirements come up quickly in business, and as soon as somebody decides that this database should - for example - be accessed with a JDBC client, you have a very real problem.

> Sometimes, the thing you stored (and did not recognize) is
> encrypted. Sometimes, a client with another nls setting than
> what you *expected* entered the character.
> Think EBCDIC, for a change.

I think I understand your point, and I agree that it is a valid one.

When musing about this problem class I figured that there are basically two philosophies:

  1. Character data stored in a database should always be "correct" (see below for my definition), and it is the responsibility of the database software and the people who maintain and configure it to keep it that way.
  2. A database is a byte repository and should not care about semantics. Its job is to reliably reproduce whatever was fed into it. It is the job of the application to interpret and maybe convert the data.

You seem to adhere to the second line of thinking, while I prefer the first. I admit that the second paradigm is certainly much more convenient for the DBA.

>> - They do not verify correctness of the data when client
>> encoding is equal to server encoding.
> Eh? data has nothing to do with nls_lang. Besides, define correct.

The character set part of NLS_LANG influences the data stored in the database.

Ok, and here is my definition of correct: A character stored in a database is correct if and only if the user (or client application) and the database software agree on its meaning.

I admit that I have been sloppy in what I wrote: the database software cannot verify correctness of data in that sense.

What I wanted to say is "verify that the bytes fed in by the client are correct in the client encoding".

For example:
Both Note 14786.1 and Note 341676.1 agree that code point 0x80 is not assigned in the WE8ISO8859P1 character set. Yet Oracle will happily accept this byte when the client feeds it in.

This is what I complain about.
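To make the distinction concrete, here is a small sketch using Python's codecs as a stand-in for the validation I would expect from the database. One caveat: many ISO 8859-1 codec implementations (Python's included) pass 0x80 through as the C1 control character U+0080 rather than rejecting it, so I use UTF-8 below to make strict rejection visible; the principle is the same.

```python
raw = b"\x80"

# Permissive decoding: the byte is accepted without complaint,
# even though no printable character is assigned to it.
print(raw.decode("latin-1"))   # U+0080, a C1 control character

# Strict decoding: an invalid byte raises an error instead of
# being stored silently.
try:
    raw.decode("utf-8")
    accepted = True
except UnicodeDecodeError:
    accepted = False
print(accepted)   # False: a lone 0x80 is not valid UTF-8
```

A database following the first philosophy would behave like the strict decoder and reject the byte at the door.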

>> - They do not report an error when a character conversion
>> between different encodings fails, but silently store
>> bad data ("replacement characters").
> It does not fail - it simply ends up in another character code.
> Other dimension of same world.

Perhaps I'm spoiled by other software, but I get a very bad feeling when I store something in a database, get no error or warning, and yet the database mutilates the strings I want to store so that I cannot get my data back.
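The two behaviors can again be sketched with Python codecs (as an illustration of the general pattern, not of Oracle's internals). The euro sign does not exist in ISO 8859-1:

```python
text = "\u20ac"   # the euro sign

# Silent substitution - the conversion "succeeds" and a replacement
# character is stored instead of my data. This is what I complain about.
lossy = text.encode("iso-8859-1", errors="replace")
print(lossy)   # b'?' -- the original character is gone for good

# Reporting an error - a failed conversion is surfaced to the caller,
# which is what I would expect from a database.
try:
    text.encode("iso-8859-1")
except UnicodeEncodeError as err:
    print("conversion failed:", err)
```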

>> This all comes together with a third bug, namely that
>> (at least up to Oracle 8) Windows clients were installed
>> with the wrong NLS_LANG setting (WE8ISO8859P1) by default.
>> All this resulted in lots of corrupt databases.
> Perhaps in some code points being off - not corrupt databases.
> Corrupt is too large a word here.

I agree. I used this overly strong word for emphasis.

Laurenz Albe