Oracle FAQ Your Portal to the Oracle Knowledge Grid

Re: Character Set Problems

From: Laurenz Albe <invite_at_spam.to.invalid>
Date: 14 Jun 2007 13:37:15 GMT
Message-ID: <1181828226.580263@proxy.dienste.wien.at>


Paul <paulwragg2323_at_hotmail.com> wrote:
> 1) The Database Server is using the WE8ISO8859P1 character set. The
> client setting for the server is the same.
> 2) We have an application that generates XML data, based on various
> things, and stores this data into a LONG column.
> 3) One particular part of this is that step 2 reads the '°C'
> characters from a table in the DB.
> 4) When the XML data is generated within our application, it is
> encoded using the UTF8 format. The '°C' characters become 'Â°C' which
> I believe is correct as the UTF8 representation of the '°' character is
> 'Â°'. I have used a hex editor to check the representation of these
> characters, and all seems to be well. I think everything is stored
> correctly. I also believe that as the database and client have the
> same character set setting, no conversion is done, and so the data is
> stored in UTF8 format.
> 5) Another application comes along and reads the data in. The data is
> being rejected, as it does not know what the '°' is.
> 6) I suspect that this 2nd application needs to know that the data is
> stored in UTF8 format.
> 7) I have tried outputting the data in SQL Plus using various NLS_LANG
> settings, but I always get the 'Â' character displayed.

As has been stated, your data are not stored correctly.

When the client and server character sets are identical, Oracle performs no character conversion and no validity checks. By relying on this, you are exploiting what I call a bug in Oracle to get corrupt data into the database.
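The effect can be reproduced outside the database. The following Python sketch is only an illustration, not Oracle code: it assumes that Python's `latin-1` codec stands in for WE8ISO8859P1 and that the bytes reach the database unchanged, as described above.

```python
# Illustration only: 'latin-1' stands in for WE8ISO8859P1 (assumption).
original = "\u00b0C"  # the two characters '°C'

# The application encodes the XML as UTF-8 before storing it.
stored_bytes = original.encode("utf-8")

# No conversion happens, so these bytes land in the database unchanged.
# Interpreted in the database character set, they are three characters.
as_seen_by_database = stored_bytes.decode("latin-1")

print(stored_bytes.hex())     # c2b043
print(as_seen_by_database)    # Â°C
```

From the database's point of view the column therefore holds 'Â°C', not '°C', which is why the data count as corrupt.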

The only way to get the data out the way you want is to set the client character set to WE8ISO8859P1 with NLS_LANG.

Of course, this will not work with, for example, a Java program that accesses the database, because Java uses Unicode internally and will always try to convert your data.
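To see what such a converting client ends up with, here is a small Python sketch. It is a client-side illustration under assumptions, not a fix for the database: `latin-1` again stands in for WE8ISO8859P1, and `undo_misdecoding` is a hypothetical helper name, not part of any Oracle or Java API.

```python
def undo_misdecoding(text: str) -> str:
    """Hypothetical helper: recover UTF-8 text that was decoded as Latin-1."""
    # Turn the characters back into the raw stored bytes, then decode
    # those bytes with the encoding the writer actually used.
    return text.encode("latin-1").decode("utf-8")

# What a converting client receives for the stored bytes C2 B0 43.
received = "\u00c2\u00b0C"    # 'Â°C'

repaired = undo_misdecoding(received)
print(repaired)               # °C
```

This only works while every value in the column was written the same wrong way. For text that was stored correctly in the database character set, the UTF-8 decoding step will usually raise an error, so the workaround cannot replace a proper character set migration.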

The correct thing to do is to change the database character set to UTF-8 with

ALTER DATABASE CHARACTER SET AL32UTF8

and to change the client encoding appropriately (AL32UTF8 for the program that stores the XML files, and the appropriate value for the program that retrieves the data).

Which character set does the retrieving program expect?

Yours,
Laurenz Albe

Received on Thu Jun 14 2007 - 08:37:15 CDT
