Re: Character Set Problems

From: Paul <paulwragg2323_at_hotmail.com>
Date: Fri, 15 Jun 2007 02:06:47 -0700
Message-ID: <1181898407.981291.41210@g4g2000hsf.googlegroups.com>

>Is it clear to you why you always get the UTF-8 sequence as two
>characters, no matter how you set NLS_LANG?

If I am completely honest, probably not. I am only just beginning to look at character sets. I did my 10g Administration 1 exam last week, but that hardly covers them at all! As you can see, I am just beginning to really learn things. I do not expect anybody to explain this to me, as it is for me to learn, which I am more than willing to do. It just seems there is an awful lot to consider with character sets.

The problem I have is that I believe the client character set has changed from WE8ISO8859P1 to WE8MSWIN1252 (whereas on Oracle 8 the client matched the DB server). I think the upgrade was done by exporting from 8 and importing into 9 (which of course could have had an effect). As WE8MSWIN1252 is a superset of WE8ISO8859P1, I don't believe the change itself would have caused a problem?
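To make the superset point concrete, here is a rough sketch that compares the two encodings byte by byte. It uses Python's `latin-1` and `cp1252` codecs as stand-ins for WE8ISO8859P1 and WE8MSWIN1252; that mapping is an assumption for illustration, and Oracle's own conversion tables are not consulted here. The function name `differing_bytes` is made up for this example.

```python
def differing_bytes():
    """Return the byte values that decode differently in latin-1 and cp1252."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        latin = raw.decode("latin-1")  # latin-1 defines all 256 byte values
        try:
            win = raw.decode("cp1252")
        except UnicodeDecodeError:
            win = None  # a few bytes are undefined in Python's cp1252 codec
        if latin != win:
            diffs.append(b)
    return diffs


if __name__ == "__main__":
    diffs = differing_bytes()
    print("first differing byte:", hex(diffs[0]))
    print("last differing byte:", hex(diffs[-1]))
    print("number of differing bytes:", len(diffs))
```

Under these assumptions the two encodings only disagree in the range 0x80 to 0x9F, where ISO 8859-1 has control codes and Windows-1252 has printable characters such as the euro sign. The accented Western European letters in 0xA0 to 0xFF map identically, which is consistent with the client change on its own not being the cause.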

>If that were true, the program would not choke on the UTF8 sequence that
>comes out of the database or display it as two characters, right?

I agree - I have checked the data in the DB using SQL Developer and it does not look correct, which indicates it is not being stored correctly. I have not yet had time to check using the method that Carlos suggested, but I will do so at some point. I am convinced that something else must have changed on the customer site, not just the Oracle version from 8 to 9, but I cannot easily find out what. I believe we could change our app to allow selection of how the XML should be encoded (this seems to work).

As all of our customers run with Western European character sets, and every client runs on Windows, we have never had to worry about this too much. The problem has been caused by us now storing UTF8 encoded XML data. I suspect that either this has always been a problem (even with Oracle 8), or something else has changed.
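For anyone following along, the "two characters" symptom can be reproduced outside Oracle. The sketch below is only an illustration, assuming the UTF-8 bytes are passed through unchanged and then interpreted as a single-byte Western European character set; Python's `cp1252` codec stands in for WE8MSWIN1252, and the function names are hypothetical.

```python
def as_seen_by_single_byte_client(text, charset="cp1252"):
    """Encode text as UTF-8, then misread those bytes as a single-byte charset."""
    return text.encode("utf-8").decode(charset)


def recover_utf8(garbled, charset="cp1252"):
    """Undo the misreading, provided every byte survived the round trip."""
    return garbled.encode(charset).decode("utf-8")


if __name__ == "__main__":
    original = "\u00e9"  # e with acute accent, two bytes (C3 A9) in UTF-8
    garbled = as_seen_by_single_byte_client(original)
    print(len(original), "character becomes", len(garbled), "characters")
    print("round trip intact:", recover_utf8(garbled) == original)
```

Each byte of the UTF-8 sequence is treated as a character in its own right, so one accented letter shows up as two. The reverse step only works while no byte has been replaced or dropped by a character set conversion along the way.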

I do not expect anybody to give me a solution, but some pointers/ideas on how other people would handle storing UTF8 data in a Western European character set DB would be great! I am not asking for a quick fix, more of a pointer so I can then go off and look into this in more detail to get a solution. I just hope character sets are covered more in the 2nd exam!!

Thanks,

Paul

Received on Fri Jun 15 2007 - 04:06:47 CDT