Re: Character Set Problems
Paul <paulwragg2323_at_hotmail.com> wrote:
>> Is it clear to you why you always get the UTF-8 sequence as two
>> characters, no matter how you set NLS_LANG?
Unfortunately it is a problem, although WIN1252 is a superset of ISO8859-1,
as you correctly observe.
The problem is that if the client and server character sets are different,
character conversion takes place. Previously, when both were the same,
there was no conversion, and Oracle did not perform any validity checks.
Let me illustrate this with an example:
Suppose that both database and client are WE8ISO8859P1, and you store
a UTF-8 sequence such as 0xe2 0x82 0xac (which is the Euro symbol).
These three bytes will be stored in the database unchanged, and since the
server character set is ISO8859-1, they will be interpreted as three
characters:
lower case a with circumflex accent, an undefined character (0x82 has no
meaning in ISO8859-1!) and the logical not symbol.
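
To make the byte-level view concrete, here is a minimal sketch in Python.
It does not involve Oracle at all; Python's "latin-1" codec is assumed to
be a reasonable stand-in for WE8ISO8859P1. It shows the same three bytes
read as one character under UTF-8 and as three under ISO8859-1:

```python
# Illustration only: Python's "latin-1" codec stands in for Oracle's
# WE8ISO8859P1; no Oracle client or server is involved.
euro_utf8 = bytes([0xE2, 0x82, 0xAC])

as_utf8 = euro_utf8.decode("utf-8")      # one character: the Euro sign
as_latin1 = euro_utf8.decode("latin-1")  # three characters, one per byte

print(len(as_utf8), [hex(ord(c)) for c in as_utf8])
print(len(as_latin1), [hex(ord(c)) for c in as_latin1])
```

The second line of output lists 0xe2, 0x82 and 0xac as separate code
points, which is how a single-byte server sees the stored data.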
Now when you retrieve this sequence with a client that has WE8MSWIN1252
configured, these three bytes will be converted. The first and last byte
will not be a problem, because as you observed, WIN-1252 is a superset of
ISO8859-1. But the middle character does not correspond to any WIN-1252
character! Oracle, ill-advisedly I would say, does not give you an error,
but replaces the questionable character with the "default replacement
character", in this case 0xbf (inverted question mark).
You end up with the sequence 0xe2 0xbf 0xac, which has no meaning as UTF-8
(it decodes to Unicode code point U+2FEC, which is unassigned).
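
The conversion step can be imitated in Python as a rough sketch. This is
not Oracle's conversion code: the helper name is hypothetical, Python's
"latin-1" and "cp1252" codecs are assumed to approximate WE8ISO8859P1 and
WE8MSWIN1252, and the 0xbf replacement byte is simply taken from the
description above.

```python
# Illustration only: mimics the described behaviour in plain Python.
# to_win1252_with_replacement is a hypothetical helper, not an Oracle API.
REPLACEMENT = b"\xbf"  # inverted question mark in WIN-1252


def to_win1252_with_replacement(text: str) -> bytes:
    """Encode to cp1252, substituting REPLACEMENT for unmappable characters."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            out += REPLACEMENT
    return bytes(out)


stored = bytes([0xE2, 0x82, 0xAC])       # UTF-8 Euro sign, stored as-is
server_view = stored.decode("latin-1")   # server treats it as ISO8859-1
client_bytes = to_win1252_with_replacement(server_view)

print(client_bytes.hex())                      # e2bfac
print(hex(ord(client_bytes.decode("utf-8"))))  # 0x2fec
```

Only the middle byte changes, but that is enough to turn the Euro sign
into a different code point when the client later decodes the bytes as
UTF-8.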
> I do not expect anybody to give me a solution, but some pointers/ideas
> on how other people would handle storing UTF8 data in a Western
> European Character Set DB would be great! I am not asking for a quick
> fix, more of a pointer so I can then go off and look into this in more
> detail to get a solution. I just hope character sets are covered more
> in the 2nd exam!!
You can only do that by using the National Character Set. This is always
a Unicode character set in Oracle 10g (AL16UTF16 by default, or UTF8).
You can then define a column as NVARCHAR2 or NCLOB and store Unicode data
in it, even if the rest of the database has ISO8859-1.
There is no other way to store UTF-8 data in a database with a single-byte
character set - this is basically a contradiction in terms.
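
As a small sketch of why this is a contradiction, again using Python
codecs as assumed stand-ins for database character sets: the Euro sign
simply has no byte in ISO8859-1, while a Unicode encoding such as UTF-8
or UTF-16 can represent it.

```python
# Illustration only: Python codecs stand in for database character sets.
euro = "\u20ac"  # Euro sign

try:
    euro.encode("latin-1")
    fits_in_latin1 = True
except UnicodeEncodeError:
    # ISO8859-1 has no code for the Euro sign.
    fits_in_latin1 = False

print(fits_in_latin1)                  # False
print(euro.encode("utf-8").hex())      # e282ac
print(euro.encode("utf-16-be").hex())  # 20ac
```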
Yours,
Laurenz Albe
Received on Fri Jun 15 2007 - 08:08:08 CDT