RE: UTF character set application problem

From: Marc Perkowitz <mperkowitz_at_comcast.net>
Date: Tue, 28 Sep 2004 17:12:09 -0500
Message-ID: <007f01c4a5a8$39530f50$34099943@MTPSYS>

This is a follow-up on what we found. There was some confusion and misunderstanding about what was actually happening. It turns out the data IS being translated and stored correctly in the Oracle UTF8 database, using additional bytes for the characters that need them. We were not viewing the data correctly, and we also had not increased the column sizes to allow for the additional bytes. Once we increased the sizes and set NLS_LANG correctly, everything was fine.
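To make the "additional bytes" point concrete, here is a minimal sketch in Java. It assumes ISO-8859-1 as a stand-in for the Western European character set (the thread does not say which one was used), and the sample name is hypothetical. It only compares encoded lengths; it does not touch a database.

```java
import java.nio.charset.StandardCharsets;

public class ByteLengthDemo {
    // Bytes needed in a single-byte Western European encoding
    // (ISO-8859-1 is an assumed stand-in for the source character set).
    public static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Bytes needed for the same text once it is encoded as UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Hypothetical sample name with two accented characters,
        // written with Unicode escapes so the source file stays ASCII.
        String name = "Ren\u00e9e Fran\u00e7ois";
        System.out.println("characters:  " + name.length());
        System.out.println("latin1 size: " + latin1Length(name));
        System.out.println("utf8 size:   " + utf8Length(name));
    }
}
```

Each accented character takes one byte in the single-byte encoding and two bytes in UTF-8, so a column sized in bytes for the old data can be too small for the same text after conversion.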

Thank you, Justin and Mike, for the hints that led to us finding this out.

Marc Perkowitz.

-----Original Message-----

From: Justin Cave [mailto:justin_at_askddbc.com]
Sent: Monday, September 20, 2004 3:17 AM
To: mperkowitz_at_comcast.net; oracle-l_at_freelists.org
Subject: RE: UTF character set application problem

First off, this is decidedly not the way things should work. Oracle should be converting the data automatically.

If you run the Oracle character set scanner on the Western European database, does it complain? As Mike Vergara points out, it is possible to get improperly encoded data into a database if you set the client NLS_LANG the same as the database character set. If that has happened here, it would explain why Oracle is unable to do the conversion automatically.
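The failure mode described here can be illustrated outside the database. The sketch below is not how the Oracle scanner works; it is an assumed, simplified analogy that uses UTF-8 as the declared character set because well-formedness is easy to check there. The sample name is hypothetical. Bytes that are handed over unconverted are not valid in the declared encoding, while properly converted bytes are.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingCheckDemo {
    // Returns true if the bytes are a well-formed UTF-8 sequence.
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "Jos\u00e9"; // hypothetical sample with one accented character

        // Client bytes passed through as-is (no conversion step).
        byte[] passedThrough = name.getBytes(StandardCharsets.ISO_8859_1);
        // Client bytes converted to the declared encoding first.
        byte[] converted = name.getBytes(StandardCharsets.UTF_8);

        System.out.println("passed through valid: " + isValidUtf8(passedThrough));
        System.out.println("converted valid:      " + isValidUtf8(converted));
    }
}
```

When stored bytes do not match the character set the database believes it holds, a later automatic conversion has no reliable way to interpret them.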

Justin Cave
Distributed Database Consulting, Inc.
http://www.ddbcinc.com/askDDBC

-----Original Message-----

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of mperkowitz_at_comcast.net
Sent: Friday, September 17, 2004 7:16 PM
To: oracle-l_at_freelists.org
Subject: UTF character set application problem

We're having a problem with character sets. Recently we switched our database to UTF, and now names containing accented characters, etc. generate errors when we try to insert them into the database.

The data originates from a database that uses a Western European character set. We expected that, UTF being a superset, there would be no problems with switching. However, after a lot of testing, we found that UTF is not compatible with WE characters. If the data originates as WE, you must either store it in a UTF database or do an explicit translation to UTF.

This is counter-intuitive to me, but it is my first experience with using different character sets.

The application is in Java using thin JDBC drivers and no Oracle-specific functions. We created a very simple test program to prove out this finding. We've tested this on 9iR1, 9iR2, and 8i and it works the same.

Anyone else encounter this? Is it just my misconceptions on this in the first place? Or have I overlooked something?

Thanks,
Marc Perkowitz.

--

http://www.freelists.org/webpage/oracle-l

Received on Tue Sep 28 2004 - 17:07:57 CDT