Re: Strange Euro sign handling

From: Álvaro G. Vicario <alvaro.NOSPAMTHANX_at_demogracia.com.invalid>
Date: Mon, 13 Aug 2012 09:24:54 +0200
Message-ID: <k0aa4i$qs9$1_at_dont-email.me>



El 11/08/2012 16:49, Franz Kruse escribió/wrote:
> Dear Oracle experts,
>
> maybe one of you can explain a phenomenon we observe:
>
> We are using an Oracle 9i database on a Solaris server as part of a web
> based system with both Perl and Java programs accessing the database.
> The database uses the WEISO8859P1 (Latin-1) character set, in which the
> Euro sign does not exist.
>
> Nevertheless, the system does handle the Euro sign. We can insert Euro
> signs through web forms and get them properly displayed in web pages
> generated from database queries - as long as done through Perl programs.
> When doing queries from Java, the situation is different:
>
> - For VARCHAR fields, the ResultSet method getString() yields
> 0x0080 for a Euro sign (which is the Windows-1252 encoding).
>
> - For CLOBs (CLOB object obtained by the OracleResultSet
> method getCLOB()), the result depends on the kind of access:
>
> - When using getAsciiStream() and creating an InputStreamReader
> with encoding ISO-8859-1 from it, we get 0x00FD for a Euro sign.
>
> - When using getCharacterStream(), we get 0xFFFD (which is the
> Unicode representation of an invalid character).
>
> What does the Oracle database actually do with the Euro signs? How can
> the observed effects be explained?
>
> Franz

It looks similar to a situation I've seen many times here in Spain. Database is using Latin-1 but everyone assumes the default Windows encoding which is Win-1252 in Western Europe. Application sends ¤ as 0x80 (Win-1252) and Oracle understands PAD (Latin-1). The exact opposite happens when retrieving data and since ¤ gets displayed nobody notices the error. This explains your VARCHAR behaviour.

Since you are also getting double byte characters there must be some other encoding involved. No idea about that.

To inspect the exact column contents you can use DUMP():

SELECT DUMP('¤ÁÑ', 1016) FROM DUAL
-- Typ=96 Len=3 CharacterSet=WE8MSWIN1252: 80,c1,d1

SELECT DUMP(U'¤ÁÑ', 1016) FROM DUAL
-- Typ=96 Len=6 CharacterSet=AL16UTF16: 20,ac,0,c1,0,d1

http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions046.htm#sthref1314

-- 
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web: http://borrame.com
-- Mi web de humor satinado: http://www.demogracia.com
--
Received on Mon Aug 13 2012 - 02:24:54 CDT

Original text of this message