Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.misc -> Re: Character Set Issues
Charles Hooper wrote:
> giles.buist_at_lycos.co.uk wrote:
> > On Nov 27, 5:18 pm, "Charles Hooper" <hooperc2..._at_yahoo.com> wrote:
> > > giles.bu..._at_lycos.co.uk wrote:
> > > > Hi All,
> > >
> > > > I have the following issue and would appreciate any advice on what to
> > > > do to resolve it:
> > >
> > > > When I enter the € Euro symbol into the database in a SQL statement,
> > > > I can retrieve it through SQL or the Java front end and see it as a €
> > > > Euro symbol.
> > >
> > > > If I enter the € symbol into the database through the java front end,
> > > > the java front end can retrieve it as a € Euro symbol, but SQL
> > > > retrieves it as an inverted question mark, which is a problem.
> > >
> > > > I have tried using the oracle function ASCII() to retrieve the value
> > > > for the character, the value retrieved is different depending on
> > > > whether the € Euro symbol was entered using SQL or the Java front
> > > > end.
> > >
> > > > The current nls_characterset parameter is AL32UTF8. I'm aware Oracle
> > > > recommends that 10g databases use UNICODE to cope with all characters.
> > > > Would this fix the problem? Is there any easier way?
> > >
> > > > GilesThe symbol that you posted has an ASCII value of 128. ASCII values
> > > above 127 are not standardized across different character sets/code
> > > pages. As a demonstration, if you are on a Windows platform, open
> > > Notepad, hold down the left ALT key and type 0128 on the numeric keypad
> > > - the Euro symbol will appear. Now, open a command prompt, hold down
> > > the left ALT key and type 0128 on the numeric keypad - a question mark
> > > will appear. SQLPlus for Windows may display a solid rectangle.
> > >
> > > http://www.microsoft.com/typography/EuroSymbolFAQ.mspx
> > > It appears that the euro symbol may appear in multiple positions. "To
> > > avoid confusion, the historical character U+20A0 EURO-CURRENCY SIGN has
> > > been updated with an informative note and a cross reference to U+20AC
> > > EURO SIGN"
> > >
> > > I am not sure if any of the above will be helpful in solving the
> > > problem that you are experiencing.
> > >
> > > Charles Hooper
> > > PC Support Specialist
> > > K&M Machine-Fabricating, Inc
> > What is 'the Java front end'?
> >
> > Which operating system is the client on?
> >
> >
> > What is the character set of the database?
> >
> >
> > Do you have a 10g database?
> >
> >
> > Do you have a 10g client? - Yep
> >
> >
> > What is the value of the environment variable NLS_LANG on the client?
> > ENLGISH_UNITEDKINGDOM.AL32UTF8
> >
> >
> > Which are the two values that the ASCII function returns? 14844588, the
> > other is 157something (sorry don't have it to hand right now)
> >
> >
> > Is the field containing the Euro characters a VARCHAR2, CHAR, NVARCHAR2
> >
> > or NCHAR field? VARCHAR2
> >
> > Charles - Cheers, I understand what you're saying and it prob explains
> > why determining the ASCII value for the symbol doesn't help. However,
> > ideally i simply want the € symbol to be selected by my PL/SQL and
> > placed into a file correctly. At the moment when selecting it, it
> > returns an inverted ? rather than the € symbol.
> >
> > By the way I've realised that the AL32UTF8 characterset is infact a
> > unicode compliant character set, so guessing that's not the issue.
> >
>
> I am not using a multi-byte character set here, so I can't do much
> testing. Interesting test:
> I quite frequently execute SQL statements using a program named
> SQLTalk. By holding down the left Alt key and typing 065 on the
> numeric keypad, the letter A appears (as expected). Holding down the
> left Alt key and typing 0128 on the numberic keypad causes € to
> appear. Oracle has a function named DUMP that exposes the actual byte
> values stored in a column. For example:
> SELECT
> DUMP('A')
> FROM
> DUAL;
>
> Displays the following:
> DUMP('A')
> =========
> Typ=96 Len=1: 65
>
> In SQLTalk, if I execute this statement (special character created with
> the above Alt sequence):
> SELECT
> DUMP('€')
> FROM
> DUAL;
>
> The following is returned:
> DUMP('¿')
> =========
> Typ=96 Len=1: 191
>
> On my system, Oracle is apparently storing the € character as a
> single byte with a value of 191. If I hold down the left Alt key and
> type 0191 on the numeric keypad and ¿ appears. Now, copy the above
> SQL statement to a SQL*Plus for Windows session (the € appears as a
> solid rectangle):
> SQL> SELECT
> 2 DUMP('€')
> 3 FROM
> 4 DUAL;
>
> This displays the following, which is essentially the same as with
> SQLTalk:
> DUMP('¿')
> ------------------
> Typ=96 Len=1: 191
>
> Try the same with SQL*Plus Worksheet:
> SELECT
> DUMP('€')
> FROM
> DUAL;
>
> DUMP('¿')
> -------------------
> Typ=96 Len=1: 191
>
> Still the same, now paste the SQL statement in a DOS SQL*Plus session,
> and it appears like this:
> SQL> SELECT
> 2 DUMP('?')
> 3 FROM
> 4 DUAL;
>
> DUMP('?')
> -----------------
> Typ=96 Len=1: 63
>
> More entertaining:
> SELECT
> '€'
> FROM
> DUAL;
>
> '¿'
> ================================
> ¿
>
> >From the above, at least on my database € at ASCII 128 is translated
> to ASCII 191 when stored in the database.
>
> Another quick test:
> CREATE TABLE T1 (TEST VARCHAR2(10));
>
> INSERT INTO T1 VALUES(CHR(128));
>
> SELECT
> TEST,
> DUMP(TEST)
> FROM
> T1;
>
> TEST DUMP(TEST)
> ====================
> ¿ Typ=1 Len=1: 128
>
> Notice that Oracle retained the ASCII value this time when stored in
> the column, even though it displayed the column value as an upside down
> ?.
>
> You might be able to perform translation on the client to convert the
> upside down ? back into the correct ASCII value for display, although I
> would suspect that you may have difficulty writing the correct
> character to a file. Could it be that the wrong character set is in
> use by the Oracle session for the Java front end?
>
> I believe that Tom Kyte had an interesting article that showed the dump
> values when characters are translated into different character sets by
> the database - I can't find that article at this time.
>
> Charles Hooper
> PC Support Specialist
> K&M Machine-Fabricating, Inc.
I am now connected to the same database using a 10.2.0.1 client, rather than a 8.1.7 client. The default character set for Windows clients changed from WE8ISO8859P1 to WE8MSWIN1252. The above tests were performed using a 8.1.7 client. Repeating the tests with a 10.2.0.1 client:
SELECT
DUMP('€')
FROM
DUAL;
DUMP('€')
DELETE FROM T1;
INSERT INTO T1 VALUES(CHR(128));
SELECT
TEST,
DUMP(TEST)
FROM
T1;
TEST DUMP(TEST)
€ Typ=1 Len=1: 128
My initial response listed a Microsoft refence that shows U+20A0 as the
historical unicode character for the Euro currency sign, and U+20AC as
the replacement. Another test:
DELETE FROM T1;
INSERT INTO
T1
SELECT
UNISTR('\20A0')
FROM
DUAL;
INSERT INTO
T1
SELECT
UNISTR('\20AC')
FROM
DUAL;
SELECT
TEST,
DUMP(TEST)
FROM
T1;
TEST DUMP(TEST)
¿ Typ=1 Len=1: 191 € Typ=1 Len=1: 128
The second insert statement inserted the Euro currency sign, while the first caused the upside down ?. 191 looks familar from the previous tests.
The question is, what is the Java interface inserting, and what NLS character set is defined for the Java interface?
Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc.
Received on Wed Nov 29 2006 - 09:24:21 CST