Re: Displaying 'umlaut' character

From: Ben Morrow <ben_at_morrow.me.uk>
Date: Wed, 22 Sep 2010 07:01:31 +0100
Message-ID: <rmiom7-n45.ln1_at_osiris.mauzo.dyndns.org>


Quoth "dn.perl_at_gmail.com" <dn.perl_at_gmail.com>:
>
> My aim is to display the ‘special’ (non-ASCII) German character/
> diacritic umlaut or diaeresis correctly on a browser. The browser calls
> a cgi perl-script which resides on a linux server. The browser which
> calls the perl-script displays Vietnamese characters correctly (but
> not the umlaut) without any special setting. The script sets NLS_LANG
> variable to AMERICAN_AMERICA.UTF8 and uses utf8 module, but that’s
> about it.

You almost certainly don't want to do either of those. 'use utf8' does exactly one thing: it tells Perl your script itself is written in UTF-8. If that isn't the case you don't want to use it. Perl also doesn't take any notice of NLS_LANG or any of the other locale envvars unless you ask it to (and, normally, that's a bad idea). However, it's possible that whatever database interface you're using does.
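To illustrate the distinction: the sketch below spells the UTF-8 bytes of "schön" out by hand and decodes them with the core Encode module. That is roughly the step 'use utf8' performs on string literals in a UTF-8 source file. The variable names are only for illustration, and nothing here touches input or output encoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 encoding of "schön", written as raw bytes so this file stays ASCII.
my $bytes = "sch\xC3\xB6n";

# Decoding turns the byte string into a character string.
my $chars = decode('UTF-8', $bytes);

# length() counts bytes in the first case and characters in the second.
print length($bytes), " ", length($chars), "\n";   # prints "6 5"
```

Without 'use utf8', a literal "schön" typed into a UTF-8 file behaves like $bytes; with it, like $chars.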

> $ENV{'NLS_LANG'}='AMERICAN_AMERICA.UTF8';
> Works for Vietnamese characters, but not with umlaut (ö).

I don't think that's usually a valid locale on a Linux system. Usually they are of the form 'en_US.UTF-8', but in any case if you need locales at all you will want to check which locales are available on your system.
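On most Linux systems 'locale -a' in a shell lists the installed locales. The same check can be sketched from Perl with POSIX::setlocale, which returns undef when a locale cannot be set. The helper name locale_available and the names tested are just examples, not anything from your setup.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: true if the C library accepts this locale name.
# Note that a successful call also switches this process's LC_CTYPE.
sub locale_available {
    my ($name) = @_;
    return defined setlocale(LC_CTYPE, $name);
}

for my $name ('C', 'en_US.UTF-8', 'AMERICAN_AMERICA.UTF8') {
    printf "%-24s %s\n", $name,
        locale_available($name) ? 'available' : 'not available';
}
```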

> But even before we get to a perl-script, perhaps the LC_CTYPE env
> variable needs to be set correctly. From my windows laptop, if I
> access Oracle through Oracle Query Server, I can see the umlaut. But
> if I open a linux-window, initiate an sqlplus session, and run the
> same SQL, I do not see the umlaut correctly. I have tried a few values
> for the env variable LC_CTYPE (like iso_8859_1, en_US,
> en_US.iso88591), but with no luck. The surprising thing is that
> ‘umlaut’ is a much-known alphabet, Vietnamese alphabets are
> less-known. Yet the Vietnamese characters are being displayed correctly.
>
> What settings should I use in a perl-script or for a linux-window to
> see the umlaut correctly? Please advise.

OK. What is actually stored in the database (what data types are you using, and how is the data encoded before being stored)? How are you getting the data out of the database (the only correct answer here is 'DBI', or possibly a wrapper around that)? Have you read the DBI and DBD::Oracle docs for anything concerning character encodings? Have you read perlunitut and the other docs it refers you to?

FWIW when I do this sort of thing I use Postgres with DBD::Pg, I set the database encoding to UTF-8 (this is a Pg-specific feature, but I wouldn't be surprised if Ora has got something similar), I push an :encoding(utf8) layer onto any filehandles, I make sure to send a 'Content-type: text/html; charset=utf-8' header, and everything Just Works. There are variations on that which work just as well, but that's by far the simplest approach.
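A minimal sketch of the output side of that setup, assuming the string being printed is already a decoded character string (as it would be if the database driver hands back decoded text). The sub render_page and the sample value are made up for illustration, and the database part is left out entirely.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the whole CGI response as a string of characters (not bytes).
sub render_page {
    my ($text) = @_;
    return "Content-type: text/html; charset=utf-8\r\n\r\n"
         . "<html><body><p>$text</p></body></html>\n";
}

# Push an encoding layer so characters are written out as UTF-8 bytes.
binmode(STDOUT, ':encoding(utf8)');

# Hypothetical value; a real script would get a decoded string from DBI.
my $city = "K\x{F6}ln";    # "Köln"; U+00F6 is o with diaeresis

print render_page($city);
```

The header and the encoding layer have to agree: the layer decides which bytes are sent, and the charset parameter tells the browser how to read them.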

Ben