Re: Displaying 'umlaut' character

From: Jens Thoms Toerring <jt_at_toerring.de>
Date: 22 Sep 2010 07:18:28 GMT
Message-ID: <8ftou4Fnd8U1_at_mid.uni-berlin.de>



In comp.lang.perl.misc dn.perl_at_gmail.com <dn.perl_at_gmail.com> wrote:

> My aim is to display the ‘special’ (non-ASCII) German character/
> diacritic umlaut or diaeresis correctly on a browser. The browser calls
> a cgi perl-script which resides on a linux server. The browser which
> calls the perl-script displays Vietnamese characters correctly (but
> not the umlaut) without any special setting.

Stop right here. If by "browser" you mean something like Firefox, Internet Explorer etc., then there's some misunderstanding here. The browser does not "call" a CGI script. The browser just sends a request to the web server, which in turn may call a CGI script (that may be written in Perl) and then sends the results back to the browser. And the page a web server sends back normally has an HTML header that may contain a line like

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

That tells the browser which character set to use when displaying the page it got from the server. If the browser has the necessary fonts it will display the page correctly (otherwise some or all of the characters may be replaced by a square or something like that). This also requires that the web server isn't configured to send conflicting information in the HTTP reply header, since a charset given in the HTTP Content-Type header takes precedence over the one in the meta tag.
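Since you didn't show your script, here is only a rough sketch of what a CGI script that gets this right might look like. The function name build_response and the sample text are made up for illustration; the point is just that the charset announced in the HTTP header, the one in the meta line and the encoding of the bytes actually printed all agree on UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # string literals in this source file are UTF-8

# Build the complete CGI response (HTTP header plus HTML page) as a
# character string. build_response is a hypothetical helper name.
# Note: $text is inserted as-is, a real script would HTML-escape it.
sub build_response {
    my ($text) = @_;
    my $header = "Content-Type: text/html; charset=utf-8\r\n\r\n";
    my $body = join "\n",
        '<html><head>',
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />',
        '<title>Umlaut test</title>',
        '</head><body>',
        "<p>$text</p>",
        '</body></html>',
        '';
    return $header . $body;
}

# Encode characters to UTF-8 bytes on output, matching the declared charset.
binmode STDOUT, ':encoding(UTF-8)';
print build_response("Schön, größer, Tür");
```

If the umlaut still shows up wrong with something like this, the next thing to look at would be whether the web server adds its own default charset to the Content-Type header.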

> The script sets NLS_LANG
> variable to AMERICAN_AMERICA.UTF8 and uses utf8 module, but that’s
> about it.

> $ENV{'NLS_LANG'}='AMERICAN_AMERICA.UTF8';
> Works for Vietnamese characters, but not with umlaut (ö).

> But even before we get to a perl-script, perhaps the LC_CTYPE env
> variable needs to be set correctly.

> From my windows laptop, if I
> access Oracle through Oracle Query Server, I can see the umlaut. But
> if I open a linux-window, initiate an sqlplus session, and run the
> same SQL, I do not see the umlaut correctly. I have tried a few values
> for the env variable LC_CTYPE (like iso_8859_1, en_US,
> en_US.iso88591), but with no luck. The surprising thing is that
> ‘umlaut’ is a much-known alphabet, Vietnamese alphabets are less-
> known. Yet the Vietnamese characters are being displayed correctly.

> What settings should I use in a perl-script or for a linux-window to
> see the umlaut correctly? Please advise.

All this doesn't seem to be a Perl problem but one of how your terminal is set up. If the terminal isn't started with the correct setting for LC_CTYPE then it won't display Unicode characters, no matter what you set afterwards. A setting made later is only seen by the programs that you start from that terminal. They then might try to output UTF-8, but the terminal doesn't know how to display it. The simplest thing probably would be to start a new terminal with LC_CTYPE set to something reasonable, for example with the command

LC_CTYPE=en_US.UTF-8 xterm

Then the new xterm you started should display UTF-8 quite fine (assuming that the en_US.UTF-8 locale is installed on your machine).
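If you're not sure whether that locale is installed, or which character set your current environment actually asks for, a small Perl script can tell you. This is only a sketch: codeset_for is a name I made up, and it relies on the standard POSIX and I18N::Langinfo modules to ask the C library which codeset goes with a locale.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);

# Return the codeset name for a locale, or undef if the locale could not
# be selected (typically because it is not installed).
# codeset_for is a hypothetical helper name.
sub codeset_for {
    my ($locale) = @_;
    my $old = setlocale(LC_CTYPE);             # remember the current setting
    my $ok  = setlocale(LC_CTYPE, $locale);    # false if unavailable
    my $codeset = $ok ? langinfo(CODESET) : undef;
    setlocale(LC_CTYPE, $old) if defined $old; # restore the old setting
    return $codeset;
}

# An empty locale name means "take it from LC_ALL / LC_CTYPE / LANG".
my $env_codeset = codeset_for('');
print "codeset from environment: ", $env_codeset // 'unknown', "\n";

my $wanted = codeset_for('en_US.UTF-8');
print defined $wanted
    ? "en_US.UTF-8 is available (codeset $wanted)\n"
    : "en_US.UTF-8 could not be selected on this machine\n";
```

Run it in the terminal in question. If the first line doesn't report UTF-8, programs started from that terminal have no reason to assume a UTF-8 display.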

To make that setting of LC_CTYPE the default you could add a line of

export LC_CTYPE=en_US.UTF-8

into your .bashrc file, or to make it the system-wide default, into /etc/bash.bashrc.

Now, getting a Perl script to deal correctly with UTF-8 is still another thing. If it takes input from files etc. it may have to indicate that it expects UTF-8 from them in the call of open(), e.g. by using

open my $f, '<:encoding(UTF-8)', $filename or die "Can't open $filename: $!";

But that's just one point. And since you don't show any Perl code it's too hard to guess what you may need.
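Just to give an idea, a script that reads UTF-8 text and prints it to a UTF-8 terminal might look roughly like the following. The helper read_utf8_lines and taking the file name from the command line are my assumptions, not something from your script. Note that "use utf8" by itself only tells Perl that the source file is written in UTF-8, it does nothing for input and output.

```perl
use strict;
use warnings;
use utf8;    # only says that literals in THIS source file are UTF-8

# Read a UTF-8 encoded file and return its lines as decoded character
# strings. read_utf8_lines is a hypothetical helper name.
sub read_utf8_lines {
    my ($filename) = @_;
    open my $fh, '<:encoding(UTF-8)', $filename
        or die "Cannot open $filename: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Encode on output too. Without this Perl writes characters below 256 as
# single Latin-1 bytes and warns about "Wide character" for the rest.
binmode STDOUT, ':encoding(UTF-8)';

# The file name is taken from the command line, if one was given.
if (my $file = shift @ARGV) {
    for my $line (read_utf8_lines($file)) {
        printf "%s (%d characters)\n", $line, length $line;
    }
}
```

When the decoding works, a line consisting of a single "ö" is reported as 1 character. If it is reported as 2, the input was read as raw bytes.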

                               Regards, Jens
-- 
  \   Jens Thoms Toerring  ___      jt_at_toerring.de
   \__________________________      http://toerring.de
Received on Wed Sep 22 2010 - 02:18:28 CDT
