Re: Language Character Sets

From: Peter J. Holzer <hjp-usenet_at_SiKitu.wsr.ac.at>
Date: Wed, 24 Jan 2001 15:35:36 +0100
Message-ID: <slrn96tq1o.1gn.hjp-usenet@teal.h.hjp.at>

On 2001-01-23 21:56, Howard J. Rogers <howardjr_at_www.com> wrote:
>Use unicode. Makes these sorts of worries utterly obsolete, at the
>slight cost of making all your French (and other Latin-based) data
>twice as long as it otherwise would be.

If by "Unicode" you mean UTF-8, that's not true. Doubling only happens
with a fixed-width 2-byte encoding such as UCS-2. UTF-8 is a
variable-length encoding: a character takes 1, 2, or 3 bytes (4 for the
rare characters outside the Basic Multilingual Plane). For French (and
other European languages which use the Latin alphabet with only a few
accents here and there) the average number of bytes per character would
be only a little over 1. For other alphabetic scripts, like Cyrillic or
Arabic, it should be a little under 2, and for ideographic scripts such
as Chinese about 3.
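
The byte counts above are easy to check for yourself. Here is a minimal
sketch in Python 3; the helper names (utf8_len, avg_bytes_per_char) and
the sample sentences are made up for illustration, so the averages you
get for real data will depend on how many accents, spaces and digits it
contains.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def avg_bytes_per_char(text: str, encoding: str = "utf-8") -> float:
    """Average encoded size per character of `text` in `encoding`."""
    return len(text.encode(encoding)) / len(text)


# Illustrative sample sentences (assumptions, not real measurements).
SAMPLES = {
    "French": "Les \u00e9l\u00e8ves fran\u00e7ais \u00e9tudient "
              "\u00e0 l'\u00e9cole.",
    "Russian": "\u041f\u0440\u0438\u0432\u0435\u0442, "
               "\u043a\u0430\u043a \u0434\u0435\u043b\u0430 "
               "\u0441\u0435\u0433\u043e\u0434\u043d\u044f?",
    "Chinese": "\u4f60\u597d\u4e16\u754c",
}

if __name__ == "__main__":
    for name, text in SAMPLES.items():
        utf8 = avg_bytes_per_char(text, "utf-8")
        # "utf-16-le" avoids counting the 2-byte byte order mark.
        utf16 = avg_bytes_per_char(text, "utf-16-le")
        print(f"{name:8s} UTF-8: {utf8:.2f}  UTF-16: {utf16:.2f}")
```

The UTF-16 column is there only for comparison: for these samples it is
a flat 2 bytes per character, which is where the "twice as long" figure
comes from.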

-- 
   _  | Peter J. Holzer    | All Linux applications run on Solaris,
|_|_) | Sysadmin WSR       | which is our implementation of Linux.
| |   | hjp_at_wsr.ac.at   |
__/   | http://www.hjp.at/ |    -- Scott McNealy, Dec. 2000

Received on Wed Jan 24 2001 - 08:35:36 CST