Re: Urgent: how I can see which character set the file is used

From: Yong Huang <yong321_at_yahoo.com>
Date: 7 Mar 2003 21:51:11 -0800
Message-ID: <b3cb12d6.0303072151.4837bd74@posting.google.com>

Kalle <kminerva_at_jippii.fi> wrote in message news:<3E66DBA4.E8A7F35F_at_jippii.fi>...
> Hi all,
>
> I have an urgent problem, how could I see which character set is used in
> certain flat file on UNIX level.
>
> I need to know this in order to setup the value correctly into the
> control file...
>
> Any ideas would be appreciated
>
> Rgds,
> Kalle

Hi, Kalle,

One tool is David Necas (Yeti)'s Enca
(http://trific.ath.cx/software/enca/). But he told me it only works well at detecting "encodings for Belarussian, Czech, Polish, Russian, Slovak and Ukrainian, and the few multibyte encodings it detects are Unicode variants like UTF-8". It comes with Linux and BSD binaries, but I built it on Solaris without problems. He also pointed me to GNU Recode (http://www.gnu.org/directory/recode.html) and Saka's Chinese encoding AutoConvert (http://banyan.dlut.edu.cn/~ygh). I haven't tried either of these.
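
If none of these tools is at hand, a rough first pass is to see which candidate encodings can be ruled out. The following is only a minimal Python sketch, not how Enca or Recode work internally; the function names and the candidate list are my own assumptions, and you would replace the list with the character sets you actually suspect.

```python
import sys

# Hypothetical candidate list: replace with the character sets you suspect.
CANDIDATES = ("ascii", "utf-8", "iso-8859-2", "cp1250", "koi8-r", "cp1251")


def candidate_encodings(data, encodings=CANDIDATES):
    """Return the encodings from `encodings` that decode `data` without error.

    A successful decode does NOT prove the encoding is right; it only
    means the encoding has not been ruled out.
    """
    matches = []
    for name in encodings:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # at least one byte sequence is invalid in this encoding
        matches.append(name)
    return matches


def candidate_encodings_for_file(path, encodings=CANDIDATES):
    """Read the file as raw bytes and test each candidate encoding."""
    with open(path, "rb") as handle:
        return candidate_encodings(handle.read(), encodings)


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in candidate_encodings_for_file(sys.argv[1]):
        print(name)
```

Keep in mind that most single-byte character sets accept nearly any byte sequence, so this check mainly separates plain ASCII and valid UTF-8 from everything else. Telling the 8-bit encodings apart still takes statistical guessing or a human reader.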

Since guessing a file's encoding is something computers are fundamentally weak at, you may consider posting part of the file to a language-specific newsgroup (or even one under the soc.culture hierarchy) and asking native readers. I have a table that helps people identify languages:
http://www.stormloader.com/yonghuang/misc/language.html. But that only identifies the language; it doesn't directly tell you which character set you should use.

Yong Huang

Received on Fri Mar 07 2003 - 23:51:11 CST