Oracle FAQ Your Portal to the Oracle Knowledge Grid

Re: what characterset to use?

From: Laurenz Albe <invite_at_spam.to.invalid>
Date: 30 Aug 2007 11:05:20 GMT
Message-ID: <1188471915.119548@proxy.dienste.wien.at>


Martin T. <0xCDCDCDCD_at_gmx.at> wrote:
> If I run the Oracle database on Windows, should it be Unicode or
> Windows-1252?

Oracle recommends UNICODE; both
http://www.oracle.com/technology/pub/columns/trute_unicode.html and Metalink note 333489.1 say so.

I agree with Oracle.
It is always safe to choose AL32UTF8 as database character set.

But this is just a rule of thumb; your particular requirements may call for a different choice.

Particularly if it is not a new installation, it may be wise not to try and change the character set to UNICODE, but to leave it as it is.

> An installation on Linux ... should it use UTF8, even when run for a
> e.g. Japanese site where this will result in overhead vs. 16bit Unicode?

For Kanji text, UTF-8 is certainly not the perfect choice if storage size is important to you: most Kanji take 3 bytes in UTF-8 but only 2 bytes in UTF-16.

UTF-16 would be better - but you cannot have UTF-16 as database character set in Oracle. You'd have to use it as "national character set" and define all the text columns in your database with NVARCHAR2 instead of VARCHAR2 and NCHAR instead of CHAR.
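To put rough numbers on the storage argument, here is a minimal Python sketch. It only measures the raw encoded length of a few sample strings (the strings and the helper name `encoded_sizes` are illustrative assumptions, not anything from Oracle), so it says nothing about Oracle's actual on-disk overhead.

```python
# Sample strings are made up for illustration; the counts are raw encoding
# sizes only, not Oracle's on-disk storage.
samples = {
    "kanji": "日本語",
    "ascii": "Oracle",
    "mixed": "Oracle 日本語",
}

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, without a byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))

for name, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{name}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

Note that the advantage reverses for plain ASCII text, which needs 1 byte per character in UTF-8 but 2 in UTF-16, so the saving depends on how much of the data really is Japanese.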

There are probably some other "if"s one can think of.

I would probably use AL32UTF8 as database character set and AL16UTF16 as national character set and try to remember to define all Japanese text columns as NVARCHAR2.

That way I can have the benefits of UTF-16 (less storage), but I am also safe if somebody tries to insert Japanese text into a VARCHAR2 column.

> If I have client applications from 3 different O/S's ... which character
> set?

The safe answer is UNICODE. It is always the safe answer, which is why Oracle recommends it.

If - say - you only have clients that generate and consume WINDOWS-1252, ISO8859-1 and ISO8859-15 and you can be certain of that, you could of course also use WE8MSWIN1252 as database character set (see Metalink 264294.1).
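The reason WE8MSWIN1252 can serve all three client character sets is that Windows-1252 contains every printable character of ISO8859-1 and ISO8859-15. A small Python sketch can check this using Python's own codec tables (`cp1252`, `iso8859-1`, `iso8859-15`), which are assumed here to match Oracle's definitions; the helper names are hypothetical.

```python
def printable_chars(codec):
    """Characters that a single-byte codec assigns to bytes 0xA0-0xFF."""
    return {bytes([b]).decode(codec) for b in range(0xA0, 0x100)}

def fits_in_cp1252(text):
    """True if every character of text can be encoded in Windows-1252."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False

# Bytes below 0x80 are plain ASCII in all three sets, so only the upper
# range needs checking.
for codec in ("iso8859-1", "iso8859-15"):
    missing = sorted(c for c in printable_chars(codec) if not fits_in_cp1252(c))
    print(codec, "characters missing from cp1252:", missing)

# The same helper can screen sample data before settling on the character set.
print(fits_in_cp1252("Straße, 5 €"))
print(fits_in_cp1252("日本語"))
```

The check deliberately skips the 0x80-0x9F range, where the ISO sets define only control characters.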

There is no big benefit in that, however - the only ones I can think of are that
a) some special characters will only need 1 byte instead of 2 or 3
b) there is no character set conversion overhead for WINDOWS-1252 clients.
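Point a) can be illustrated with a few characters, again as a Python sketch that compares raw encoded lengths only (the helper name `byte_lengths` and the sample characters are illustrative assumptions):

```python
def byte_lengths(char):
    """Return (cp1252_bytes, utf8_bytes) needed to encode a character."""
    return len(char.encode("cp1252")), len(char.encode("utf-8"))

for char in ("A", "é", "ß", "€"):
    cp1252_len, utf8_len = byte_lengths(char)
    print(f"{char!r}: Windows-1252 = {cp1252_len} byte(s), UTF-8 = {utf8_len} byte(s)")
```

Plain ASCII letters cost 1 byte either way, so the saving only applies to accented letters and symbols such as the euro sign.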

> What IS the "character set of the OS" anyway?

Good question, I don't think there is an easy answer.

As far as I know, the character set used by the operating system user that owns the Oracle software has no influence on which database character set you should choose (I may be wrong on this though).

Yours,
Laurenz Albe

Received on Thu Aug 30 2007 - 06:05:20 CDT
