UTF-16

From: MacGregor, Ian A. <ian_at_SLAC.Stanford.EDU>
Date: Wed, 21 Aug 2002 09:13:43 -0800
Message-ID: <F001.004BB54C.20020821091343@fatcity.com>


So I set NLS_LANG to AL16UTF16? I'd like the note to be more explicit. I thought that would be the value to which I would set the "character set" argument of the create database command. The two are not the same. For instance, the "character set" argument of "create database" might be WE8ISO8859P1, whereas the NLS_LANG variable might be American_America.WE8ISO8859P1. So I'm guessing American_America.AL16UTF16 for the NLS_LANG variable?
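
To make the distinction concrete, here is a sketch of where each value goes, as I understand the 9i syntax (the database name is made up):

    -- Character sets are fixed when the database is created:
    CREATE DATABASE mydb
       CHARACTER SET WE8ISO8859P1          -- base (database) character set
       NATIONAL CHARACTER SET AL16UTF16;   -- used by the SQL NCHAR types

    -- NLS_LANG, by contrast, is a client environment setting:
    -- $ export NLS_LANG=American_America.WE8ISO8859P1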

As I understand UTF16, it takes 16 bits to encode each character. This should result in more disk storage being required to hold the same amount of information. I also understand that varchar2 fields are character-based; there is no need to increase the length of a varchar2 field when switching from 7/8-bit encoding to 16-bit. However, char fields are byte-based and would need to have their lengths altered: one would need a char(60) field to hold 30 characters encoded with UTF16.
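
If I have that right, the length-semantics qualifiers in 9i would make the intent explicit; a sketch, with the table and column names invented:

    CREATE TABLE t (
       a VARCHAR2(30 CHAR),   -- 30 characters, whatever the encoding
       b CHAR(60 BYTE)        -- 60 bytes, i.e. 30 characters at 16 bits each
    );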

I'm not sure what is meant by

Since 9i and upwards, we support UTF-16 encoding at column level as national
(alternative) database character set. In 9i, the UTF-16 encoding Oracle character set AL16UTF16 has even become the default character set for SQL NCHAR
data types.

How can I use that to my advantage?
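
Is the idea that one declares the affected columns with the SQL NCHAR types, so that they are stored in AL16UTF16 regardless of the base character set? A sketch, with the names invented:

    CREATE TABLE papers (
       id    NUMBER PRIMARY KEY,
       title NVARCHAR2(200)   -- stored in the national character set
    );

    -- NCHAR string literals take the N prefix:
    INSERT INTO papers VALUES (1, N'some title');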



My problem stems mainly from Intermedia. The INSO filtering converts the document to the base character set of the database. This is not too bad when Cerenkov loses the diacritical mark over the 'C' and is thus rendered as I have written it here. It's not good at all when a ligature such as the 'fl' in reflection is lost, so that the filter converts it to "re ection".
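
For reference, the index in question is built along these lines (a sketch; the table, column, and index names are invented):

    CREATE INDEX doc_idx ON papers (document)
       INDEXTYPE IS CTXSYS.CONTEXT
       PARAMETERS ('FILTER CTXSYS.INSO_FILTER');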

There is obviously a cost to going to a 16-bit encoding system: twice as many bits need to be read to get the same information out. The information containing the special characters is stored in a BLOB, and I wouldn't think that character sets mattered to BLOBs at all. However, the character set certainly does matter to the DR$<INDEXNAME>$I tokens of an Intermedia index.
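
The damage is visible directly in the token table (index name invented, continuing the sketch above):

    -- Tokens are stored in the database character set, which is where
    -- the lost ligatures and diacritics show up:
    SELECT token_text
      FROM dr$doc_idx$i
     WHERE token_text LIKE 'ECTION%';   -- fragments left by the lost 'fl'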

I would certainly count myself as a member of the ignorant masses when it comes to character sets. If anything I have stated is untrue, or untrue under certain conditions, I'd sure like to know.

Ian MacGregor

-----Original Message-----
Sent: Tuesday, August 20, 2002 2:35 PM
To: LazyDBA.com Discussion

From Note 77443.1 on Metalink:

UTF-16 SUPPORT



Since 9i and upwards, we support UTF-16 encoding at column level as national
(alternative) database character set. In 9i, the UTF-16 encoding Oracle character set AL16UTF16 has even become the default character set for SQL NCHAR
datatypes.

Received on Wed Aug 21 2002 - 12:13:43 CDT
