Oracle FAQ Your Portal to the Oracle Knowledge Grid

Re: Character set changes

From: Tom Tyson <tomtysonjr_at_yahoo.com>
Date: Tue, 29 Aug 2000 11:54:21 -0700 (PDT)
Message-Id: <10603.115767@fatcity.com>


Dave

UTF8 is a variable-width character set, so the space used in your VARCHAR2 depends entirely on the data. US7ASCII data that is converted to UTF8 still takes a single byte per character, just as it did in the US7ASCII character set. If you later store characters outside the US7ASCII range in that field, each of those characters will take more than one byte. Here is a snippet from Oracle 8i's National Language Support Guide, which may give you some information on the size in bytes that will be used.
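To make the point about storage concrete, here is a minimal Python sketch that compares character counts with encoded byte counts. It assumes Python's standard "utf-8" codec as a stand-in for Oracle's UTF8 character set (the two agree for the character ranges quoted in this message), and the sample strings and the `utf8_len` helper are made up for illustration.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes needed to store text in UTF-8."""
    return len(text.encode("utf-8"))


# Hypothetical sample values; only the first is pure 7-bit ASCII.
samples = {
    "ascii only": "Oracle",          # every character is U+0000..U+007F
    "with u-umlaut": "Z\u00fcrich",  # U+00FC needs 2 bytes
    "with euro sign": "\u20ac100",   # U+20AC needs 3 bytes
}

for label, text in samples.items():
    print(f"{label}: {len(text)} characters, {utf8_len(text)} bytes")
```

If the VARCHAR2 length is counted in bytes, it is the second number, not the character count, that has to fit in the column.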

Tom Tyson
Exodus Communications, Inc.



Oracle’s UTF8 character set currently supports the following characters.

Unicode 2.1 (UCS2 and UTF16) characters U+0000 through U+007F inclusive. These are 1-byte characters in UTF8, that have character codes 0x00 through 0x7f inclusive. These can represent only English ASCII characters. All English ASCII characters have exactly the same character codes (0x00 through 0x7f inclusive) in US7ASCII and UTF8 character sets.

Unicode 2.1 (UCS2 and UTF16) characters U+0080 through U+07FF inclusive. These are 2-byte characters in UTF8, that have character codes 0xc0WW through 0xdfWW inclusive, where WW can be 0x80 through 0xbf inclusive. These can represent characters of most European (including Greek and Russian), Arabic, Hebrew and some other languages.

Unicode 2.1 (UCS2 and UTF16) characters U+0800 through U+D7FF inclusive and U+E000 through U+FFFF inclusive. These are 3-byte characters in UTF8, that have character codes

0xe0WWTT through 0xecWWTT inclusive
0xed80TT through 0xed9fTT inclusive
0xeeWWTT through 0xefWWTT inclusive

where WW and TT are 0x80 through 0xbf inclusive.

These can represent characters of Chinese, Japanese, Korean, Thai, Indic, Dravidian and some other languages. Also, the "euro" currency sign is included in this group of characters.

Oracle’s UTF8 character set currently does not support the following characters. If you use these characters in Oracle’s current UTF8 character set, the result is not guaranteed, and the behavior may change in future releases of Oracle.
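The three supported ranges in the snippet map directly to a byte count per character. The sketch below encodes that mapping and checks it against Python's standard "utf-8" codec, which is assumed here to match Oracle's UTF8 for these ranges. The function name `utf8_bytes_for` is hypothetical, and code points outside the listed ranges are simply rejected rather than guessed at.

```python
def utf8_bytes_for(code_point: int) -> int:
    """Bytes per character for the supported ranges quoted from the NLS Guide."""
    if 0x0000 <= code_point <= 0x007F:
        return 1  # ASCII, same codes as US7ASCII
    if 0x0080 <= code_point <= 0x07FF:
        return 2  # most European, Arabic, Hebrew
    if 0x0800 <= code_point <= 0xD7FF or 0xE000 <= code_point <= 0xFFFF:
        return 3  # Chinese, Japanese, Korean, Thai, the euro sign, ...
    raise ValueError(f"U+{code_point:04X} is outside the ranges listed above")


# Boundary values of each range plus a few representative characters.
checks = [0x0041, 0x007F, 0x0080, 0x03A9, 0x07FF,
          0x0800, 0x20AC, 0x4E2D, 0xD7FF, 0xE000, 0xFFFF]

for cp in checks:
    encoded = chr(cp).encode("utf-8")
    print(f"U+{cp:04X}: {utf8_bytes_for(cp)} byte(s), encoded as {encoded.hex()}")
```

Summing `utf8_bytes_for(ord(ch))` over a string gives the same figure as encoding the whole string, as long as every character falls in one of the three ranges.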



Received on Tue Aug 29 2000 - 13:54:21 CDT
