Oracle FAQ Your Portal to the Oracle Knowledge Grid

Re: Storage requirements of NCHAR columns

From: Frank van Bortel <frank.van.bortel_at_gmail.com>
Date: Wed, 12 Jul 2006 20:00:30 +0200
Message-ID: <e93d57$ks0$1@news6.zwoll1.ov.home.nl>


Anurag Varma wrote:
> Ross wrote:

>>>>> UTF8 would require between 1 and 3 bytes. Surrogate pairs would use 6
>>>>> bytes in UTF8.
>>>>> AL32UTF8, which is the "real" UTF-8, would use between 1 and 4 bytes.
>>>>>
>>>>> When an NCHAR column of length N is defined (in AL16UTF16), the
>>>>> maximum number of bytes you can enter in that column is 2 * N.
>>>>> If the national character set is UTF8, then the maximum number of
>>>>> bytes you can enter in that column is 3 * N.
>>>>>
>>>>> The hard limit of 4000 bytes still applies. So a 4000 CHAR column
>>>>> does not necessarily mean it will hold 4000 characters (in UTF8 or
>>>>> AL16UTF16).
>>>>>
>>>>> Metalink Note: 144808.1 provides some info on this ...
>>>>>
>>>>> Anurag
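As an aside, the byte counts quoted above can be checked with Python's codecs. This is only a sketch: standard UTF-8 stands in for AL32UTF8, and the CESU-8 construction below (mimicking Oracle's legacy "UTF8" charset, which stores a supplementary character as two 3-byte encoded surrogates) is my own illustration, not Oracle code.

```python
# AL32UTF8 behaves like standard UTF-8: 1..4 bytes per character.
bmp = "\u00e9"             # U+00E9, a BMP character
supp = "\U00010400"        # U+10400, a supplementary character

print(len(bmp.encode("utf-8")))    # 2
print(len(supp.encode("utf-8")))   # 4

# Oracle "UTF8" behaves like CESU-8: encode each half of the
# UTF-16 surrogate pair as a separate 3-byte sequence.
pair = supp.encode("utf-16-be")    # 4 bytes = two 16-bit code units
hi, lo = pair[:2], pair[2:]
cesu8 = (chr(int.from_bytes(hi, "big")).encode("utf-8", "surrogatepass")
         + chr(int.from_bytes(lo, "big")).encode("utf-8", "surrogatepass"))
print(len(cesu8))                  # 6
```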
>>>> This would seem to tally with what is suggested in the docs. If it's
>>>> correct, then an NCHAR(30) column could only contain 30 characters if
>>>> they were all single-byte. This, however, contradicts the earlier posts
>>>> that suggest it could contain 30 characters regardless of their
>>>> individual encoding.
>>> ? Are you interpreting it differently? NCHAR(30) WOULD contain 30
>>> characters even if they were multibyte. The actual byte length is
>>> adjusted to ensure that.
>>>
>>> Anurag
>> Sorry, I misinterpreted your post. If what you say is correct, what do
>> you make of the suggestion in the Oracle docs that NCHAR columns are
>> defined in terms of code units? Is it incorrect, or have I missed
>> something?

>
> I don't know what they mean by that.
> http://www.oracle.com/technology/oramag/oracle/03-nov/o63tech_glob.html
> states:
>
> <quote>
> To make it easy to allocate proper storage for Unicode values, Oracle9i
> Database introduced character semantics. You can now use a declaration
> such as VARCHAR2(3 CHAR), and Oracle will set aside the correct number
> of bytes to accommodate three characters in the underlying character
> set. In the case of AL32UTF8, Oracle will allocate 12 bytes, because
> the maximum length of a UTF-8 character encoding is four bytes (3
> characters * 4 bytes/character = 12 bytes). On the other hand, if
> you're using AL16UTF16, in which case your declaration would be
> NVARCHAR2(3), Oracle allocates just six bytes (3 characters * 2
> bytes/character = 6 bytes). One difference worth noting is that for
> UTF-8, a declaration using character semantics allows enough room for
> surrogate characters, whereas for UTF-16 that is not the case. A
> declaration such as NVARCHAR2(3) provides room for three UTF-16 code
> units, but a single supplementary character may consume two of those
> code units.
> </quote>
>
> ... seems like they consider 1 code unit = 2 bytes for UTF-16.
>
> Anurag
>
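The arithmetic in that quoted passage can be sketched in Python. This is my own illustration, not Oracle's allocator; U+10400 is just an arbitrary supplementary character.

```python
# Character semantics reserve the worst-case byte count per declared character.

# VARCHAR2(3 CHAR) in AL32UTF8: at most 4 bytes per character.
al32utf8_bytes = 3 * 4
print(al32utf8_bytes)   # 12

# NVARCHAR2(3) in AL16UTF16: 3 UTF-16 code units of 2 bytes each.
al16utf16_bytes = 3 * 2
print(al16utf16_bytes)  # 6

# But a supplementary character occupies two UTF-16 code units,
# so NVARCHAR2(3) does not guarantee room for 3 such characters.
supp = "\U00010400"
code_units = len(supp.encode("utf-16-be")) // 2
print(code_units)       # 2
```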

UTF-16 encoding is the 16-bit encoding of Unicode. UTF-16 is an extension of UCS-2 because it supports the supplementary characters that are defined in Unicode 3.2 by using two UCS-2 code points for each supplementary character.

So you are almost always correct: UTF-16 is a fixed 2 bytes for standard (BMP) characters, but 4 bytes (two code units, i.e. a surrogate pair) for supplementary characters. The surrogate mechanism allows 1,048,576 supplementary characters to be encoded; the Unicode 3.2 standard defines 45,960 of them.
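Those counts can be verified with a little Python (a sketch of the surrogate arithmetic, not anything Oracle-specific):

```python
# 1024 high surrogates x 1024 low surrogates address the
# 16 supplementary planes of 65,536 code points each.
print(1024 * 1024)   # 1048576, counted as surrogate pairs
print(16 * 65536)    # 1048576, counted as planes

# A BMP character is one 16-bit code unit; a supplementary one is two.
print(len("A".encode("utf-16-be")))            # 2 bytes = 1 code unit
print(len("\U00010400".encode("utf-16-be")))   # 4 bytes = 2 code units
```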

To clarify, if still needed:
Table 6-5 of the Globalization Manual (excerpt): Although the maximum lengths of the NCHAR and NVARCHAR2 columns are larger in UTF8, the actual storage size is still bound by the byte limits of 2000 and 4000 bytes, respectively. For example, you can store 4000 UTF8 characters in an NVARCHAR2 column if all the characters are single byte, but *only 4000/3 characters* if *all the characters are three bytes*.

Emphasis is mine.

Table 6-6 of the Globalization manual:
Maximum Datatype Size

National Character Set   Maximum Column Size      Maximum Column Size
                         of NCHAR Datatype        of NVARCHAR2 Datatype

AL16UTF16                1000 characters          2000 characters
UTF8                     2000 bytes               4000 bytes

(Hope this still formats correctly somewhat)

And, most importantly:
AL16UTF16 is *not* available as a database character set, only as a national character set.

-- 
Regards,
Frank van Bortel

Top-posting is one way to shut me up...
Received on Wed Jul 12 2006 - 13:00:30 CDT

