Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
![]() |
![]() |
Home -> Community -> Usenet -> c.d.o.server -> Re: Storage requirements of NCHAR columns
Anurag Varma schreef:
> Ross wrote:
>>>>> UTF8 would require between 1 to 3 bytes. surrogate pairs would use 6 >>>>> bytes in UTF8. >>>>> AL32UTF8 which is the "real" UTF8 would use between 1 to 4 bytes. >>>>> >>>>> When a nchar column of length N is defined (in al16UTF16). The maximum >>>>> bytes you can >>>>> enter in that column is 2 * N. >>>>> If NCHAR is of UTF8 type, then maximum bytes you can enter in that >>>>> column >>>>> is 3 * N. >>>>> >>>>> The hard limit of 4000 bytes still applies. So a 4000 CHAR column does >>>>> not necessarily >>>>> mean it will hold 4000 characters (in utf8 or al16utf16). >>>>> >>>>> Metalink Note: 144808.1 provides some info on this ... >>>>> >>>>> Anurag >>>> This would seem to tally with what is suggested in the docs. If it's >>>> correct, then an NCHAR(30) column could only contain 30 characters if >>>> they were all single-byte. This, however, contradicts the earlier posts >>>> that suggest it could contain 30 characters regardless of their >>>> individual encoding. >>> ? Are you interpreting it differently? nchar(30) WOULD contain 30 >>> characters >>> even if they were multibyte. The actual byte length is adjusted to >>> ensure that. >>> >>> Anurag >> Sorry, I misinterpreted your post. If what you say is correct, what do >> you make of the suggestion in the Oracle docs that NCHAR columns are >> defined in terms of code units? Is it incorrect, or have I missed >> something?
UTF-16 encoding is the 16-bit encoding of Unicode. UTF-16 is an extension of UCS-2 because it supports the supplementary characters that are defined in Unicode 3.2 by using two UCS-2 code points for each supplementary character.
So you are almost always correct; UTF16 is fixed 2byte for the standard characters, but 4byte (2 code points) for the supplementary characters. UTF16 allows for 1,048,576 characters to be defined. The Unicode 3.2 standard defines 45,960 supplementary characters.
To clarify, if still needed:
Table 6-5 of the Globalization Manual (excerpt):
Although the maximum lengths of the NCHAR and NVARCHAR2 columns are
larger in UTF8, the actual storage size is still bound by the byte
limits of 2000 and 4000 bytes, respectively. For example, you can store
4000 UTF8 characters in an NVARCHAR2 column if all the characters are
single byte, but *only 4000/3 characters* if *all the characters are
three bytes*.
Emphasis is mine.
Table 6-6 of the Globalization manual:
Maximum Datatype Size
National Character Set Maximum Column Size Maximum Column Size of NCHAR Datatype of NVARCHAR2 Datatype AL16UTF16 1000 characters 2000 characters UTF8 2000 bytes 4000 bytes
(Hope this still formats correctly somewhat)
And, most importantly:
UTF16 is *not* available as database character set,
just as national character set.
-- Regards, Frank van Bortel Top-posting is one way to shut me up...Received on Wed Jul 12 2006 - 13:00:30 CDT
![]() |
![]() |