Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: Storage requirements of NCHAR columns

Re: Storage requirements of NCHAR columns

From: Anurag Varma <avoracle_at_gmail.com>
Date: 10 Jul 2006 12:48:39 -0700
Message-ID: <1152560919.509434.98720@m79g2000cwm.googlegroups.com>

Kenneth wrote:
> On 10 Jul 2006 09:59:49 -0700, "Ross" <rossfreemantle_at_yahoo.co.uk>
> wrote:
>
> >
> >"When you use NCHAR and NVARCHAR2 datatypes for storing multilingual
> >data, the
> >column size specified for a column is defined in number of characters.
> >(The number of
> >characters means the number of Unicode code units.)"
> >
> >This sould seem to suggest that the a NCHAR(30) column would actually
> >require 30 bytes of storage (as UTF8 has a single-byte code unit).
> >However, an example in Chapter 7 explicitly states that 90 bytes are
> >required. I don't think the NLS_LENGTH_SEMANTICS parameter affects
> >NCHAR, so which is correct?
> >
>
> UTF8 is not a single-byte charset. It is a varying-width charset.
>
> If you defines a NCHAR(30) column, it will contain 30 characters
> (unless NULL). Always. Period.
>
> How many *bytes* of storage does this column require ?
>
> That depends on which kind of characters you store in it. If you only
> store ASCII-characters, each of them will require one byte, because
> UTF8 is a superset of ASCII, ASCII being in the range 0-255.
>
> If you store characters beyond ASCII, between 2 and 4 bytes of storage
> is needed, depending on what position these characters have in the
> UNICODE range, which is 0 - (2^32-1).
>
> So a minimum of 30 bytes is required. And a maximum of 30*4 = 120
> bytes, that if all your 30 characters are in the upper 2^8 end of the
> UNICODE range.
>
> Conclusion : Between 30 and 120 bytes are required for a NCHAR(30)
> column with UTF8 as NCHARSET.
>
> - Kenneth Koenraadt

UTF8 would require between 1 to 3 bytes. surrogate pairs would use 6 bytes in UTF8.
AL32UTF8 which is the "real" UTF8 would use between 1 to 4 bytes.

When a nchar column of length N is defined (in al16UTF16). The maximum bytes you can
enter in that column is 2 * N.
If NCHAR is of UTF8 type, then maximum bytes you can enter in that column
is 3 * N.

The hard limit of 4000 bytes still applies. So a 4000 CHAR column does not necessarily
mean it will hold 4000 characters (in utf8 or al16utf16).

Metalink Note: 144808.1 provides some info on this ...

Anurag Received on Mon Jul 10 2006 - 14:48:39 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US