Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> Re: Storage requirements of NCHAR columns
Anurag Varma wrote:
> Kenneth wrote:
> > On 10 Jul 2006 09:59:49 -0700, "Ross" <rossfreemantle_at_yahoo.co.uk>
> > wrote:
> >
> > >
> > >"When you use NCHAR and NVARCHAR2 datatypes for storing multilingual
> > >data, the
> > >column size specified for a column is defined in number of characters.
> > >(The number of
> > >characters means the number of Unicode code units.)"
> > >
> > >This sould seem to suggest that the a NCHAR(30) column would actually
> > >require 30 bytes of storage (as UTF8 has a single-byte code unit).
> > >However, an example in Chapter 7 explicitly states that 90 bytes are
> > >required. I don't think the NLS_LENGTH_SEMANTICS parameter affects
> > >NCHAR, so which is correct?
> > >
> >
> > UTF8 is not a single-byte charset. It is a varying-width charset.
> >
> > If you defines a NCHAR(30) column, it will contain 30 characters
> > (unless NULL). Always. Period.
> >
> > How many *bytes* of storage does this column require ?
> >
> > That depends on which kind of characters you store in it. If you only
> > store ASCII-characters, each of them will require one byte, because
> > UTF8 is a superset of ASCII, ASCII being in the range 0-255.
> >
> > If you store characters beyond ASCII, between 2 and 4 bytes of storage
> > is needed, depending on what position these characters have in the
> > UNICODE range, which is 0 - (2^32-1).
> >
> > So a minimum of 30 bytes is required. And a maximum of 30*4 = 120
> > bytes, that if all your 30 characters are in the upper 2^8 end of the
> > UNICODE range.
> >
> > Conclusion : Between 30 and 120 bytes are required for a NCHAR(30)
> > column with UTF8 as NCHARSET.
> >
> > - Kenneth Koenraadt
>
>
>
>
This would seem to tally with what is suggested in the docs. If it's correct, then an NCHAR(30) column could only contain 30 characters if they were all single-byte. This, however, contradicts the earlier posts that suggest it could contain 30 characters regardless of their individual encoding. Received on Tue Jul 11 2006 - 04:48:51 CDT