Re: Storage requirements of NCHAR columns

From: Anurag Varma <avoracle_at_gmail.com>
Date: 11 Jul 2006 09:12:34 -0700
Message-ID: <1152634354.433977.107200@p79g2000cwp.googlegroups.com>

Ross wrote:
> > > > UTF8 would require between 1 and 3 bytes per character; surrogate
> > > > pairs would use 6 bytes in UTF8.
> > > > AL32UTF8, which is the "real" UTF-8, would use between 1 and 4 bytes.
> > > >
> > > > When an NCHAR column of length N is defined (in AL16UTF16), the
> > > > maximum number of bytes you can enter in that column is 2 * N.
> > > > If the national character set is UTF8, then the maximum number of
> > > > bytes you can enter in that column is 3 * N.
> > > >
> > > > The hard limit of 4000 bytes still applies, so a 4000 CHAR column
> > > > does not necessarily mean it will hold 4000 characters (in UTF8 or
> > > > AL16UTF16).
> > > >
> > > > Metalink Note: 144808.1 provides some info on this ...
> > > >
> > > > Anurag
> > >
> > > This would seem to tally with what is suggested in the docs. If it's
> > > correct, then an NCHAR(30) column could only contain 30 characters if
> > > they were all single-byte. This, however, contradicts the earlier posts
> > > that suggest it could contain 30 characters regardless of their
> > > individual encoding.
> >
> > ? Are you interpreting it differently? NCHAR(30) WOULD contain 30
> > characters even if they were multibyte. The actual byte length is
> > adjusted to ensure that.
> >
> > Anurag
>
> Sorry, I misinterpreted your post. If what you say is correct, what do
> you make of the suggestion in the Oracle docs that NCHAR columns are
> defined in terms of code units? Is it incorrect, or have I missed
> something?

I don't know what they mean by that.
http://www.oracle.com/technology/oramag/oracle/03-nov/o63tech_glob.html states:

<quote>
To make it easy to allocate proper storage for Unicode values, Oracle9i Database introduced character semantics. You can now use a declaration such as VARCHAR2(3 CHAR), and Oracle will set aside the correct number of bytes to accommodate three characters in the underlying character set. In the case of AL32UTF8, Oracle will allocate 12 bytes, because the maximum length of a UTF-8 character encoding is four bytes (3 characters * 4 bytes/character = 12 bytes). On the other hand, if you're using AL16UTF16, in which case your declaration would be NVARCHAR2(3), Oracle allocates just six bytes (3 characters * 2 bytes/character = 6 bytes). One difference worth noting is that for UTF-8, a declaration using character semantics allows enough room for surrogate characters, whereas for UTF-16 that is not the case. A declaration such as NVARCHAR2(3) provides room for three UTF-16 code units, but a single supplementary character may consume two of those code units.
</quote>

... seems like they consider 1 code unit = 2 bytes for UTF-16.
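
For illustration, here is a minimal Python sketch of the same arithmetic
(Python is only standing in as a calculator here, not Oracle; the
supplementary character U+1D11E is an arbitrary example):

    # Byte/code-unit arithmetic for one supplementary (non-BMP) character.
    s = "\U0001D11E"  # U+1D11E MUSICAL SYMBOL G CLEF, arbitrary example

    utf8_bytes  = len(s.encode("utf-8"))      # 4 bytes in standard UTF-8 (AL32UTF8)
    utf16_bytes = len(s.encode("utf-16-le"))  # 4 bytes in UTF-16 (AL16UTF16)
    utf16_units = utf16_bytes // 2            # 2 code units, i.e. a surrogate pair

    # Oracle's legacy "UTF8" charset stores each surrogate code unit as its
    # own 3-byte sequence, which is where the 6-byte figure quoted above
    # comes from.
    legacy_utf8_bytes = utf16_units * 3       # 6 bytes

    # Character-semantics allocation from the article quoted above:
    nvarchar2_3_bytes     = 3 * 2   # NVARCHAR2(3) in AL16UTF16 -> 6 bytes
    varchar2_3_char_bytes = 3 * 4   # VARCHAR2(3 CHAR) in AL32UTF8 -> 12 bytes

    print(utf8_bytes, utf16_bytes, utf16_units, legacy_utf8_bytes)  # 4 4 2 6

So NVARCHAR2(3) gives you 3 UTF-16 code units (6 bytes), and this one
character would already use 2 of them, which matches the article's point
about supplementary characters.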

Anurag
