Re: Difference between VARCHAR2(xx CHAR) and NVARCHAR2(xx)

From: ddf <oratune_at_msn.com>
Date: Wed, 22 Dec 2010 05:18:40 -0800 (PST)
Message-ID: <89faa4a4-6ffa-49d0-9087-7f3a4c8f9c21_at_35g2000prb.googlegroups.com>



On Dec 22, 6:54 am, "Álvaro G. Vicario"
<alvaro.NOSPAMTH..._at_demogracia.com.invalid> wrote:
> I've just installed Oracle XE (Oracle Database 10g Express Edition
> Universal). I've changed nothing so it's the using the character sets
> specified in the default settings:
>
>      SELECT * FROM NLS_DATABASE_PARAMETERS;
>
>      NLS_CHARACTERSET       AL32UTF8
>      NLS_NCHAR_CHARACTERSET AL16UTF16
>
> Given that both CHAR and NCHAR data types seem to accept multi-byte
> strings (UTF-8 and UTF-16), what is the exact difference between these
> two column definitions?
>
>      VARCHAR2(10 CHAR)
>      NVARCHAR2(10)
>
> I've found similar questions in many forums but they always explain CHAR
> vs VARCHAR or mention that NVARCHAR2 accepts multi-byte strings.
>
> My educated guess is VARCHAR2 is a legacy type that did not accept
> multi-byte, NVARCHAR2 was added, then VARCHAR2 was enhanced to support
> multi-byte... And also VARCHAR2 depends on the server version to be able
> to store multi-byte while NVARCHAR2 is always available.
>
> Am I right?
>
> --
> --http://alvaro.es- lvaro G. Vicario - Burgos, Spain
> -- Mi sitio sobre programaci n web:http://borrame.com
> -- Mi web de humor satinado:http://www.demogracia.com
> --

I can't say whether you're 'right' or not, but I do know this:

UTF-16: Each character is either 2 or 4 bytes long.

UTF-8: Each character takes 1 to 4 bytes to store.

Thus any character in a UTF-16 implementation (your NVARCHAR2 type) will occupy no less than 2 bytes of storage, where your VARCHAR2 characters (AL32UTF8) can be stored in as little as 1 byte which may not be important to you now but could become so in the future. Both can support multiple languages without character loss however the database character set is also used to identify SQL and PL/SQL source code. In order to do this, it must have either EBCDIC or 7-bit ASCII as a subset, whichever is native to the platform; it is not possible to use a fixed-width, multibyte character set as the database character set such as AL16UTF16. Thus your AL32UTF8 character set allows for 7-bit ASCII characters and doesn't interfere with object names, user names, etc. Also the N-types (NVARCHAR2, NCHAR, NCLOB) will accept and stored multi-byte characters regardless of the database character set in use making it possible to support singlebyte  and multi-byte languages.

David Fitzjarrell Received on Wed Dec 22 2010 - 07:18:40 CST

Original text of this message