Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: Strings in UTF8 Database

Re: Strings in UTF8 Database

From: <fitzjarrell_at_cox.net>
Date: 24 May 2007 10:41:18 -0700
Message-ID: <1180028478.787380.151510@k79g2000hse.googlegroups.com>


On May 24, 5:54 am, Number.One.M..._at_gmail.com wrote:
> Hello,
>
> I wonder if anyone could explain why I get the following behaviour.
> This is the same query run on two databases where as far as we know
> the only difference is that the first database is a UTF8 database and
> the second an old school ASCII7.
>
> It's not terribly important, beyond that I think I should understand
> why the answer is different on a UTF8 database.
>
> Thanks in advance,
>
> Mike.
>
> SQL> select * from v$version
> 2 /
>
> BANNER
> ----------------------------------------------------------------
> Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Prod
> PL/SQL Release 10.2.0.2.0 - Production
> CORE 10.2.0.2.0 Production
> TNS for Linux: Version 10.2.0.2.0 - Production
> NLSRTL Version 10.2.0.2.0 - Production
>
> SQL>
> SQL> -- How long is a 40 charecter string?
> SQL> select length(
> 2 chr(64)||chr(59)||chr(36)||chr(59)||chr(129)||chr(96)||
> chr(158)||chr(225)||chr(116)||chr(59)||
> 3 chr(115)||chr(171)||chr(215)||chr(29)||chr(174)||chr(76)||
> chr(22)||chr(94)||chr(28)||chr(35)||
> 4 chr(132)||chr(52)||chr(89)||chr(171)||chr(191)||chr(93)||
> chr(50)||chr(150)||chr(199)||chr(52)||
> 5 chr(4)||chr(127)||chr(144)||chr(21)||chr(215)||chr(238)||
> chr(64)||chr(59)||chr(36)||chr(59)
> 6 ) the_length
> 7 from dual
> 8 /
>
> THE_LENGTH
> ----------
> 26
>
> SQL> conn mijones_at_oradev10
> Enter password: *******
> ERROR:
> ORA-28002: the password will expire within 1 days
>
> Connected.
> SQL> select * from v$version
> 2 /
>
> BANNER
> ----------------------------------------------------------------
> Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Prod
> PL/SQL Release 10.2.0.2.0 - Production
> CORE 10.2.0.2.0 Production
> TNS for Linux: Version 10.2.0.2.0 - Production
> NLSRTL Version 10.2.0.2.0 - Production
>
> SQL>
> SQL> -- How long is a 40 charecter string?
> SQL> select length(
> 2 chr(64)||chr(59)||chr(36)||chr(59)||chr(129)||chr(96)||
> chr(158)||chr(225)||chr(116)||chr(59)||
> 3 chr(115)||chr(171)||chr(215)||chr(29)||chr(174)||chr(76)||
> chr(22)||chr(94)||chr(28)||chr(35)||
> 4 chr(132)||chr(52)||chr(89)||chr(171)||chr(191)||chr(93)||
> chr(50)||chr(150)||chr(199)||chr(52)||
> 5 chr(4)||chr(127)||chr(144)||chr(21)||chr(215)||chr(238)||
> chr(64)||chr(59)||chr(36)||chr(59)
> 6 ) the_length
> 7 from dual
> 8 /
>
> THE_LENGTH
> ----------
> 40
>
> SQL>

>From the Oracle documentation:

UTF-8
A variable-width encoding of UCS2 which uses sequences of 1, 2, or 3 bytes per character. Characters from 0-127 (the 7-bit ASCII characters) are encoded with one byte, characters from 128-2047 require two bytes, and characters from 2048-65535 require three bytes. The Oracle character set name for this is UTF8 (for the Unicode 2.1 standard). The standard has left room for expansion to support the UCS4 characters with sequences of 4, 5, and 6 bytes per character.

The length() function returns the number of characters in the string, however you've mixed single-byte and double-byte characters. Since you have 12 double-byte characters in this string this leaves 28 single-byte characters, most likely considered double-byte characters due to the mix, making the total length, in characters, in a UTF8 character set 26. I expect if you restrict your entire string to single-byte characters (ASCII codes 0-127) you'll find the results the same between the two.

David Fitzjarrell Received on Thu May 24 2007 - 12:41:18 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US