Re: Oracle 8.i --> Oracle 9i + Unicode

From: Howard J. Rogers <howardjr2000_at_yahoo.com.au>
Date: Tue, 23 Sep 2003 21:29:05 +1000
Message-ID: <3f702f26$0$7066$afc38c87@news.optusnet.com.au>

Tanel Poder wrote:

>> > Everything out of ASCII is at least double byte in UTF-8 - most of the
>> > additional latin letters, cyrillics, and many others take up two bytes in
>> > UTF-8.
>>
>> Totally correct.
>>
>> O-umlaut and a-umlaut are double-byte in UTF-8, not triple-byte.

>
> Well, despite of any standards out there, I used vsize command in Oracle
> to show, how many bytes a char really takes, and it took 3 bytes in above
> mentioned examples.
>
> Tanel.

For what it's worth, here's the Oracle 9i New Features course documentation:

"utf-8 encoding is the 8-bit encoding of Unicode. It is a variable-width encoding and also a strict superset of ASCII. ... One Unicode character can be one, two, three or four bytes in this encoding. Characters from the European scripts are represented in either one or two bytes; characters from most Asian scripts are represented in three bytes, while supplementary characters are represented in four bytes."

I'd question what vsize is actually measuring.
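
The byte counts in the course notes are easy to check outside the database. Here is a minimal Python sketch, assuming nothing about Oracle or vsize: it simply encodes a few sample characters as UTF-8 and counts the bytes. The sample characters and the helper name utf8_len are my own choices for illustration, not anything from the course material or from Tanel's test.

```python
# Count the bytes each sample character occupies when encoded as UTF-8.
# The characters below are illustrative picks, written as escapes so the
# script does not depend on the encoding of the source file itself.
samples = [
    ("ASCII letter a", "\u0061"),
    ("o-umlaut", "\u00f6"),
    ("a-umlaut", "\u00e4"),
    ("Cyrillic letter ya", "\u044f"),
    ("CJK ideograph U+4E2D", "\u4e2d"),
    ("supplementary character U+1D11E", "\U0001d11e"),
]


def utf8_len(ch: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of ch."""
    return len(ch.encode("utf-8"))


for description, ch in samples:
    print(f"{description}: {utf8_len(ch)} byte(s)")
```

In plain UTF-8 the two umlauts come out at two bytes each, which matches the documentation. If a tool reports three, it is worth dumping the stored bytes and confirming which character set the column is really using before blaming the standard.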

Regards
HJR

Received on Tue Sep 23 2003 - 06:29:05 CDT