Oracle Bug? RPAD of Japanese (kanji) character in Oracle 10gR2 UTF8 database

From: Kevin Kirkpatrick <kvnkrkptrck_at_gmail.com>
Date: Fri, 16 Jan 2009 10:23:41 -0800 (PST)
Message-ID: <e626ed27-81de-44a9-8d4a-f3e8d017c093_at_z27g2000prd.googlegroups.com>


I've searched Metalink and not seen a mention, but want to make sure I'm not missing anything obvious in calling this a bug. The argument in CHR(15121570) is the decimal value of the UTF-8 byte sequence (E6 BC A2) for the kanji character '漢'. It seems like RPAD is not padding it out to a full 4 characters in the example below:

SELECT RPAD(CHR(15121570),4,'*') FROM DUAL;
RESULT:
漢**

Wrapping the call in LENGTH() shows that RPAD is only creating a string with 3 characters:

SELECT LENGTH(RPAD(CHR(15121570),4,'*')) FROM DUAL;
RESULT:
3

The same logic against a different multi-byte UTF8 character, the ellipsis '…' (the one Microsoft applications insert), shows the expected behavior:

SELECT RPAD(CHR(14844070),4,'*') FROM DUAL;
RESULT:
…***

SELECT LENGTH(RPAD(CHR(14844070),4,'*')) FROM DUAL;
RESULT:
4

(note - I'm using '*' to make the RPAD functionality more visible; the
same behavior occurs with the default blank-space padding, e.g.
RPAD(CHR(15121570),4))
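
For anyone wanting to double-check the CHR() arguments above: they are just the UTF-8 bytes of each character read as a single big-endian integer. Below is a minimal Python sketch of that arithmetic (not Oracle code; the helper names are made up for illustration):

```python
def utf8_bytes_as_int(ch: str) -> int:
    """Read the UTF-8 encoding of ch as one big-endian integer."""
    return int.from_bytes(ch.encode("utf-8"), byteorder="big")


def int_as_utf8_char(n: int) -> str:
    """Inverse: split the integer back into bytes and decode as UTF-8."""
    length = (n.bit_length() + 7) // 8
    return n.to_bytes(length, byteorder="big").decode("utf-8")


kanji = "\u6f22"     # the kanji character from the first example
ellipsis = "\u2026"  # the ellipsis character from the second example

print(utf8_bytes_as_int(kanji))      # 15121570
print(utf8_bytes_as_int(ellipsis))   # 14844070
print(len(kanji.encode("utf-8")), len(ellipsis.encode("utf-8")))  # 3 3
```

Both characters come out as 3-byte sequences and a single character each, so byte length alone does not distinguish the two cases.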

Verifying the UTF8 encoding:

select * from nls_database_parameters where parameter = 'NLS_CHARACTERSET';
RESULT:
NLS_CHARACTERSET UTF8

So, the question: does anyone see any obvious oversight on my part, or should I consider this a "probable bug" in need of a TAR?

Received on Fri Jan 16 2009 - 19:23:41 CET
