Home » SQL & PL/SQL » SQL & PL/SQL » RPAD problem
RPAD problem [message #161223] Thu, 02 March 2006 13:41 Go to next message
henckel
Messages: 9
Registered: March 2006
Junior Member
Hi everyone!

I hope that someone can help me with the following RPAD problem. Without talking a lot the following statement will show you what my problem is.

statement 1
SELECT LENGTH('x'),
       LENGTH('„'),
       LENGTH(RPAD('xxx', 5)),
       LENGTH(RPAD('xx„', 5))
FROM   DUAL;

output 1
[ 1 | 1 | 5 | 4 ]


statement 2
SELECT RPAD('xxx', 5, 'a'),
       RPAD('xx„', 5, 'a')
FROM   DUAL;

output 2
[ xxxaa | xx„a ]

This quotation mark („) is a non-ASCII character and seems to be the reason why RPAD does not work as intended.
What is the exact reason for this problem and how can it be solved?

Thanks a lot in advance!

Best regards
Sven
Re: RPAD problem [message #161256 is a reply to message #161223] Thu, 02 March 2006 19:54 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
This from the manual on RPAD

Quote:

The argument n is the total length of the return value as it is displayed on your terminal screen. In most character sets, this is also the number of characters in the return value. However, in some multibyte character sets, the display length of a character string can differ from the number of characters in the string.


Add to this the fact that LENGTH() returns the number of chars in the string (not the number of bytes, not the display length), and it looks as though Oracle thinks that your character has a display-length of 2.

Does it take 2 chars to display, or just 1? It looks like 1 from your example. If so, you may have a case to open a TAR with Oracle. They seem to have your character (incorrectly?) recorded as having a display-length of 2 rather than 1.

I did a quick test on another multi-byte character that DOES display as 1 char.

  1  select chrx
  2  , length(chrx) as len
  3  , lengthb(chrx) as blen
  4  , rpad('a' || chrx, 5, 'a') as chr5
  5  , length(rpad('a' || chrx, 5, 'a')) as len5
  6  from (
  7     select unistr('\\AAAA') as chrx
  8     from dual
  9* )
CPROD@CDMPROD> /

C        LEN       BLEN CHR5        LEN5
- ---------- ---------- ----- ----------
?          1          2 a?aaa          5

1 row selected.


This shows that LENGTH and RPAD work consistently for that character. It has a display-length of 1, so RPAD pads it to a display length of 5.

_____________
Ross Leishman
Re: RPAD problem [message #161344 is a reply to message #161256] Fri, 03 March 2006 06:34 Go to previous messageGo to next message
henckel
Messages: 9
Registered: March 2006
Junior Member
Hi Ross,

thanks for your quick reply.

I must admit that I don't really understand what you mean with display-length. In my text editor as well as this forum's textarea the quatation mark character („) is displayed at the same width as a standard ASCII character (like "a") is. I don't know in which case a character is displayed at another width. Can you give me an example?

I've tested my statement with other multi-byte characters and they lead to the same problem. I have never had any problems with one- or two-byte characters, but only with three-byte characters (like the quotation mark character).

Maybe that will help to solve the problem...?

Best regards
Sven
Re: RPAD problem [message #623393 is a reply to message #161344] Tue, 09 September 2014 15:04 Go to previous messageGo to next message
gargsourabh786
Messages: 2
Registered: September 2014
Location: Pittsburgh
Junior Member
Oracle built in RPAD function has issues while working with multi byte data.
RPAD does not pad in characters. It pads in how wide Oracle thinks the characters are on-screen. It has its own length calculation algorithm different to all the LENGTH, LENGTHB, LENGTHC, LENGTH4 etc.

低 is a character which Oracle thinks is "width 2" (see below screenshot that demonstrates the same.)



It is the case where RPAD treats the length of certain, single characters as 2, not 1

for example,

SELECT rpad('上海昕驰电气控', 7),
length('上海昕驰电气控')
FROM dual;




We expect first column to have 7 chars in, right?
But it does not. It contains '上海昕 ' i.e only first three characters and a space.

RPAD didn't truncate to a length in characters. It is based on this magic "display width"
the way it works is it thinks "OK the Chinese characters are 'wide' - they take up 2 normal characters on-screen"
so for RPAD, it counts each Chinese char as width 2, not 1.
so we get the first 3 (which RPAD thinks is total of length 6)
and then it adds a space to make it 7.


Oracle doc for RPAD says:
The total length of the return value as it is displayed on your terminal screen. In most character sets, this is also the number of characters in the return value. However, in some multi-byte character sets, the display length of a character string can differ from the number of characters in the string.

RPAD was designed mainly for rendering text on to displays not writing fixed length output files which are "x characters long per line".

In our company, one of my collegue wrote an excellent custom function for pad:

FUNCTION pad_text(p_text IN VARCHAR2,
p_length IN PLS_INTEGER,
p_padstring IN VARCHAR2 := ' ',
p_side IN VARCHAR2 := 'right')
-- pads text to specified length in characters [not bytes!] with specified string. Overlong text is truncated
-- p_text [R] - input text to fix length to
-- p_length [R] - length to fix string to, in characters
-- p_padstring [O] - string to pad with. Default is space. PLEASE NOTE THAT IF THIS PARAMETER CONTAINS WIDE CHARACTERS,
-- WITH A SCREEN DISPLAY WIDTH GREATER 1, IT WILL RETURN UNEXPECTED RESULTS!
-- p_side [O] - pad left or right side of p_text. Default is right.
RETURN VARCHAR2 DETERMINISTIC IS
l_output VARCHAR2(32767) := NVL(p_text, ' ');
l_padlength PLS_INTEGER := p_length-LENGTH(l_output);
BEGIN
-- This function used to just RETURN RPAD(SUBSTR(NVL(p_text, ' '), 1, p_length), p_length, p_padstring);
-- However RPAD() does not pad to "length in characters"
-- RPAD() has its own string length algorithm based on "display width" - different to all 5 LENGTH() functions [length, lengthb, lengthc, length2, length4]

-- eg LENGTH(RPAD(UNISTR('\662F'), 2)) is 1, NOT 2
-- This is because the display width of UNISTR('\662F') after font rendering is 2 characters wide, not 1,
-- so RPAD() treats it as "width 2" already & doesn't add an extra space
-- \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
-- This may look fine for fixed-width text on-screen, \\ IT IS NOT FINE FOR FIXED WIDTH FILES! \\
-- \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

-- Anyway, we roll our own by calculating the length of characters to pad with LENGTH() & then appending it
-- This is seriously faster than
-- WHILE LENGTH(l_output) < p_length LOOP
-- l_output := l_output || p_padstring;
-- END LOOP;
-- HOWEVER! It will not work correctly if p_padstring contains "wide characters" such as UNISTR('\662F') and perhaps all kanji characters [and probably more!]
IF l_padlength > 0 THEN
CASE LOWER(p_side)
WHEN 'right' THEN
l_output := l_output || RPAD(p_padstring, l_padlength, p_padstring);
WHEN 'left' THEN
l_output := RPAD(p_padstring, l_padlength, p_padstring) || l_output;
ELSE
-- usually I wouldn't "die on the spot" but this error should be corrected at or around "first dev build" -
-- absolutely nowhere near as late as UAT or PROD so I figure it's easiest to die with a helpful message
RAISE_APPLICATION_ERROR(c_err_invalid_pad_side, 'Invalid value for p_side in pad_text(), use "left" or "right"', TRUE);
END CASE;
END IF;
-- now it is equal or over-length, trim any over-length characters [could happen if input string was longer than pad length]
-- NB for all those thinking SUBSTR is somehow "bad" - its use is essential here
-- we need to count length in characters, regardless of length in bytes in DB encoding [which at time of writing is UTF-8]
-- conversion to destination encoding happens in put() - that is the time when multibyte characters are converted to the correct byte-sequence
-- [or substituted if character cannot be represented]
RETURN SUBSTR(l_output, 1, p_length);
END;

so you can use this custom pad function instead of oracle in built function.

Hope it will help.

Thanks,
Sourabh Garg
412 932 3603

Re: RPAD problem [message #623394 is a reply to message #623393] Tue, 09 September 2014 15:06 Go to previous messageGo to next message
gargsourabh786
Messages: 2
Registered: September 2014
Location: Pittsburgh
Junior Member
i want to add one more thing:

I saw that when we run the above Sql's , the output is like ???????




It happened because when Oracle DB client was installed on our machine it never asks us which character set to use. It just bases it on our Windows settings
it will be something like WE8ISO8859P1 (not UTF8 anyway!)

It means that all data we send to Oracle & received from Oracle is being converted by the actual database client installed on our machine.

Everybody must have noticed that we receive the below warning message when we login to pl/sql developer.






This warning is activated by below option in pl/sql dev:
Tools > Preferences > Options
in there is an option "Check for client & server character set mismatch". We need to enable it.

That warning has the option "never show this again!" and that is why a lot of people don't see it..
But it is very important .

The correct way to remove the warning is to change our client to connect in DB character set
then there is no automatic translation of data will happen.

To do this, we need to edit registry by navigating to HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE Or
(if we are using 64bit Windows7) HKEY_LOACL_MACHINE\SOFTWARE\Wow6432Node\ORACLE

Edit the value for NLS_LANG (in the right).

NLS_LANG specifies 3 things: language_territory.charset
First 2 are OK but the last bit should be changed to UTF8 so it becomes AMERICAN_AMERICA.UTF8 (just double click the NLS_LANG and try to change the end).
we need to set it in several places. search for NLS_LANG by pressing F3[this means search next].

After we saved this, We will be connecting with UTF8 which the database has and no charset conversions will be done by our connection


I think, this setting should be done by all devlopers because absence of this setting allows pl/sql devloper to hide the truth from us.
Re: RPAD problem [message #623395 is a reply to message #623394] Tue, 09 September 2014 15:10 Go to previous message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
Thanks for sharing.

Please read and follow the forum guidelines, to enable us to help you:

http://www.orafaq.com/forum/t/88153/0/ and read http://www.orafaq.com/forum/t/174502/


Do you realize that you resurrected an 8 YEAR old zombie thread?
Previous Topic: How to fetch last five year data
Next Topic: Parallel hint issue
Goto Forum:
  


Current Time: Thu Apr 25 14:20:40 CDT 2024