Home » SQL & PL/SQL » SQL & PL/SQL » RPAD problem
RPAD problem [message #161223] |
Thu, 02 March 2006 13:41 |
henckel
Messages: 9 Registered: March 2006
|
Junior Member |
|
|
Hi everyone!
I hope that someone can help me with the following RPAD problem. Without talking a lot the following statement will show you what my problem is.
statement 1
SELECT LENGTH('x'),
LENGTH('„'),
LENGTH(RPAD('xxx', 5)),
LENGTH(RPAD('xx„', 5))
FROM DUAL;
output 1
[ 1 | 1 | 5 | 4 ]
statement 2
SELECT RPAD('xxx', 5, 'a'),
RPAD('xx„', 5, 'a')
FROM DUAL;
output 2
[ xxxaa | xx„a ]
This quotation mark („) is a non-ASCII character and seems to be the reason why RPAD does not work as intended.
What is the exact reason for this problem and how can it be solved?
Thanks a lot in advance!
Best regards
Sven
|
|
|
Re: RPAD problem [message #161256 is a reply to message #161223] |
Thu, 02 March 2006 19:54 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
This from the manual on RPAD
Quote: | The argument n is the total length of the return value as it is displayed on your terminal screen. In most character sets, this is also the number of characters in the return value. However, in some multibyte character sets, the display length of a character string can differ from the number of characters in the string.
|
Add to this the fact that LENGTH() returns the number of chars in the string (not the number of bytes, not the display length), and it looks as though Oracle thinks that your character has a display-length of 2.
Does it take 2 chars to display, or just 1? It looks like 1 from your example. If so, you may have a case to open a TAR with Oracle. They seem to have your character (incorrectly?) recorded as having a display-length of 2 rather than 1.
I did a quick test on another multi-byte character that DOES display as 1 char.
1 select chrx
2 , length(chrx) as len
3 , lengthb(chrx) as blen
4 , rpad('a' || chrx, 5, 'a') as chr5
5 , length(rpad('a' || chrx, 5, 'a')) as len5
6 from (
7 select unistr('\\AAAA') as chrx
8 from dual
9* )
CPROD@CDMPROD> /
C LEN BLEN CHR5 LEN5
- ---------- ---------- ----- ----------
? 1 2 a?aaa 5
1 row selected.
This shows that LENGTH and RPAD work consistently for that character. It has a display-length of 1, so RPAD pads it to a display length of 5.
_____________
Ross Leishman
|
|
|
Re: RPAD problem [message #161344 is a reply to message #161256] |
Fri, 03 March 2006 06:34 |
henckel
Messages: 9 Registered: March 2006
|
Junior Member |
|
|
Hi Ross,
thanks for your quick reply.
I must admit that I don't really understand what you mean with display-length. In my text editor as well as this forum's textarea the quatation mark character („) is displayed at the same width as a standard ASCII character (like "a") is. I don't know in which case a character is displayed at another width. Can you give me an example?
I've tested my statement with other multi-byte characters and they lead to the same problem. I have never had any problems with one- or two-byte characters, but only with three-byte characters (like the quotation mark character).
Maybe that will help to solve the problem...?
Best regards
Sven
|
|
|
Re: RPAD problem [message #623393 is a reply to message #161344] |
Tue, 09 September 2014 15:04 |
|
gargsourabh786
Messages: 2 Registered: September 2014 Location: Pittsburgh
|
Junior Member |
|
|
Oracle built in RPAD function has issues while working with multi byte data.
RPAD does not pad in characters. It pads in how wide Oracle thinks the characters are on-screen. It has its own length calculation algorithm different to all the LENGTH, LENGTHB, LENGTHC, LENGTH4 etc.
低 is a character which Oracle thinks is "width 2" (see below screenshot that demonstrates the same.)
It is the case where RPAD treats the length of certain, single characters as 2, not 1
for example,
SELECT rpad('上海昕驰电气控', 7),
length('上海昕驰电气控')
FROM dual;
We expect first column to have 7 chars in, right?
But it does not. It contains '上海昕 ' i.e only first three characters and a space.
RPAD didn't truncate to a length in characters. It is based on this magic "display width"
the way it works is it thinks "OK the Chinese characters are 'wide' - they take up 2 normal characters on-screen"
so for RPAD, it counts each Chinese char as width 2, not 1.
so we get the first 3 (which RPAD thinks is total of length 6)
and then it adds a space to make it 7.
Oracle doc for RPAD says:
The total length of the return value as it is displayed on your terminal screen. In most character sets, this is also the number of characters in the return value. However, in some multi-byte character sets, the display length of a character string can differ from the number of characters in the string.
RPAD was designed mainly for rendering text on to displays not writing fixed length output files which are "x characters long per line".
In our company, one of my collegue wrote an excellent custom function for pad:
FUNCTION pad_text(p_text IN VARCHAR2,
p_length IN PLS_INTEGER,
p_padstring IN VARCHAR2 := ' ',
p_side IN VARCHAR2 := 'right')
-- pads text to specified length in characters [not bytes!] with specified string. Overlong text is truncated
-- p_text [R] - input text to fix length to
-- p_length [R] - length to fix string to, in characters
-- p_padstring [O] - string to pad with. Default is space. PLEASE NOTE THAT IF THIS PARAMETER CONTAINS WIDE CHARACTERS,
-- WITH A SCREEN DISPLAY WIDTH GREATER 1, IT WILL RETURN UNEXPECTED RESULTS!
-- p_side [O] - pad left or right side of p_text. Default is right.
RETURN VARCHAR2 DETERMINISTIC IS
l_output VARCHAR2(32767) := NVL(p_text, ' ');
l_padlength PLS_INTEGER := p_length-LENGTH(l_output);
BEGIN
-- This function used to just RETURN RPAD(SUBSTR(NVL(p_text, ' '), 1, p_length), p_length, p_padstring);
-- However RPAD() does not pad to "length in characters"
-- RPAD() has its own string length algorithm based on "display width" - different to all 5 LENGTH() functions [length, lengthb, lengthc, length2, length4]
-- eg LENGTH(RPAD(UNISTR('\662F'), 2)) is 1, NOT 2
-- This is because the display width of UNISTR('\662F') after font rendering is 2 characters wide, not 1,
-- so RPAD() treats it as "width 2" already & doesn't add an extra space
-- \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
-- This may look fine for fixed-width text on-screen, \\ IT IS NOT FINE FOR FIXED WIDTH FILES! \\
-- \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
-- Anyway, we roll our own by calculating the length of characters to pad with LENGTH() & then appending it
-- This is seriously faster than
-- WHILE LENGTH(l_output) < p_length LOOP
-- l_output := l_output || p_padstring;
-- END LOOP;
-- HOWEVER! It will not work correctly if p_padstring contains "wide characters" such as UNISTR('\662F') and perhaps all kanji characters [and probably more!]
IF l_padlength > 0 THEN
CASE LOWER(p_side)
WHEN 'right' THEN
l_output := l_output || RPAD(p_padstring, l_padlength, p_padstring);
WHEN 'left' THEN
l_output := RPAD(p_padstring, l_padlength, p_padstring) || l_output;
ELSE
-- usually I wouldn't "die on the spot" but this error should be corrected at or around "first dev build" -
-- absolutely nowhere near as late as UAT or PROD so I figure it's easiest to die with a helpful message
RAISE_APPLICATION_ERROR(c_err_invalid_pad_side, 'Invalid value for p_side in pad_text(), use "left" or "right"', TRUE);
END CASE;
END IF;
-- now it is equal or over-length, trim any over-length characters [could happen if input string was longer than pad length]
-- NB for all those thinking SUBSTR is somehow "bad" - its use is essential here
-- we need to count length in characters, regardless of length in bytes in DB encoding [which at time of writing is UTF-8]
-- conversion to destination encoding happens in put() - that is the time when multibyte characters are converted to the correct byte-sequence
-- [or substituted if character cannot be represented]
RETURN SUBSTR(l_output, 1, p_length);
END;
so you can use this custom pad function instead of oracle in built function.
Hope it will help.
Thanks,
Sourabh Garg
412 932 3603
|
|
|
|
|
Goto Forum:
Current Time: Thu Apr 25 14:20:40 CDT 2024
|