Home » SQL & PL/SQL » SQL & PL/SQL » Replacing multiple characters
| Replacing multiple characters [message #585076] |
Tue, 21 May 2013 15:35  |
scottwmackey
Messages: 515 Registered: March 2005
|
Senior Member |
|
|
Hi all,
I am doing some ETL that I need to run "faster". The function in which I am interested removes low ascii code characters from a string. Please see the timing below and the definitions of the of the functions below those. I am selecting just the first 100K rows for testing and timing purposes only. In production, we are doing millions of records several times a day, thus the desire for "faster". Selecting with no functions is very fast, 0.2 seconds. We would really really love to convert at least 100K rows per second. As you can see, I am getting nowhere near there. The best I can do is get it down to around five seconds using clear_nonlegal. That is, ironically, the one that I thought would be the slowest. It's making thirty-one calls to REPLACE. I would have guessed that the other two would be much faster. I am guessing that REPLACE is just much better optimized than TRANSLATE and, of course, my homegrown PL/SQL, which isn't optimized at all.
So, my question is this. Does anybody know if there is a way I can optimize my custom function, or maybe know of a better already optimized standard SQL and/or Oracle function that would do the job? I am not asking for a complete solution. I am just wondering if anybody can point me in the right direction. I am thinking about trying to use a Java stored procedure, but I have never done that before, I am not currently set up for it, and have no idea if it would be any faster anyway. Is Java faster with string manipulation the PL/SQL? I am thinking it would be really fast to call a C method, but I have no idea if that can even be done, and maybe context switching would kill me anyway. As you can see, I am pretty much out of ideas on this. Any suggestions, tips, or help would be greatly appreciated.
Scott
Connected to Oracle Database 11g Enterprise Edition Release 11.2.0.3.0
Connected as aggs@AGGSTEST
SQL> set timing on
SQL> SELECT COUNT(*)
2 FROM (SELECT DISTINCT keyword_dest_url
3 FROM se_keywords sek
4 WHERE user_id = 1068
5 AND rownum < 100000);
COUNT(*)
----------
75389
Executed in 0.203 seconds
SQL> SELECT COUNT(*)
2 FROM (SELECT DISTINCT clear_nonlegal(keyword_dest_url)
3 FROM se_keywords sek
4 WHERE user_id = 1068
5 AND rownum < 100000);
COUNT(*)
----------
75389
Executed in 4.82 seconds
SQL> SELECT COUNT(*)
2 FROM (SELECT DISTINCT clear_nonlegal_translate(keyword_dest_url)
3 FROM se_keywords sek
4 WHERE user_id = 1068
5 AND rownum < 100000);
COUNT(*)
----------
75389
Executed in 10.905 seconds
SQL> SELECT COUNT(*)
2 FROM (SELECT DISTINCT clear_nonlegal_man(keyword_dest_url)
3 FROM se_keywords sek
4 WHERE user_id = 1068
5 AND rownum < 100000);
COUNT(*)
----------
75389
Executed in 8.611 seconds
SQL> SELECT ROUND(AVG(LENGTH(keyword_dest_url))) avg_len
2 FROM se_keywords sek
3 WHERE user_id = 1068
4 AND rownum < 100000;
avg_len
-------
172
CREATE OR REPLACE FUNCTION CLEAR_NONLEGAL(str_in IN VARCHAR2) RETURN VARCHAR2 PARALLEL_ENABLE
IS
v_str_in VARCHAR2(2000) := str_in;
v_invalid VARCHAR2(32767);
BEGIN
v_str_in := REPLACE(v_str_in, CHR(0));
v_str_in := REPLACE(v_str_in, CHR(1));
v_str_in := REPLACE(v_str_in, CHR(2));
v_str_in := REPLACE(v_str_in, CHR(3));
v_str_in := REPLACE(v_str_in, CHR(4));
v_str_in := REPLACE(v_str_in, CHR(5));
v_str_in := REPLACE(v_str_in, CHR(6));
v_str_in := REPLACE(v_str_in, CHR(7));
v_str_in := REPLACE(v_str_in, CHR(8));
v_str_in := REPLACE(v_str_in, CHR(10));
v_str_in := REPLACE(v_str_in, CHR(11));
v_str_in := REPLACE(v_str_in, CHR(12));
v_str_in := REPLACE(v_str_in, CHR(13));
v_str_in := REPLACE(v_str_in, CHR(14));
v_str_in := REPLACE(v_str_in, CHR(15));
v_str_in := REPLACE(v_str_in, CHR(16));
v_str_in := REPLACE(v_str_in, CHR(17));
v_str_in := REPLACE(v_str_in, CHR(18));
v_str_in := REPLACE(v_str_in, CHR(19));
v_str_in := REPLACE(v_str_in, CHR(20));
v_str_in := REPLACE(v_str_in, CHR(21));
v_str_in := REPLACE(v_str_in, CHR(22));
v_str_in := REPLACE(v_str_in, CHR(23));
v_str_in := REPLACE(v_str_in, CHR(24));
v_str_in := REPLACE(v_str_in, CHR(25));
v_str_in := REPLACE(v_str_in, CHR(26));
v_str_in := REPLACE(v_str_in, CHR(27));
v_str_in := REPLACE(v_str_in, CHR(28));
v_str_in := REPLACE(v_str_in, CHR(29));
v_str_in := REPLACE(v_str_in, CHR(30));
v_str_in := REPLACE(v_str_in, CHR(31));
RETURN v_str_in;
EXCEPTION WHEN OTHERS THEN
RETURN ASCIISTR(str_in);
END CLEAR_NONLEGAL;
CREATE OR REPLACE FUNCTION clear_nonlegal_translate(
p_str VARCHAR2
) RETURN VARCHAR2
PARALLEL_ENABLE
IS
BEGIN
RETURN REPLACE(translate(p_str,
CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
CHR(10)),
CHR(10),
NULL);
EXCEPTION
WHEN OTHERS THEN
RETURN asciistr(p_str);
END clear_nonlegal_translate;
CREATE OR REPLACE FUNCTION clear_nonlegal_man(
p_str VARCHAR2
) RETURN VARCHAR2
PARALLEL_ENABLE
IS
v_str VARCHAR2(2000);
v_char VARCHAR2(1);
v_ascii NUMBER;
BEGIN
FOR j IN 1..length(p_str) LOOP
v_char := substr(p_str, j, 1);
v_ascii := ascii(v_char);
IF v_ascii NOT BETWEEN 0 AND 31 OR v_ascii = 9 THEN
v_str := v_str || v_char;
END IF;
END LOOP;
RETURN v_str;
EXCEPTION
WHEN OTHERS THEN
RETURN asciistr(p_str);
END clear_nonlegal_man;
|
|
|
|
|
|
|
|
|
|
| Re: Replacing multiple characters [message #585170 is a reply to message #585085] |
Wed, 22 May 2013 11:20   |
scottwmackey
Messages: 515 Registered: March 2005
|
Senior Member |
|
|
Michel,
RE: REPLACE not replacing. I pushed the question and the suggestions you made up the food chain and I am still waiting for a response. So I still don't have a solution. I will be sure to post my findings in the original thread once I have something.
|
|
|
|
|
|
| Re: Replacing multiple characters [message #585172 is a reply to message #585087] |
Wed, 22 May 2013 12:05   |
scottwmackey
Messages: 515 Registered: March 2005
|
Senior Member |
|
|
The regexp_replace solution is a little faster, but for some reason regexp is about 30 times slower than REPLACE per function call in Oracle. I am assuming it has something to do with how Oracle has integrated regexp in the SQL engine. Regardless, it's still not as fast as I need.
CREATE OR REPLACE FUNCTION clear_nonlegal_regexp(
p_str VARCHAR2
) RETURN VARCHAR2
PARALLEL_ENABLE
IS
BEGIN
RETURN regexp_replace(p_str, '[^[:print:]'||chr(9)||']');
EXCEPTION
WHEN OTHERS THEN
RETURN asciistr(p_str);
END clear_nonlegal_regexp;
SQL> SELECT COUNT(*)
2 FROM (SELECT DISTINCT clear_nonlegal_regexp(keyword_dest_url)
3 FROM se_keywords sek
4 WHERE user_id = 1068
5 AND rownum < 100000);
COUNT(*)
----------
75389
Executed in 3.744 seconds
|
|
|
|
| Re: Replacing multiple characters [message #585173 is a reply to message #585172] |
Wed, 22 May 2013 12:21   |
Solomon Yakobson
Messages: 3305 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
Why do you need a function? A good chunk of execution time is spent on context switches caused by 75389 function calls. At least declare it as deterministic. But you should scrap it all together and use:
SELECT COUNT(DISTINCT regexp_replace(keyword_dest_url,'[^[:print:]'||chr(9)||']'))
FROM se_keywords sek
WHERE user_id = 1068
AND rownum < 100000
/
And if you want to make it faster you could try TRANSLATE:
SELECT COUNT(DISTINCT TRANSLATE(
keyword_dest_url,
'X' || CHR(0) || CHR(1) || CHR(2) || CHR(3) || CHR(4) || CHR(5) ||
CHR(6) || CHR(7) || CHR(8) || CHR(9) || CHR(10) || CHR(11) ||
CHR(12) || CHR(13) || CHR(14) || CHR(15) || CHR(16) || CHR(17) ||
CHR(18) || CHR(19) || CHR(20) || CHR(21) || CHR(21) || CHR(23) ||
CHR(24) || CHR(25) || CHR(26) || CHR(27) || CHR(28) || CHR(29) ||
CHR(30) || CHR(31),
'X'
)
FROM se_keywords sek
WHERE user_id = 1068
AND rownum < 100000
/
SY.
[Updated on: Wed, 22 May 2013 12:22] Report message to a moderator
|
|
|
|
| Re: Replacing multiple characters [message #585178 is a reply to message #585171] |
Wed, 22 May 2013 14:04   |
scottwmackey
Messages: 515 Registered: March 2005
|
Senior Member |
|
|
This response began as a response to Michel, but I guess it's now for Solomon too. The thing I love about working with Oracle and technology is in general is that nothing ever stays consistent. I don't know if it's the DB engineers over at Oracle or just the increased speed and efficiency of CPUs, but the context switching is much less costly relatively then it used to be. I know I tested this whole context switching stuff years ago and it was significantly slower. Not so much anymore. The function below, clear_nonlegal_null, just returns the input string. It does nothing other than context switch. As you can see, it takes about 0.25 seconds for 100K records. It cannot legitimately be called a significant portion of the total time for the tests I have posted.
SQL> SELECT COUNT(*)
2 FROM (SELECT DISTINCT keyword_dest_url
3 FROM se_keywords sek
4 WHERE user_id = 1068
5 AND rownum < 100000);
COUNT(*)
----------
75389
Executed in 0.188 seconds
SQL> SELECT COUNT(*)
2 FROM (SELECT DISTINCT clear_nonlegal_null(keyword_dest_url)
3 FROM se_keywords sek
4 WHERE user_id = 1068
5 AND rownum < 100000);
COUNT(*)
----------
75389
Executed in 0.437 seconds
I can also test the context switching with the actual replacements I am doing. (The REPLACE(...(REPLACE( below was actually all 30 replacements when I ran it. I just put in the ellipses here to save display space.) The query below that just puts the same REPLACEs in a function. It is, as expected, slower than calling REPLACE in the SELECT, 1.2 seconds or 48% slower. I guess the point can be made that that is significant, but I can live with 1.2 seconds per 100K records. It's the 2.77 seconds of the direct call that is not acceptable. Yes, 4 seconds is even worse, but neither gets me to where I need to be. As a side note for Littlefoot, nesting the REPLACE command looks like it might be buying me about 0.8 seconds per 100K records.
Connected to Oracle Database 11g Enterprise Edition Release 11.2.0.3.0
Connected as aggs@AGGSTEST
SQL> SELECT COUNT(*)
2 FROM (SELECT DISTINCT REPLACE(...(REPLACE(keyword_dest_url, CHR(0)), ... CHR(31))
3 FROM se_keywords sek
4 WHERE user_id = 1068
5 AND rownum < 100000);
COUNT(*)
----------
75389
Executed in 2.777 seconds
SQL> SELECT COUNT(*)
2 FROM (SELECT DISTINCT clear_nonlegal_nested(keyword_dest_url)
3 FROM se_keywords sek
4 WHERE user_id = 1068
5 AND rownum < 100000);
COUNT(*)
----------
75389
Executed in 4.025 seconds
TRANSLATE, however, is still ridiculously slow. Sure, it's roughly 1.8 seconds per 100K records faster than calling the function, but at 9.2 seconds, the difference not significant. If there is any context switching that is killing me, it seems to be the call to TRANSLATE itself.
SQL> SELECT COUNT(*)
2 FROM (SELECT DISTINCT translate(keyword_dest_url,
3 'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
4 ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
5 ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
6 ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
7 'A')
8 FROM se_keywords sek
9 WHERE user_id = 1068
10 AND rownum < 100000);
COUNT(*)
----------
75389
Executed in 9.22 seconds
Just a few final notes.
I don't see how declaring the function as deterministic will help me given that the input is relatively unique. It would seem like the overhead required the hash the inputs would be more costly than just doing the REPLACE again.
I understand that single runs of a query cannot, as a general rule, be used as an accurate performance test. I understand that data caching and the total load on the server can effect times. I'm not exactly new to this game. I didn't run these just once, but that was mostly out of habit. I am the only one running anything on the server and we have some massive flash drives that, in all my tests I have ever run at this gig, return data as fast as Oracle's cache. I, nevertheless, ran them each multiple times and the times were always within a couple of tenths of each respective iteration.
Lastly, even if calling built-in function in the SELECT worked fast enough, I couldn't use it anyway. There is a little more going on in the function than I have actually posted that can't be inlined. In hindsight, I should have mentioned that in the OP, but I thought it would be better not to clutter the question. I thought wrong.
I hope I haven't sounded unappreciative of all of your attempts to help. I know none of you get paid for this, and I do appreciate it. I'm just beginning to worry that there is no easy solution for my problem short of distributing the work.
|
|
|
|
|
|
| Re: Replacing multiple characters [message #585193 is a reply to message #585191] |
Wed, 22 May 2013 17:20   |
scottwmackey
Messages: 515 Registered: March 2005
|
Senior Member |
|
|
I use DISTINCT with COUNT because it forces it to process all 100K records without displaying them. Otherwise it would just process the first 100 and they all come back really fast and not really be much of a performance test. I could have just as easily used
CREATE TABLE t NOLOGGING AS
SELECT my_func(keyword_dest_url)
FROM se_keywords sek
WHERE user_id = 1068
AND rownum < 100000
which is much closer to what we actually do in production, but then I have to keep dropping the table. I'm sure there's other equally good ways to force processing of all the rows, but I find this way satisfactory. I understand the DISTINCT has some overhead, but at worst it's adding 0.188 seconds. If I use COUNT without the DISTINCT, Oracle is smart enough not to even call the function. It just counts from the index.
[Updated on: Wed, 22 May 2013 17:22] Report message to a moderator
|
|
|
|
|
|
| Re: Replacing multiple characters [message #585284 is a reply to message #585178] |
Thu, 23 May 2013 08:13   |
Solomon Yakobson
Messages: 3305 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
scottwmackey wrote on Wed, 22 May 2013 15:04TRANSLATE, however, is still ridiculously slow.
Really?
SQL> select *
2 from v$version
3 /
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for 64-bit Windows: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
Elapsed: 00:00:00.03
SQL> SELECT COUNT(DISTINCT translate(object_name,
2 'X' || CHR(0) || CHR(1) || CHR(2) || CHR(3) || CHR(4) || CHR(5) ||
3 CHR(6) || CHR(7) || CHR(8) || CHR(9) || CHR(10) || CHR(11) ||
4 CHR(12) || CHR(13) || CHR(14) || CHR(15) || CHR(16) || CHR(17) ||
5 CHR(18) || CHR(19) || CHR(20) || CHR(21) || CHR(21) || CHR(23) ||
6 CHR(24) || CHR(25) || CHR(26) || CHR(27) || CHR(28) || CHR(29) ||
7 CHR(30) || CHR(31),
8 'X'
9 )
10 ) cnt
11 FROM dba_objects
12 /
CNT
----------
47036
Elapsed: 00:00:00.49
SQL> SELECT COUNT(DISTINCT regexp_replace(object_name,'[^[:print:]'||chr(9)||']')) cnt
2 FROM dba_objects
3 /
CNT
----------
47036
Elapsed: 00:00:01.53
SQL>
SY.
|
|
|
|
| Re: Replacing multiple characters [message #585293 is a reply to message #585284] |
Thu, 23 May 2013 09:24   |
_jum
Messages: 577 Registered: February 2008
|
Senior Member |
|
|
I doubt that this example is very significant because neither TRANSLATE nor REGEXP_REPLACE will do a lot for OBJECT_NAME in this case.
SELECT count(*)
FROM dba_objects;
COUNT(*)
----------
105760
1 row selected.
SELECT COUNT(*)
FROM dba_objects
WHERE regexp_replace(object_name,'[^[:print:]'||chr(9)||']') != object_name;
COUNT(*)
----------
0
1 row selected.
|
|
|
|
|
|
| Re: Replacing multiple characters [message #585309 is a reply to message #585293] |
Thu, 23 May 2013 11:50   |
Solomon Yakobson
Messages: 3305 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
And look what happens on longer strings (btw, I added to-be-replaced characters):
SQL> SELECT COUNT(DISTINCT translate(lpad(object_name,1000,chr(mod(object_id,32))),
2 'X' || CHR(0) || CHR(1) || CHR(2) || CHR(3) || CHR(4) || CHR(5) ||
3 CHR(6) || CHR(7) || CHR(8) || CHR(9) || CHR(10) || CHR(11) ||
4 CHR(12) || CHR(13) || CHR(14) || CHR(15) || CHR(16) || CHR(17) ||
5 CHR(18) || CHR(19) || CHR(20) || CHR(21) || CHR(21) || CHR(23) ||
6 CHR(24) || CHR(25) || CHR(26) || CHR(27) || CHR(28) || CHR(29) ||
7 CHR(30) || CHR(31),
8 'X'
9 )
10 ) cnt
11 FROM dba_objects
12 /
CNT
----------
48793
Elapsed: 00:00:01.23
SQL> SELECT COUNT(DISTINCT regexp_replace(lpad(object_name,1000,chr(mod(object_id,32))),'[^[:print:]'||chr(9)||']')) cnt
2 FROM dba_objects
3 /
CNT
----------
48779
Elapsed: 00:00:23.75
SQL>
SY.
|
|
|
|
| Re: Replacing multiple characters [message #585335 is a reply to message #585309] |
Thu, 23 May 2013 13:59   |
scottwmackey
Messages: 515 Registered: March 2005
|
Senior Member |
|
|
I wonder if you are using the same TRANSLATE we are using. I am seeing response times proportional to the length of the search string...
Connected to Oracle Database 11g Enterprise Edition Release 11.2.0.3.0
SQL> CREATE TABLE my_objects AS
2 SELECT a.object_name||b.object_name object_name
3 FROM all_objects a
4 CROSS JOIN all_objects b
5 WHERE ROWNUM <= 100000;
SQL> SELECT COUNT(*)
2 FROM (SELECT DISTINCT translate(lpad(object_name, 10, 'x'),
3 'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
4 ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
5 ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
6 ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
7 'A')
8 FROM my_objects);
COUNT(*)
----------
5018
Executed in 0.671 seconds
SQL> SELECT COUNT(*)
2 FROM (SELECT DISTINCT translate(lpad(object_name, 20, 'x'),
3 'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
4 ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
5 ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
6 ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
7 'A')
8 FROM my_objects);
COUNT(*)
----------
9074
Executed in 1.124 seconds
SQL> SELECT COUNT(*)
2 FROM (SELECT DISTINCT translate(lpad(object_name, 30, 'x'),
3 'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
4 ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
5 ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
6 ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
7 'A')
8 FROM my_objects);
COUNT(*)
----------
9684
Executed in 1.653 seconds
SQL> SELECT COUNT(*)
2 FROM (SELECT DISTINCT translate(lpad(object_name, 100, 'x'),
3 'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
4 ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
5 ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
6 ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
7 'A')
8 FROM my_objects);
COUNT(*)
----------
9684
Executed in 5.709 seconds
SQL> SELECT COUNT(*)
2 FROM (SELECT DISTINCT translate(lpad(object_name, 200, 'x'),
3 'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
4 ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
5 ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
6 ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
7 'A')
8 FROM my_objects);
COUNT(*)
----------
9684
Executed in 11.934 seconds
SQL> SELECT COUNT(*)
2 FROM (SELECT DISTINCT translate(lpad(object_name, 1000, 'x'),
3 'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
4 ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
5 ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
6 ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
7 'A')
8 FROM my_objects);
COUNT(*)
----------
9684
Executed in 55.724 seconds
and the replacement string.
SQL> SELECT COUNT(*)
2 FROM (SELECT DISTINCT translate(lpad(object_name, 1000, 'x'),
3 'A'||CHR(0),
4 'A')
5 FROM my_objects);
COUNT(*)
----------
9684
Executed in 9.469 seconds
SQL>
My input string average 179 characters and take almost 9 seconds to process 100K rows.
SQL> SELECT COUNT(*) cnt,
2 ROUND(AVG(LENGTH(str))) avg_length
3 FROM (SELECT DISTINCT translate(keyword_dest_url,
4 'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
5 ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
6 ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
7 ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
8 'A') str
9 FROM se_keywords sek
10 WHERE user_id = 1068
11 AND rownum < 100000);
CNT AVG_LENGTH
---------- ----------
75389 179
Executed in 8.877 seconds So yes, TRANSLATE is ridiculously slow.
|
|
|
|
| Re: Replacing multiple characters [message #585346 is a reply to message #585335] |
Thu, 23 May 2013 17:36   |
Solomon Yakobson
Messages: 3305 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
IT is not the question of "how fast" but "how faster". If you look at my posts I compare TRANSLATE to REGEXP and TRANSLATE is faster. And all I stated was "And if you want to make it faster you could try TRANSLATE". And there is one more reason to stay away from REGEXP. I don't know why, but it doesn't take advantage of cached data. Anyway, below is run times of TRANSLATE, REGEXP_REPLACE and your CLEAR_NONLEGAL function:
SQL> SELECT *
2 FROM V$VERSION
3 /
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for 64-bit Windows: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> SHUTDOWN
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP
ORACLE instance started.
Total System Global Area 3423965184 bytes
Fixed Size 2260000 bytes
Variable Size 2734687200 bytes
Database Buffers 671088640 bytes
Redo Buffers 15929344 bytes
Database mounted.
Database opened.
SQL> EXEC DBMS_LOCK.SLEEP(30)
PL/SQL procedure successfully completed.
Elapsed: 00:00:30.21
SQL> SET TIMING ON
SQL> SELECT COUNT(
2 DISTINCT translate(lpad(object_name, 1000, 'x'),
3 'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
4 ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
5 ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
6 ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
7 'A'
8 )
9 )
10 FROM dba_objects
11 /
COUNT(DISTINCTTRANSLATE(LPAD(OBJECT_NAME,1000,'X'),'A'||CHR(0)||CHR(1)||CHR(2)||
--------------------------------------------------------------------------------
46514
Elapsed: 00:00:02.46
SQL> /
COUNT(DISTINCTTRANSLATE(LPAD(OBJECT_NAME,1000,'X'),'A'||CHR(0)||CHR(1)||CHR(2)||
--------------------------------------------------------------------------------
46514
Elapsed: 00:00:02.26
SQL> /
COUNT(DISTINCTTRANSLATE(LPAD(OBJECT_NAME,1000,'X'),'A'||CHR(0)||CHR(1)||CHR(2)||
--------------------------------------------------------------------------------
46514
Elapsed: 00:00:00.53
SQL> /
COUNT(DISTINCTTRANSLATE(LPAD(OBJECT_NAME,1000,'X'),'A'||CHR(0)||CHR(1)||CHR(2)||
--------------------------------------------------------------------------------
46514
Elapsed: 00:00:00.53
SQL> SHUTDOWN
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP
ORACLE instance started.
Total System Global Area 3423965184 bytes
Fixed Size 2260000 bytes
Variable Size 2734687200 bytes
Database Buffers 671088640 bytes
Redo Buffers 15929344 bytes
Database mounted.
Database opened.
SQL> EXEC DBMS_LOCK.SLEEP(30)
PL/SQL procedure successfully completed.
Elapsed: 00:00:30.21
SQL> SET TIMING ON
SQL> SELECT COUNT(DISTINCT regexp_replace(lpad(object_name, 1000, 'x'),'[^[:print:]'||chr(9)||']'))
2 FROM dba_objects
3 /
COUNT(DISTINCTREGEXP_REPLACE(LPAD(OBJECT_NAME,1000,'X'),'[^[:PRINT:]'||CHR(9)||'
--------------------------------------------------------------------------------
46514
Elapsed: 00:00:10.12
SQL> /
COUNT(DISTINCTREGEXP_REPLACE(LPAD(OBJECT_NAME,1000,'X'),'[^[:PRINT:]'||CHR(9)||'
--------------------------------------------------------------------------------
46514
Elapsed: 00:00:09.87
SQL> /
COUNT(DISTINCTREGEXP_REPLACE(LPAD(OBJECT_NAME,1000,'X'),'[^[:PRINT:]'||CHR(9)||'
--------------------------------------------------------------------------------
46514
Elapsed: 00:00:08.31
SQL> /
COUNT(DISTINCTREGEXP_REPLACE(LPAD(OBJECT_NAME,1000,'X'),'[^[:PRINT:]'||CHR(9)||'
--------------------------------------------------------------------------------
46514
Elapsed: 00:00:08.31
SQL> SHUTDOWN
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP
ORACLE instance started.
Total System Global Area 3423965184 bytes
Fixed Size 2260000 bytes
Variable Size 2734687200 bytes
Database Buffers 671088640 bytes
Redo Buffers 15929344 bytes
Database mounted.
Database opened.
SQL> EXEC DBMS_LOCK.SLEEP(30)
PL/SQL procedure successfully completed.
Elapsed: 00:00:30.23
SQL> SET TIMING ON
SQL> SELECT COUNT(DISTINCT scott.clear_nonlegal(lpad(object_name, 1000, 'x')))
2 FROM dba_objects
3 /
COUNT(DISTINCTSCOTT.CLEAR_NONLEGAL(LPAD(OBJECT_NAME,1000,'X')))
---------------------------------------------------------------
46514
Elapsed: 00:00:05.41
SQL> /
COUNT(DISTINCTSCOTT.CLEAR_NONLEGAL(LPAD(OBJECT_NAME,1000,'X')))
---------------------------------------------------------------
46514
Elapsed: 00:00:05.08
SQL> /
COUNT(DISTINCTSCOTT.CLEAR_NONLEGAL(LPAD(OBJECT_NAME,1000,'X')))
---------------------------------------------------------------
46514
Elapsed: 00:00:03.52
SQL> /
COUNT(DISTINCTSCOTT.CLEAR_NONLEGAL(LPAD(OBJECT_NAME,1000,'X')))
---------------------------------------------------------------
46514
Elapsed: 00:00:03.51
SQL>
SY.
[Updated on: Thu, 23 May 2013 18:04] Report message to a moderator
|
|
|
|
| Re: Replacing multiple characters [message #585347 is a reply to message #585346] |
Thu, 23 May 2013 18:57  |
Solomon Yakobson
Messages: 3305 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
If column keyword_dest_url is NOT NULL and must contain at least one "legal" character you could try (assuming you are on 11g) NOT NULL index virtual column:
SQL> CREATE TABLE tbl
2 AS SELECT *
3 FROM dba_objects
4 /
Table created.
SQL> ALTER TABLE TBL
2 ADD object_name_clear_legal VARCHAR2(128)
3 GENERATED ALWAYS AS (
4 translate(
5 object_name,
6 'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
7 ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
8 ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
9 ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
10 'A'
11 )
12 )
13 /
Table altered.
SQL> ALTER TABLE TBL
2 MODIFY object_name_clear_legal NOT NULL
3 /
Table altered.
SQL> CREATE INDEX tbl_idx1
2 ON tbl(object_name_clear_legal)
3 /
Index created.
SQL> explain plan for
2 SELECT COUNT(DISTINCT object_name_clear_legal)
3 FROM tbl
4 /
Explained.
SQL> @?\rdbms\admin\utlxpls
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------
Plan hash value: 3929025391
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 66 | 113 (5)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 66 | | |
| 2 | VIEW | VW_DAG_0 | 66133 | 4262K| 113 (5)| 00:00:02 |
| 3 | HASH GROUP BY | | 66133 | 4262K| 113 (5)| 00:00:02 |
| 4 | INDEX FAST FULL SCAN| TBL_IDX1 | 66133 | 4262K| 109 (1)| 00:00:02 |
------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement (level=2)
15 rows selected.
SQL> SET TIMING ON
SQL> SELECT COUNT(DISTINCT object_name_clear_legal)
2 FROM tbl
3 /
COUNT(DISTINCTOBJECT_NAME_CLEAR_LEGAL)
--------------------------------------
46515
Elapsed: 00:00:00.06
SQL>
SY.
|
|
|
|
Goto Forum:
Current Time: Wed Dec 31 10:38:15 CST 2025
|