Home » SQL & PL/SQL » SQL & PL/SQL » Replacing multiple characters
Replacing multiple characters [message #585076] Tue, 21 May 2013 15:35 Go to next message
scottwmackey
Messages: 501
Registered: March 2005
Senior Member
Hi all,

I am doing some ETL that I need to run "faster". The function in which I am interested removes low ascii code characters from a string. Please see the timing below and the definitions of the of the functions below those. I am selecting just the first 100K rows for testing and timing purposes only. In production, we are doing millions of records several times a day, thus the desire for "faster". Selecting with no functions is very fast, 0.2 seconds. We would really really love to convert at least 100K rows per second. As you can see, I am getting nowhere near there. The best I can do is get it down to around five seconds using clear_nonlegal. That is, ironically, the one that I thought would be the slowest. It's making thirty-one calls to REPLACE. I would have guessed that the other two would be much faster. I am guessing that REPLACE is just much better optimized than TRANSLATE and, of course, my homegrown PL/SQL, which isn't optimized at all.

So, my question is this. Does anybody know if there is a way I can optimize my custom function, or maybe know of a better already optimized standard SQL and/or Oracle function that would do the job? I am not asking for a complete solution. I am just wondering if anybody can point me in the right direction. I am thinking about trying to use a Java stored procedure, but I have never done that before, I am not currently set up for it, and have no idea if it would be any faster anyway. Is Java faster with string manipulation the PL/SQL? I am thinking it would be really fast to call a C method, but I have no idea if that can even be done, and maybe context switching would kill me anyway. As you can see, I am pretty much out of ideas on this. Any suggestions, tips, or help would be greatly appreciated.

Scott

Connected to Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 
Connected as aggs@AGGSTEST

SQL> set timing on

SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT keyword_dest_url
  3            FROM se_keywords sek
  4           WHERE user_id = 1068
  5             AND rownum < 100000);
 
  COUNT(*)
----------
     75389
 
Executed in 0.203 seconds


SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT clear_nonlegal(keyword_dest_url)
  3            FROM se_keywords sek
  4           WHERE user_id = 1068
  5             AND rownum < 100000);
 
  COUNT(*)
----------
     75389
 
Executed in 4.82 seconds

SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT clear_nonlegal_translate(keyword_dest_url)
  3            FROM se_keywords sek
  4           WHERE user_id = 1068
  5             AND rownum < 100000);
  
 
  COUNT(*)
----------
     75389
 
Executed in 10.905 seconds

SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT clear_nonlegal_man(keyword_dest_url)
  3            FROM se_keywords sek
  4           WHERE user_id = 1068
  5             AND rownum < 100000);
 
  COUNT(*)
----------
     75389
 
Executed in 8.611 seconds

SQL> SELECT ROUND(AVG(LENGTH(keyword_dest_url))) avg_len
  2    FROM se_keywords sek
  3   WHERE user_id = 1068
  4     AND rownum < 100000;
 
avg_len
-------
    172
    
    
CREATE OR REPLACE FUNCTION CLEAR_NONLEGAL(str_in IN VARCHAR2) RETURN VARCHAR2 PARALLEL_ENABLE
IS
  v_str_in        VARCHAR2(2000) := str_in;
  v_invalid       VARCHAR2(32767);

BEGIN

  v_str_in := REPLACE(v_str_in, CHR(0)); 
  v_str_in := REPLACE(v_str_in, CHR(1)); 
  v_str_in := REPLACE(v_str_in, CHR(2)); 
  v_str_in := REPLACE(v_str_in, CHR(3)); 
  v_str_in := REPLACE(v_str_in, CHR(4)); 
  v_str_in := REPLACE(v_str_in, CHR(5)); 
  v_str_in := REPLACE(v_str_in, CHR(6)); 
  v_str_in := REPLACE(v_str_in, CHR(7)); 
  v_str_in := REPLACE(v_str_in, CHR(8)); 

  v_str_in := REPLACE(v_str_in, CHR(10));
  v_str_in := REPLACE(v_str_in, CHR(11));
  v_str_in := REPLACE(v_str_in, CHR(12));
  v_str_in := REPLACE(v_str_in, CHR(13));
  v_str_in := REPLACE(v_str_in, CHR(14));
  v_str_in := REPLACE(v_str_in, CHR(15));
  v_str_in := REPLACE(v_str_in, CHR(16));
  v_str_in := REPLACE(v_str_in, CHR(17));
  v_str_in := REPLACE(v_str_in, CHR(18));
  v_str_in := REPLACE(v_str_in, CHR(19));
  v_str_in := REPLACE(v_str_in, CHR(20));
  v_str_in := REPLACE(v_str_in, CHR(21));
  v_str_in := REPLACE(v_str_in, CHR(22));
  v_str_in := REPLACE(v_str_in, CHR(23));
  v_str_in := REPLACE(v_str_in, CHR(24));
  v_str_in := REPLACE(v_str_in, CHR(25));
  v_str_in := REPLACE(v_str_in, CHR(26));
  v_str_in := REPLACE(v_str_in, CHR(27));
  v_str_in := REPLACE(v_str_in, CHR(28));
  v_str_in := REPLACE(v_str_in, CHR(29));
  v_str_in := REPLACE(v_str_in, CHR(30));
  v_str_in := REPLACE(v_str_in, CHR(31));
  
  RETURN v_str_in;

EXCEPTION WHEN OTHERS THEN
  RETURN ASCIISTR(str_in);
END CLEAR_NONLEGAL;


CREATE OR REPLACE FUNCTION clear_nonlegal_translate(
  p_str VARCHAR2
) RETURN VARCHAR2
  PARALLEL_ENABLE
IS
BEGIN

  RETURN REPLACE(translate(p_str,
                           CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
                           ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
                           ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
                           ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
                           CHR(10)),
                 CHR(10),
                 NULL);

EXCEPTION
  WHEN OTHERS THEN
    RETURN asciistr(p_str);
  
END clear_nonlegal_translate;


CREATE OR REPLACE FUNCTION clear_nonlegal_man(
  p_str VARCHAR2
) RETURN VARCHAR2
  PARALLEL_ENABLE
IS
  v_str   VARCHAR2(2000);
  v_char  VARCHAR2(1);
  v_ascii NUMBER;

BEGIN

  FOR j IN 1..length(p_str) LOOP
    v_char  := substr(p_str, j, 1);
    v_ascii := ascii(v_char);
    IF v_ascii NOT BETWEEN 0 AND 31 OR v_ascii = 9 THEN
      v_str := v_str || v_char;
    END IF;
  
  END LOOP;

  RETURN v_str;

EXCEPTION
  WHEN OTHERS THEN
    RETURN asciistr(p_str);
  
END clear_nonlegal_man;
Re: Replacing multiple characters [message #585085 is a reply to message #585076] Wed, 22 May 2013 00:12 Go to previous messageGo to next message
Michel Cadot
Messages: 58639
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
What is the solution or what was the problem of your issue at REPLACE not replacing?

We are waiting for your feedback.

Regards
Michel
Re: Replacing multiple characters [message #585087 is a reply to message #585076] Wed, 22 May 2013 00:15 Go to previous messageGo to next message
spacebar
Messages: 26
Registered: February 2007
Junior Member
Try it using only one regexp_replace:
v_str_in := regexp_replace( v_str_in, '[[:cntrl:]]', null,,0 );
Re: Replacing multiple characters [message #585091 is a reply to message #585076] Wed, 22 May 2013 00:45 Go to previous messageGo to next message
Littlefoot
Messages: 19346
Registered: June 2005
Location: Croatia, Europe
Senior Member
Account Moderator
Would nested REPLACE do any good? Such as (the first 3 replacements):
select replace (replace (replace (keyword_dest_url, chr (0)), chr (1)), chr(2))
from se_keywords sek
where user_id = 1068 
  and rownum < 100000;
Re: Replacing multiple characters [message #585170 is a reply to message #585085] Wed, 22 May 2013 11:20 Go to previous messageGo to next message
scottwmackey
Messages: 501
Registered: March 2005
Senior Member
Michel,

RE: REPLACE not replacing. I pushed the question and the suggestions you made up the food chain and I am still waiting for a response. So I still don't have a solution. I will be sure to post my findings in the original thread once I have something.
Re: Replacing multiple characters [message #585171 is a reply to message #585170] Wed, 22 May 2013 11:47 Go to previous messageGo to next message
Michel Cadot
Messages: 58639
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
OK, thanks.

You can optimize one step further your function:
CREATE OR REPLACE FUNCTION clear_nonlegal_translate(
  p_str VARCHAR2
) RETURN VARCHAR2
  PARALLEL_ENABLE
IS
BEGIN

  RETURN translate(p_str,
                   'A'
                   ||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
                   ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
                   ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
                   ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31)||CHR(10)
                  ,'A');

EXCEPTION
  WHEN OTHERS THEN
    RETURN asciistr(p_str);
  
END clear_nonlegal_translate;

And I bet you'll have better performances putting the expression directly in the query.
Also a test is valid if you make it several times (say 10 times), remove the lowest and highest numbers for each case and take the average. The simple fact that cache is not the same can explain the differences in the performances you see on a single test.

Regards
Michel
Re: Replacing multiple characters [message #585172 is a reply to message #585087] Wed, 22 May 2013 12:05 Go to previous messageGo to next message
scottwmackey
Messages: 501
Registered: March 2005
Senior Member
The regexp_replace solution is a little faster, but for some reason regexp is about 30 times slower than REPLACE per function call in Oracle. I am assuming it has something to do with how Oracle has integrated regexp in the SQL engine. Regardless, it's still not as fast as I need.

CREATE OR REPLACE FUNCTION clear_nonlegal_regexp(
  p_str VARCHAR2
) RETURN VARCHAR2
  PARALLEL_ENABLE
IS
BEGIN

  RETURN regexp_replace(p_str, '[^[:print:]'||chr(9)||']');

EXCEPTION
  WHEN OTHERS THEN
    RETURN asciistr(p_str);
  
END clear_nonlegal_regexp;

SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT clear_nonlegal_regexp(keyword_dest_url)
  3            FROM se_keywords sek
  4           WHERE user_id = 1068
  5             AND rownum < 100000);
 
  COUNT(*)
----------
     75389
 
Executed in 3.744 seconds
Re: Replacing multiple characters [message #585173 is a reply to message #585172] Wed, 22 May 2013 12:21 Go to previous messageGo to next message
Solomon Yakobson
Messages: 1966
Registered: January 2010
Senior Member
Why do you need a function? A good chunk of execution time is spent on context switches caused by 75389 function calls. At least declare it as deterministic. But you should scrap it all together and use:

SELECT  COUNT(DISTINCT regexp_replace(keyword_dest_url,'[^[:print:]'||chr(9)||']'))
  FROM  se_keywords sek
  WHERE user_id = 1068
    AND rownum < 100000
/


And if you want to make it faster you could try TRANSLATE:

SELECT  COUNT(DISTINCT TRANSLATE(
                                 keyword_dest_url,
                                 'X' || CHR(0) || CHR(1) || CHR(2) || CHR(3) || CHR(4) || CHR(5) ||
                                 CHR(6) || CHR(7) || CHR(8) || CHR(9) || CHR(10) || CHR(11) ||
                                 CHR(12) || CHR(13) || CHR(14) || CHR(15) || CHR(16) || CHR(17) ||
                                 CHR(18) || CHR(19) || CHR(20) || CHR(21) || CHR(21) || CHR(23) ||
                                 CHR(24) || CHR(25) || CHR(26) || CHR(27) || CHR(28) || CHR(29) ||
                                 CHR(30) || CHR(31),
                                 'X'
                                )
  FROM  se_keywords sek
  WHERE user_id = 1068
    AND rownum < 100000
/


SY.

[Updated on: Wed, 22 May 2013 12:22]

Report message to a moderator

Re: Replacing multiple characters [message #585178 is a reply to message #585171] Wed, 22 May 2013 14:04 Go to previous messageGo to next message
scottwmackey
Messages: 501
Registered: March 2005
Senior Member
This response began as a response to Michel, but I guess it's now for Solomon too. The thing I love about working with Oracle and technology is in general is that nothing ever stays consistent. I don't know if it's the DB engineers over at Oracle or just the increased speed and efficiency of CPUs, but the context switching is much less costly relatively then it used to be. I know I tested this whole context switching stuff years ago and it was significantly slower. Not so much anymore. The function below, clear_nonlegal_null, just returns the input string. It does nothing other than context switch. As you can see, it takes about 0.25 seconds for 100K records. It cannot legitimately be called a significant portion of the total time for the tests I have posted.

SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT keyword_dest_url
  3            FROM se_keywords sek
  4           WHERE user_id = 1068
  5             AND rownum < 100000);
 
  COUNT(*)
----------
     75389
 
Executed in 0.188 seconds

SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT clear_nonlegal_null(keyword_dest_url)
  3            FROM se_keywords sek
  4           WHERE user_id = 1068
  5             AND rownum < 100000);
 
  COUNT(*)
----------
     75389
 
Executed in 0.437 seconds


I can also test the context switching with the actual replacements I am doing. (The REPLACE(...(REPLACE( below was actually all 30 replacements when I ran it. I just put in the ellipses here to save display space.) The query below that just puts the same REPLACEs in a function. It is, as expected, slower than calling REPLACE in the SELECT, 1.2 seconds or 48% slower. I guess the point can be made that that is significant, but I can live with 1.2 seconds per 100K records. It's the 2.77 seconds of the direct call that is not acceptable. Yes, 4 seconds is even worse, but neither gets me to where I need to be. As a side note for Littlefoot, nesting the REPLACE command looks like it might be buying me about 0.8 seconds per 100K records.

Connected to Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 
Connected as aggs@AGGSTEST
 
SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT REPLACE(...(REPLACE(keyword_dest_url, CHR(0)), ... CHR(31))
  3            FROM se_keywords sek
  4           WHERE user_id = 1068
  5             AND rownum < 100000);
 
  COUNT(*)
----------
     75389
 
Executed in 2.777 seconds

SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT clear_nonlegal_nested(keyword_dest_url)
  3            FROM se_keywords sek
  4           WHERE user_id = 1068
  5             AND rownum < 100000);
 
  COUNT(*)
----------
     75389
 
Executed in 4.025 seconds


TRANSLATE, however, is still ridiculously slow. Sure, it's roughly 1.8 seconds per 100K records faster than calling the function, but at 9.2 seconds, the difference not significant. If there is any context switching that is killing me, it seems to be the call to TRANSLATE itself.

SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT translate(keyword_dest_url,
  3                                    'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
  4                                    ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
  5                                    ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
  6                                    ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
  7                                    'A')
  8            FROM se_keywords sek
  9           WHERE user_id = 1068
 10             AND rownum < 100000);
 
  COUNT(*)
----------
     75389
 
Executed in 9.22 seconds
 


Just a few final notes.

I don't see how declaring the function as deterministic will help me given that the input is relatively unique. It would seem like the overhead required the hash the inputs would be more costly than just doing the REPLACE again.

I understand that single runs of a query cannot, as a general rule, be used as an accurate performance test. I understand that data caching and the total load on the server can effect times. I'm not exactly new to this game. I didn't run these just once, but that was mostly out of habit. I am the only one running anything on the server and we have some massive flash drives that, in all my tests I have ever run at this gig, return data as fast as Oracle's cache. I, nevertheless, ran them each multiple times and the times were always within a couple of tenths of each respective iteration.

Lastly, even if calling built-in function in the SELECT worked fast enough, I couldn't use it anyway. There is a little more going on in the function than I have actually posted that can't be inlined. In hindsight, I should have mentioned that in the OP, but I thought it would be better not to clutter the question. I thought wrong.

I hope I haven't sounded unappreciative of all of your attempts to help. I know none of you get paid for this, and I do appreciate it. I'm just beginning to worry that there is no easy solution for my problem short of distributing the work.
Re: Replacing multiple characters [message #585191 is a reply to message #585178] Wed, 22 May 2013 15:50 Go to previous messageGo to next message
Littlefoot
Messages: 19346
Registered: June 2005
Location: Croatia, Europe
Senior Member
Account Moderator
It probably doesn't matter (or I'm not reading it correctly), but - why do you use DISTINCT? Do you have to use it?
Re: Replacing multiple characters [message #585193 is a reply to message #585191] Wed, 22 May 2013 17:20 Go to previous messageGo to next message
scottwmackey
Messages: 501
Registered: March 2005
Senior Member
I use DISTINCT with COUNT because it forces it to process all 100K records without displaying them. Otherwise it would just process the first 100 and they all come back really fast and not really be much of a performance test. I could have just as easily used

CREATE TABLE t NOLOGGING AS
SELECT my_func(keyword_dest_url)
  FROM se_keywords sek
 WHERE user_id = 1068
   AND rownum < 100000


which is much closer to what we actually do in production, but then I have to keep dropping the table. I'm sure there's other equally good ways to force processing of all the rows, but I find this way satisfactory. I understand the DISTINCT has some overhead, but at worst it's adding 0.188 seconds. If I use COUNT without the DISTINCT, Oracle is smart enough not to even call the function. It just counts from the index.

[Updated on: Wed, 22 May 2013 17:22]

Report message to a moderator

Re: Replacing multiple characters [message #585207 is a reply to message #585193] Thu, 23 May 2013 00:17 Go to previous messageGo to next message
Littlefoot
Messages: 19346
Registered: June 2005
Location: Croatia, Europe
Senior Member
Account Moderator
OK; I just thought that it might make a difference when working with millions of records.
Re: Replacing multiple characters [message #585284 is a reply to message #585178] Thu, 23 May 2013 08:13 Go to previous messageGo to next message
Solomon Yakobson
Messages: 1966
Registered: January 2010
Senior Member
scottwmackey wrote on Wed, 22 May 2013 15:04
TRANSLATE, however, is still ridiculously slow.


Really?

SQL> select  *
  2    from  v$version
  3  /

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for 64-bit Windows: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

Elapsed: 00:00:00.03
SQL> SELECT  COUNT(DISTINCT translate(object_name,
  2                                   'X' || CHR(0) || CHR(1) || CHR(2) || CHR(3) || CHR(4) || CHR(5) ||
  3                                   CHR(6) || CHR(7) || CHR(8) || CHR(9) || CHR(10) || CHR(11) ||
  4                                   CHR(12) || CHR(13) || CHR(14) || CHR(15) || CHR(16) || CHR(17) ||
  5                                   CHR(18) || CHR(19) || CHR(20) || CHR(21) || CHR(21) || CHR(23) ||
  6                                   CHR(24) || CHR(25) || CHR(26) || CHR(27) || CHR(28) || CHR(29) ||
  7                                   CHR(30) || CHR(31),
  8                                   'X'
  9                                  )
 10                ) cnt
 11    FROM  dba_objects
 12  /

       CNT
----------
     47036

Elapsed: 00:00:00.49
SQL> SELECT  COUNT(DISTINCT regexp_replace(object_name,'[^[:print:]'||chr(9)||']')) cnt
  2    FROM  dba_objects
  3  /

       CNT
----------
     47036

Elapsed: 00:00:01.53
SQL> 


SY.
Re: Replacing multiple characters [message #585293 is a reply to message #585284] Thu, 23 May 2013 09:24 Go to previous messageGo to next message
_jum
Messages: 488
Registered: February 2008
Senior Member
I doubt that this example is very significant because neither TRANSLATE nor REGEXP_REPLACE will do a lot for OBJECT_NAME in this case.
SELECT count(*) 
      FROM  dba_objects;

  COUNT(*)
----------
    105760
1 row selected.

SELECT  COUNT(*)
  FROM  dba_objects
 WHERE  regexp_replace(object_name,'[^[:print:]'||chr(9)||']') != object_name;     

  COUNT(*)
----------
         0
1 row selected.

Re: Replacing multiple characters [message #585306 is a reply to message #585293] Thu, 23 May 2013 11:37 Go to previous messageGo to next message
Solomon Yakobson
Messages: 1966
Registered: January 2010
Senior Member
It doesn't matter how many characters are replaced. Both TRANSLATE and REGEXP must go over every character in object_name in order to make that decision.

SY.
Re: Replacing multiple characters [message #585309 is a reply to message #585293] Thu, 23 May 2013 11:50 Go to previous messageGo to next message
Solomon Yakobson
Messages: 1966
Registered: January 2010
Senior Member
And look what happens on longer strings (btw, I added to-be-replaced characters):

SQL> SELECT  COUNT(DISTINCT translate(lpad(object_name,1000,chr(mod(object_id,32))),
  2                                   'X' || CHR(0) || CHR(1) || CHR(2) || CHR(3) || CHR(4) || CHR(5) ||
  3                                   CHR(6) || CHR(7) || CHR(8) || CHR(9) || CHR(10) || CHR(11) ||
  4                                   CHR(12) || CHR(13) || CHR(14) || CHR(15) || CHR(16) || CHR(17) ||
  5                                   CHR(18) || CHR(19) || CHR(20) || CHR(21) || CHR(21) || CHR(23) ||
  6                                   CHR(24) || CHR(25) || CHR(26) || CHR(27) || CHR(28) || CHR(29) ||
  7                                   CHR(30) || CHR(31),
  8                                   'X'
  9                                  )
 10                ) cnt
 11    FROM  dba_objects
 12  /

       CNT
----------
     48793

Elapsed: 00:00:01.23
SQL> SELECT  COUNT(DISTINCT regexp_replace(lpad(object_name,1000,chr(mod(object_id,32))),'[^[:print:]'||chr(9)||']')) cnt
  2    FROM  dba_objects
  3  /

       CNT
----------
     48779

Elapsed: 00:00:23.75
SQL> 


SY.
Re: Replacing multiple characters [message #585335 is a reply to message #585309] Thu, 23 May 2013 13:59 Go to previous messageGo to next message
scottwmackey
Messages: 501
Registered: March 2005
Senior Member
I wonder if you are using the same TRANSLATE we are using. I am seeing response times proportional to the length of the search string...
Connected to Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 
 
SQL> CREATE TABLE my_objects AS
  2  SELECT a.object_name||b.object_name object_name
  3    FROM  all_objects a
  4   CROSS JOIN all_objects b
  5   WHERE ROWNUM <= 100000;
 

SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT translate(lpad(object_name, 10, 'x'),
  3                                    'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
  4                                    ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
  5                                    ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
  6                                    ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
  7                                    'A')
  8            FROM my_objects);
 
  COUNT(*)
----------
      5018
 
Executed in 0.671 seconds

SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT translate(lpad(object_name, 20, 'x'),
  3                                    'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
  4                                    ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
  5                                    ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
  6                                    ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
  7                                    'A')
  8            FROM my_objects);
 
  COUNT(*)
----------
      9074
 
Executed in 1.124 seconds

SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT translate(lpad(object_name, 30, 'x'),
  3                                    'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
  4                                    ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
  5                                    ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
  6                                    ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
  7                                    'A')
  8            FROM my_objects);
 
  COUNT(*)
----------
      9684
 
Executed in 1.653 seconds

SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT translate(lpad(object_name, 100, 'x'),
  3                                    'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
  4                                    ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
  5                                    ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
  6                                    ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
  7                                    'A')
  8            FROM my_objects);
 
  COUNT(*)
----------
      9684
 
Executed in 5.709 seconds

SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT translate(lpad(object_name, 200, 'x'),
  3                                    'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
  4                                    ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
  5                                    ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
  6                                    ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
  7                                    'A')
  8            FROM my_objects);
 
  COUNT(*)
----------
      9684
 
Executed in 11.934 seconds

SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT translate(lpad(object_name, 1000, 'x'),
  3                                    'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
  4                                    ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
  5                                    ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
  6                                    ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
  7                                    'A')
  8            FROM my_objects);
 
  COUNT(*)
----------
      9684
 
Executed in 55.724 seconds

and the replacement string.
SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT translate(lpad(object_name, 1000, 'x'),
  3                                    'A'||CHR(0),
  4                                    'A')
  5            FROM my_objects);
 
  COUNT(*)
----------
      9684
 
Executed in 9.469 seconds
 
SQL> 


My input string average 179 characters and take almost 9 seconds to process 100K rows.
SQL> SELECT COUNT(*) cnt,
  2         ROUND(AVG(LENGTH(str))) avg_length
  3    FROM (SELECT DISTINCT translate(keyword_dest_url,
  4                                    'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
  5                                    ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
  6                                    ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
  7                                    ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
  8                                    'A') str
  9            FROM se_keywords sek
 10           WHERE user_id = 1068
 11             AND rownum < 100000);
 
       CNT AVG_LENGTH
---------- ----------
     75389        179
 
Executed in 8.877 seconds
So yes, TRANSLATE is ridiculously slow.
Re: Replacing multiple characters [message #585346 is a reply to message #585335] Thu, 23 May 2013 17:36 Go to previous messageGo to next message
Solomon Yakobson
Messages: 1966
Registered: January 2010
Senior Member
IT is not the question of "how fast" but "how faster". If you look at my posts I compare TRANSLATE to REGEXP and TRANSLATE is faster. And all I stated was "And if you want to make it faster you could try TRANSLATE". And there is one more reason to stay away from REGEXP. I don't know why, but it doesn't take advantage of cached data. Anyway, below is run times of TRANSLATE, REGEXP_REPLACE and your CLEAR_NONLEGAL function:

SQL> SELECT  *
  2    FROM  V$VERSION
  3  /

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for 64-bit Windows: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> SHUTDOWN
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP
ORACLE instance started.

Total System Global Area 3423965184 bytes
Fixed Size                  2260000 bytes
Variable Size            2734687200 bytes
Database Buffers          671088640 bytes
Redo Buffers               15929344 bytes
Database mounted.
Database opened.
SQL> EXEC DBMS_LOCK.SLEEP(30)

PL/SQL procedure successfully completed.

Elapsed: 00:00:30.21
SQL> SET TIMING ON
SQL> SELECT  COUNT(
  2                DISTINCT translate(lpad(object_name, 1000, 'x'),
  3                                   'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
  4                                   ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
  5                                   ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
  6                                   ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
  7                                   'A'
  8                                  )
  9               )
 10    FROM  dba_objects
 11  /

COUNT(DISTINCTTRANSLATE(LPAD(OBJECT_NAME,1000,'X'),'A'||CHR(0)||CHR(1)||CHR(2)||
--------------------------------------------------------------------------------
                                                                           46514

Elapsed: 00:00:02.46
SQL> /

COUNT(DISTINCTTRANSLATE(LPAD(OBJECT_NAME,1000,'X'),'A'||CHR(0)||CHR(1)||CHR(2)||
--------------------------------------------------------------------------------
                                                                           46514

Elapsed: 00:00:02.26
SQL> /

COUNT(DISTINCTTRANSLATE(LPAD(OBJECT_NAME,1000,'X'),'A'||CHR(0)||CHR(1)||CHR(2)||
--------------------------------------------------------------------------------
                                                                           46514

Elapsed: 00:00:00.53
SQL> /

COUNT(DISTINCTTRANSLATE(LPAD(OBJECT_NAME,1000,'X'),'A'||CHR(0)||CHR(1)||CHR(2)||
--------------------------------------------------------------------------------
                                                                           46514

Elapsed: 00:00:00.53
SQL> SHUTDOWN
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP
ORACLE instance started.

Total System Global Area 3423965184 bytes
Fixed Size                  2260000 bytes
Variable Size            2734687200 bytes
Database Buffers          671088640 bytes
Redo Buffers               15929344 bytes
Database mounted.
Database opened.
SQL> EXEC DBMS_LOCK.SLEEP(30)

PL/SQL procedure successfully completed.

Elapsed: 00:00:30.21
SQL> SET TIMING ON
SQL> SELECT  COUNT(DISTINCT regexp_replace(lpad(object_name, 1000, 'x'),'[^[:print:]'||chr(9)||']'))
  2    FROM  dba_objects
  3  /

COUNT(DISTINCTREGEXP_REPLACE(LPAD(OBJECT_NAME,1000,'X'),'[^[:PRINT:]'||CHR(9)||'
--------------------------------------------------------------------------------
                                                                           46514

Elapsed: 00:00:10.12
SQL> /

COUNT(DISTINCTREGEXP_REPLACE(LPAD(OBJECT_NAME,1000,'X'),'[^[:PRINT:]'||CHR(9)||'
--------------------------------------------------------------------------------
                                                                           46514

Elapsed: 00:00:09.87
SQL> /

COUNT(DISTINCTREGEXP_REPLACE(LPAD(OBJECT_NAME,1000,'X'),'[^[:PRINT:]'||CHR(9)||'
--------------------------------------------------------------------------------
                                                                           46514

Elapsed: 00:00:08.31
SQL> /

COUNT(DISTINCTREGEXP_REPLACE(LPAD(OBJECT_NAME,1000,'X'),'[^[:PRINT:]'||CHR(9)||'
--------------------------------------------------------------------------------
                                                                           46514

Elapsed: 00:00:08.31
SQL> SHUTDOWN
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP
ORACLE instance started.

Total System Global Area 3423965184 bytes
Fixed Size                  2260000 bytes
Variable Size            2734687200 bytes
Database Buffers          671088640 bytes
Redo Buffers               15929344 bytes
Database mounted.
Database opened.
SQL> EXEC DBMS_LOCK.SLEEP(30)

PL/SQL procedure successfully completed.

Elapsed: 00:00:30.23
SQL> SET TIMING ON
SQL> SELECT  COUNT(DISTINCT scott.clear_nonlegal(lpad(object_name, 1000, 'x')))
  2    FROM  dba_objects
  3  /

COUNT(DISTINCTSCOTT.CLEAR_NONLEGAL(LPAD(OBJECT_NAME,1000,'X')))
---------------------------------------------------------------
                                                          46514

Elapsed: 00:00:05.41
SQL> /

COUNT(DISTINCTSCOTT.CLEAR_NONLEGAL(LPAD(OBJECT_NAME,1000,'X')))
---------------------------------------------------------------
                                                          46514

Elapsed: 00:00:05.08
SQL> /

COUNT(DISTINCTSCOTT.CLEAR_NONLEGAL(LPAD(OBJECT_NAME,1000,'X')))
---------------------------------------------------------------
                                                          46514

Elapsed: 00:00:03.52
SQL> /

COUNT(DISTINCTSCOTT.CLEAR_NONLEGAL(LPAD(OBJECT_NAME,1000,'X')))
---------------------------------------------------------------
                                                          46514

Elapsed: 00:00:03.51
SQL>


SY.

[Updated on: Thu, 23 May 2013 18:04]

Report message to a moderator

Re: Replacing multiple characters [message #585347 is a reply to message #585346] Thu, 23 May 2013 18:57 Go to previous message
Solomon Yakobson
Messages: 1966
Registered: January 2010
Senior Member
If column keyword_dest_url is NOT NULL and must contain at least one "legal" character you could try (assuming you are on 11g) NOT NULL index virtual column:

SQL> CREATE TABLE tbl
  2    AS SELECT  *
  3         FROM  dba_objects
  4  /

Table created.

SQL> ALTER TABLE TBL
  2    ADD object_name_clear_legal VARCHAR2(128)
  3      GENERATED ALWAYS AS (
  4                           translate(
  5                                     object_name,
  6                                     'A'||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
  7                                     ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
  8                                     ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
  9                                     ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
 10                                     'A'
 11                                    )
 12                          )
 13  /

Table altered.

SQL> ALTER TABLE TBL
  2    MODIFY object_name_clear_legal NOT NULL
  3  /

Table altered.

SQL> CREATE INDEX tbl_idx1
  2    ON tbl(object_name_clear_legal)
  3  /

Index created.

SQL> explain plan for
  2  SELECT  COUNT(DISTINCT object_name_clear_legal)
  3    FROM  tbl
  4  /

Explained.

SQL> @?\rdbms\admin\utlxpls

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------

Plan hash value: 3929025391

------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |     1 |    66 |   113   (5)| 00:00:02 |
|   1 |  SORT AGGREGATE         |          |     1 |    66 |            |          |
|   2 |   VIEW                  | VW_DAG_0 | 66133 |  4262K|   113   (5)| 00:00:02 |
|   3 |    HASH GROUP BY        |          | 66133 |  4262K|   113   (5)| 00:00:02 |
|   4 |     INDEX FAST FULL SCAN| TBL_IDX1 | 66133 |  4262K|   109   (1)| 00:00:02 |
------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------


Note
-----
   - dynamic sampling used for this statement (level=2)

15 rows selected.

SQL> SET TIMING ON
SQL> SELECT  COUNT(DISTINCT object_name_clear_legal)
  2    FROM  tbl
  3  /

COUNT(DISTINCTOBJECT_NAME_CLEAR_LEGAL)
--------------------------------------
                                 46515

Elapsed: 00:00:00.06
SQL>


SY.
Previous Topic: How to use window functions in where clause without inline view
Next Topic: dbms_lock.sleep
Goto Forum:
  


Current Time: Fri Aug 01 01:00:47 CDT 2014

Total time taken to generate the page: 0.06019 seconds