OraFAQ Forum: SQL & PL/SQL » Replacing multiple characters

Replacing multiple characters [message #585076]

Tue, 21 May 2013 15:35

scottwmackey
Messages: 515
Registered: March 2005

Senior Member

Hi all,

I am doing some ETL that I need to run faster. The function I am interested in removes low ASCII control characters (codes below 32, except tab) from a string. Please see the timings below, followed by the definitions of the functions. I am selecting just the first 100K rows for testing and timing purposes only. In production, we process millions of records several times a day, hence the desire for faster. Selecting with no function is very fast, 0.2 seconds. We would really love to convert at least 100K rows per second. As you can see, I am nowhere near that. The best I can do is get it down to around five seconds using clear_nonlegal. Ironically, that is the one I thought would be the slowest, since it makes thirty-one calls to REPLACE. I would have guessed that the other two would be much faster. My guess is that REPLACE is simply much better optimized than TRANSLATE and, of course, than my homegrown PL/SQL loop, which isn't optimized at all.

So, my question is this: does anybody know of a way to optimize my custom function, or of a standard SQL or Oracle function that is already optimized for this job? I am not asking for a complete solution. I am just wondering if anybody can point me in the right direction. I am thinking about trying a Java stored procedure, but I have never written one, I am not currently set up for it, and I have no idea whether it would be any faster anyway. Is Java faster at string manipulation than PL/SQL? I suspect calling a C routine would be really fast, but I have no idea whether that can even be done, and the context switching might kill me anyway. As you can see, I am pretty much out of ideas on this. Any suggestions, tips, or help would be greatly appreciated.

Scott

Connected to Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 
Connected as aggs@AGGSTEST

SQL> set timing on

SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT keyword_dest_url
  3            FROM se_keywords sek
  4           WHERE user_id = 1068
  5             AND rownum < 100000);
 
  COUNT(*)
----------
     75389
 
Executed in 0.203 seconds


SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT clear_nonlegal(keyword_dest_url)
  3            FROM se_keywords sek
  4           WHERE user_id = 1068
  5             AND rownum < 100000);
 
  COUNT(*)
----------
     75389
 
Executed in 4.82 seconds

SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT clear_nonlegal_translate(keyword_dest_url)
  3            FROM se_keywords sek
  4           WHERE user_id = 1068
  5             AND rownum < 100000);
  
 
  COUNT(*)
----------
     75389
 
Executed in 10.905 seconds

SQL> SELECT COUNT(*)
  2    FROM (SELECT DISTINCT clear_nonlegal_man(keyword_dest_url)
  3            FROM se_keywords sek
  4           WHERE user_id = 1068
  5             AND rownum < 100000);
 
  COUNT(*)
----------
     75389
 
Executed in 8.611 seconds
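
For comparison, one variant that is not timed above is calling TRANSLATE directly in the SQL statement instead of through a PL/SQL wrapper, which would avoid a SQL-to-PL/SQL call for every row. The query below is only a sketch against the same table and column. The leading 'a' is an arbitrary placeholder character, needed because TRANSLATE returns NULL when its third argument is empty. Whether this is actually faster on this data is untested.

```sql
-- Sketch only: strip CHR(0)..CHR(8) and CHR(10)..CHR(31) inline, keeping tab (CHR(9)).
-- 'a' maps to itself; every other character in the second argument has no
-- counterpart in the third argument, so TRANSLATE removes it.
SELECT COUNT(*)
  FROM (SELECT DISTINCT
               TRANSLATE(keyword_dest_url,
                         'a'
                         ||CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
                         ||CHR(10)||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)
                         ||CHR(17)||CHR(18)||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)
                         ||CHR(24)||CHR(25)||CHR(26)||CHR(27)||CHR(28)||CHR(29)||CHR(30)
                         ||CHR(31),
                         'a')
          FROM se_keywords sek
         WHERE user_id = 1068
           AND rownum < 100000);
```

Unlike the three functions, this form has no exception handler, so it never falls back to ASCIISTR.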

SQL> SELECT ROUND(AVG(LENGTH(keyword_dest_url))) avg_len
  2    FROM se_keywords sek
  3   WHERE user_id = 1068
  4     AND rownum < 100000;
 
avg_len
-------
    172
    
    
CREATE OR REPLACE FUNCTION CLEAR_NONLEGAL(str_in IN VARCHAR2) RETURN VARCHAR2 PARALLEL_ENABLE
IS
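  v_str_in        VARCHAR2(2000);

BEGIN

  -- Assign inside the body: an error raised by a declaration initializer
  -- would bypass this block's EXCEPTION handler.
  v_str_in := str_in;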

  v_str_in := REPLACE(v_str_in, CHR(0)); 
  v_str_in := REPLACE(v_str_in, CHR(1)); 
  v_str_in := REPLACE(v_str_in, CHR(2)); 
  v_str_in := REPLACE(v_str_in, CHR(3)); 
  v_str_in := REPLACE(v_str_in, CHR(4)); 
  v_str_in := REPLACE(v_str_in, CHR(5)); 
  v_str_in := REPLACE(v_str_in, CHR(6)); 
  v_str_in := REPLACE(v_str_in, CHR(7)); 
  v_str_in := REPLACE(v_str_in, CHR(8)); 
  -- CHR(9) (tab) is deliberately kept
  v_str_in := REPLACE(v_str_in, CHR(10));
  v_str_in := REPLACE(v_str_in, CHR(11));
  v_str_in := REPLACE(v_str_in, CHR(12));
  v_str_in := REPLACE(v_str_in, CHR(13));
  v_str_in := REPLACE(v_str_in, CHR(14));
  v_str_in := REPLACE(v_str_in, CHR(15));
  v_str_in := REPLACE(v_str_in, CHR(16));
  v_str_in := REPLACE(v_str_in, CHR(17));
  v_str_in := REPLACE(v_str_in, CHR(18));
  v_str_in := REPLACE(v_str_in, CHR(19));
  v_str_in := REPLACE(v_str_in, CHR(20));
  v_str_in := REPLACE(v_str_in, CHR(21));
  v_str_in := REPLACE(v_str_in, CHR(22));
  v_str_in := REPLACE(v_str_in, CHR(23));
  v_str_in := REPLACE(v_str_in, CHR(24));
  v_str_in := REPLACE(v_str_in, CHR(25));
  v_str_in := REPLACE(v_str_in, CHR(26));
  v_str_in := REPLACE(v_str_in, CHR(27));
  v_str_in := REPLACE(v_str_in, CHR(28));
  v_str_in := REPLACE(v_str_in, CHR(29));
  v_str_in := REPLACE(v_str_in, CHR(30));
  v_str_in := REPLACE(v_str_in, CHR(31));
  
  RETURN v_str_in;

EXCEPTION WHEN OTHERS THEN
  RETURN ASCIISTR(str_in);
END CLEAR_NONLEGAL;


CREATE OR REPLACE FUNCTION clear_nonlegal_translate(
  p_str VARCHAR2
) RETURN VARCHAR2
  PARALLEL_ENABLE
IS
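BEGIN

  -- TRANSLATE maps CHR(0) to CHR(10) and drops the other listed characters,
  -- which have no counterpart in the one-character target string.
  -- The outer REPLACE then removes every CHR(10). Tab (CHR(9)) is kept.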
  RETURN REPLACE(translate(p_str,
                           CHR(0)||CHR(1)||CHR(2)||CHR(3)||CHR(4)||CHR(5)||CHR(6)||CHR(7)||CHR(8)
                           ||CHR(11)||CHR(12)||CHR(13)||CHR(14)||CHR(15)||CHR(16)||CHR(17)||CHR(18)
                           ||CHR(19)||CHR(20)||CHR(21)||CHR(22)||CHR(23)||CHR(24)||CHR(25)||CHR(26)
                           ||CHR(27)||CHR(28)||CHR(29)||CHR(30)||CHR(31),
                           CHR(10)),
                 CHR(10),
                 NULL);

EXCEPTION
  WHEN OTHERS THEN
    RETURN asciistr(p_str);
  
END clear_nonlegal_translate;


CREATE OR REPLACE FUNCTION clear_nonlegal_man(
  p_str VARCHAR2
) RETURN VARCHAR2
  PARALLEL_ENABLE
IS
  v_str   VARCHAR2(2000);
  v_char  VARCHAR2(1);
  v_ascii NUMBER;

BEGIN

  FOR j IN 1..length(p_str) LOOP
    v_char  := substr(p_str, j, 1);
    v_ascii := ascii(v_char);
    -- Keep everything outside 0..31, plus tab (9)
    IF v_ascii NOT BETWEEN 0 AND 31 OR v_ascii = 9 THEN
      v_str := v_str || v_char;
    END IF;
  
  END LOOP;

  RETURN v_str;

EXCEPTION
  WHEN OTHERS THEN
    RETURN asciistr(p_str);
  
END clear_nonlegal_man;
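
Before comparing timings, it may be worth confirming that the three functions agree on a small input. This is a minimal sketch, assuming the functions above compile as posted. The test string and the names test_input, s, and expected are made up for illustration.

```sql
-- Hypothetical sanity check: each column should show 'ok' if the function
-- strips BEL, CR and LF but keeps the tab.
WITH test_input AS (
  SELECT 'ab' || CHR(7) || 'c' || CHR(9) || 'd' || CHR(13) || CHR(10) || 'e' AS s,
         'abc' || CHR(9) || 'de' AS expected
    FROM dual
)
SELECT CASE WHEN clear_nonlegal(s)           = expected THEN 'ok' ELSE 'DIFF' END AS via_replace,
       CASE WHEN clear_nonlegal_translate(s) = expected THEN 'ok' ELSE 'DIFF' END AS via_translate,
       CASE WHEN clear_nonlegal_man(s)       = expected THEN 'ok' ELSE 'DIFF' END AS via_loop
  FROM test_input;
```

The sample uses single-byte ASCII only; multibyte data would need its own test row.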