Home » SQL & PL/SQL » SQL & PL/SQL » Functions that returns the difference between strings
Functions that returns the difference between strings [message #14300] Thu, 19 August 2004 06:36 Go to next message
Gabi
Messages: 6
Registered: August 2004
Junior Member
Hello,

I have a problem:

I need some functions or ideeas for functions which can find the differences between two strings, something like strcmp.

For the moment I am making the search char by char and replace but it is very slow.

What I need to do is some function or part which can simplfy the following code:

 for i in 1..length(self.str_agg) loop
      c_self:=substr(self.str_agg, i, 1);
      c_ctx2:=substr(ctx2.str_agg, i, 1);
      c_ini:=substr(self.str_ini, i, 1);
      if c_self != c_ctx2 then
        if c_ctx2 != c_ini then
          self.str_agg:=substr(self.STR_AGG, 1, i-1) || c_ctx2 || substr(self.STR_AGG, i+1, length(self.str_agg)-i);
        else
          self.str_agg:=substr(self.STR_AGG, 1, i-1) || c_self || substr(self.STR_AGG, i+1, length(self.str_agg)-i);
        end if;
      end if;
    end loop;

Thank you,

Bye, Gabriela.

 
Re: Functions that returns the difference between strings [message #14309 is a reply to message #14300] Thu, 19 August 2004 22:05 Go to previous messageGo to next message
Maaher
Messages: 7065
Registered: December 2001
Senior Member
strcmp in C++ works as follows, if I'm not mistaken:
it returns -1 if string1 is less than string2, 1 if string2 is less than string1 and 0 if string1 and string2 are equal. In PL/SQL, you could create a function that uses GREATEST:
CREATE OR REPLACE Function strcmp(string1 IN VARCHAR2
                                 ,string2 IN VARCHAR2
                                 )
  RETURN NUMBER
IS
BEGIN
  IF String1 = String2
    THEN RETURN 0;
  ELSIF greatest(string1,string2) = string1
    THEN RETURN 1;
  ELSIF greatest(string1,string2) = string2
    THEN RETURN -1;
  ELSE
    RETURN NULL;
  END IF;
END strcmp;
/

Function created.

SQL> SELECT ename
  2       , lag(ename) over ( order by rownum ) previous
  3       , strcmp(ename, lag(ename) over ( order by rownum )) cmp12
  4       , strcmp(lag(ename) over ( order by rownum ), ename) cmp21
  5       , strcmp(ename, ename) cmp11
  6*   FROM emp
SQL> /

ENAME      PREVIOUS       CMP12     CMP21     CMP11
---------- ---------- --------- --------- ---------
KING                                              0
JONES      KING              -1         1         0
FORD       JONES             -1         1         0
SMITH      FORD               1        -1         0
BLAKE      SMITH             -1         1         0
ALLEN      BLAKE             -1         1         0
WARD       ALLEN              1        -1         0
MARTIN     WARD              -1         1         0
CLARK      MARTIN            -1         1         0
SCOTT      CLARK              1        -1         0
TURNER     SCOTT              1        -1         0
ADAMS      TURNER            -1         1         0
JAMES      ADAMS              1        -1         0
MILLER     JAMES              1        -1         0
The LAG function is an analytic function, don't focus too hard on it. It's just used for the sake of the example. In this case, it takes the ename of the next row (ordered by ROWNUM, no real sort). In the first strcmp column, I compare the ename with the ename of the next row. In the second strcmp column, I do the inverse and in the last I compare ename with itself (should always return 0).

But why not using GREATEST() or LEAST() instead?

HTH,
MHE

MHE
Re: Functions that returns the difference between strings [message #14311 is a reply to message #14300] Thu, 19 August 2004 22:47 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9106
Registered: November 2002
Location: California, USA
Senior Member
Gabriela,

I am not clear about what you are looking for. Can you provide a specific example of the whole problem, rather than just a little piece of it, like some sample data and the results you want based on that data? If you are looking to compare two strings, you might be interested in the Levenshtein distance formula, which provides a numerical distance, which is the number of steps required to convert one string to the other, where each replacement, addition, or deletion of a character constitutes a step. Please see the example below.

Barbara

scott@ORA92> CREATE OR REPLACE FUNCTION ld -- Levenshtein distance
  2    (p_source_string   IN VARCHAR2,
  3  	p_target_string   IN VARCHAR2)
  4    RETURN		     NUMBER
  5    DETERMINISTIC
  6  AS
  7    v_length_of_source    NUMBER := NVL (LENGTH (p_source_string), 0);
  8    v_length_of_target    NUMBER := NVL (LENGTH (p_target_string), 0);
  9    TYPE mytabtype IS     TABLE OF NUMBER INDEX BY BINARY_INTEGER;
 10    column_to_left	     mytabtype;
 11    current_column	     mytabtype;
 12    v_cost		     NUMBER := 0;
 13  BEGIN
 14    IF v_length_of_source = 0 THEN
 15  	 RETURN v_length_of_target;
 16    ELSIF v_length_of_target = 0 THEN
 17  	 RETURN v_length_of_source;
 18    ELSE
 19  	 FOR j IN 0 .. v_length_of_target LOOP
 20  	   column_to_left(j) := j;
 21  	 END LOOP;
 22  	 FOR i IN 1.. v_length_of_source LOOP
 23  	   current_column(0) := i;
 24  	   FOR j IN 1 .. v_length_of_target LOOP
 25  	     IF SUBSTR (p_source_string, i, 1) =
 26  		SUBSTR (p_target_string, j, 1)
 27  	     THEN v_cost := 0;
 28  	     ELSE v_cost := 1;
 29  	     END IF;
 30  	     current_column(j) := LEAST (current_column(j-1) + 1,
 31  					 column_to_left(j) + 1,
 32  					 column_to_left(j-1) + v_cost);
 33  	   END LOOP;
 34  	   FOR j IN 0 .. v_length_of_target  LOOP
 35  	     column_to_left(j) := current_column(j);
 36  	   END LOOP;
 37  	 END LOOP;
 38    END IF;
 39    RETURN current_column(v_length_of_target);
 40  END ld;
 41  /

Function created.

scott@ORA92> select ename, ld ('SMYTHE', ename) as lev_dist from emp order by lev_dist
  2  /

ENAME        LEV_DIST
---------- ----------
SMITH               2
SCOTT               4
MARTIN              5
BLAKE               5
MILLER              5
ALLEN               6
TURNER              6
JAMES               6
FORD                6
ADAMS               6
KING                6
CLARK               6
WARD                6
JONES               6

14 rows selected.

scott@ORA92> -- closest match:
scott@ORA92> select ename, lev_dist from
  2  (select ename, ld ('SMYTHE', ename) as lev_dist from emp order by ld ('SMYTHE', ename))
  3  where rownum = 1
  4  /

ENAME        LEV_DIST
---------- ----------
SMITH               2
Re: Functions that returns the difference between strings [message #14320 is a reply to message #14311] Fri, 20 August 2004 07:45 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9106
Registered: November 2002
Location: California, USA
Senior Member
The following takes it one step further and creates a user-defined aggregate function to return the closest match. Where there is a tie, it may return either value.

scott@ORA92> CREATE OR REPLACE FUNCTION ld -- Levenshtein distance
  2    (p_source_string   IN VARCHAR2,
  3  	p_target_string   IN VARCHAR2)
  4    RETURN		     NUMBER
  5    DETERMINISTIC
  6  AS
  7    v_length_of_source    NUMBER := NVL (LENGTH (p_source_string), 0);
  8    v_length_of_target    NUMBER := NVL (LENGTH (p_target_string), 0);
  9    TYPE mytabtype IS     TABLE OF NUMBER INDEX BY BINARY_INTEGER;
 10    column_to_left	     mytabtype;
 11    current_column	     mytabtype;
 12    v_cost		     NUMBER := 0;
 13  BEGIN
 14    IF v_length_of_source = 0 THEN
 15  	 RETURN v_length_of_target;
 16    ELSIF v_length_of_target = 0 THEN
 17  	 RETURN v_length_of_source;
 18    ELSE
 19  	 FOR j IN 0 .. v_length_of_target LOOP
 20  	   column_to_left(j) := j;
 21  	 END LOOP;
 22  	 FOR i IN 1.. v_length_of_source LOOP
 23  	   current_column(0) := i;
 24  	   FOR j IN 1 .. v_length_of_target LOOP
 25  	     IF SUBSTR (p_source_string, i, 1) =
 26  		SUBSTR (p_target_string, j, 1)
 27  	     THEN v_cost := 0;
 28  	     ELSE v_cost := 1;
 29  	     END IF;
 30  	     current_column(j) := LEAST (current_column(j-1) + 1,
 31  					 column_to_left(j) + 1,
 32  					 column_to_left(j-1) + v_cost);
 33  	   END LOOP;
 34  	   FOR j IN 0 .. v_length_of_target  LOOP
 35  	     column_to_left(j) := current_column(j);
 36  	   END LOOP;
 37  	 END LOOP;
 38    END IF;
 39    RETURN current_column(v_length_of_target);
 40  END ld;
 41  /

Function created.

scott@ORA92> create or replace type match_input as object
  2    (str_value	      VARCHAR2(4000),
  3  	str_tocompare	      VARCHAR2(4000));
  4  /

Type created.

scott@ORA92> show errors
No errors.
scott@ORA92> create or replace type match_T as object
  2    (str_agg 	      VARCHAR2(4000),
  3  	lev_dist	      number,
  4  	min_val 	      number,
  5  	static function ODCIAggregateInitialize
  6  	  (sctx        IN OUT match_T)
  7  	  return	      number,
  8  	member function ODCIAggregateIterate
  9  	  (self        IN OUT match_T,
 10  	   value       IN     match_input)
 11  	  return	      number,
 12  	member function ODCIAggregateTerminate
 13  	  (self        IN     match_T,
 14  	   returnValue OUT    varchar2,
 15  	   flags       IN     number)
 16  	  return	      number,
 17  	member function ODCIAggregateMerge
 18  	  (self        IN OUT match_T,
 19  	   ctx2        IN     match_T)
 20  	  return	      number);
 21  /

Type created.

scott@ORA92> show errors
No errors.
scott@ORA92> create or replace type body match_T is
  2    static function ODCIAggregateInitialize
  3  	 (sctx	       IN OUT match_T)
  4  	 return 	      number
  5    is
  6    begin
  7  	 sctx := match_T (null, null, null);
  8  	 return ODCIConst.Success;
  9    end;
 10  
 11    member function ODCIAggregateIterate
 12  	 (self	       IN OUT match_T,
 13  	  value        IN     match_input)
 14  	 return 	      number
 15    is
 16    begin
 17  	 if self.str_agg is null then
 18  	   self.STR_AGG := value.str_value;
 19  	   self.lev_dist := ld (value.STR_VALUE, value.str_tocompare);
 20  	   self.min_val := ld (value.STR_VALUE, value.str_tocompare);
 21  	 end if;
 22  
 23  	 if ld (value.str_value, value.str_tocompare)
 24  	   < self.min_val then
 25  	   self.str_agg := value.str_value;
 26  	   self.min_val := ld (value.STR_VALUE, value.str_tocompare);
 27  	 end if;
 28  	 return ODCIConst.Success;
 29    end;
 30  
 31    member function ODCIAggregateTerminate
 32  	 (self	       IN     match_T,
 33  	  returnValue  OUT    varchar2,
 34  	  flags        IN     number)
 35  	 return 	      number
 36    is
 37    begin
 38  	 returnValue := STR_AGG;
 39  	 return ODCIConst.Success;
 40    end;
 41  
 42    member function ODCIAggregateMerge
 43  	 (self	       IN OUT match_T,
 44  	  ctx2	       IN     match_T)
 45  	 return 	      number
 46    is
 47  	 c_self 	      varchar2(1);
 48  	 c_ctx2 	      varchar2(1);
 49  	 c_ini		      varchar2(1);
 50  	 i		      integer;
 51    begin
 52  	 if ctx2.lev_dist < self.lev_dist then
 53  	   self.str_agg := ctx2.str_agg;
 54  	 end if;
 55  	 return ODCIConst.Success;
 56    end;
 57  end;
 58  /

Type body created.

scott@ORA92> show errors
No errors.
scott@ORA92> create or replace function closest_match
  2    (input match_input)
  3    return varchar2
  4  parallel_enable aggregate using match_t;
  5  /

Function created.

scott@ORA92> show errors
No errors.
scott@ORA92> -- closest match:
scott@ORA92> select closest_match (match_input (ename, 'SMYTHE')) from emp
  2  /

CLOSEST_MATCH(MATCH_INPUT(ENAME,'SMYTHE'))
----------------------------------------------------------------------------------------------------
SMITH

scott@ORA92> -- closest match in each department:
scott@ORA92> column name_to_compare format a30
scott@ORA92> column closest_match format a30
scott@ORA92> select deptno,
  2  	    'SMYTHE' as name_to_compare,
  3  	    closest_match (match_input (ename, 'SMYTHE')) as closest_match
  4  from   emp
  5  group  by deptno
  6  /

    DEPTNO NAME_TO_COMPARE                CLOSEST_MATCH
---------- ------------------------------ ------------------------------
        10 SMYTHE                         MILLER
        20 SMYTHE                         SMITH
        30 SMYTHE                         BLAKE
Re: Functions that returns the difference between strings [message #14321 is a reply to message #14320] Fri, 20 August 2004 08:17 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9106
Registered: November 2002
Location: California, USA
Senior Member
The user-defined aggregate function that I posted for finding the closest match does not return ties and may return either matching value. The following is another way to return the closest match within each department, including ties. You could also use this method to return a limited set of matches, limited by either the dense_rank value or lev_dist value. I am still just guessing at what your ultimate goal may be, so I hope some of this is useful to you.

scott@ORA92> CREATE OR REPLACE FUNCTION ld -- Levenshtein distance
  2    (p_source_string   IN VARCHAR2,
  3  	p_target_string   IN VARCHAR2)
  4    RETURN		     NUMBER
  5    DETERMINISTIC
  6  AS
  7    v_length_of_source    NUMBER := NVL (LENGTH (p_source_string), 0);
  8    v_length_of_target    NUMBER := NVL (LENGTH (p_target_string), 0);
  9    TYPE mytabtype IS     TABLE OF NUMBER INDEX BY BINARY_INTEGER;
 10    column_to_left	     mytabtype;
 11    current_column	     mytabtype;
 12    v_cost		     NUMBER := 0;
 13  BEGIN
 14    IF v_length_of_source = 0 THEN
 15  	 RETURN v_length_of_target;
 16    ELSIF v_length_of_target = 0 THEN
 17  	 RETURN v_length_of_source;
 18    ELSE
 19  	 FOR j IN 0 .. v_length_of_target LOOP
 20  	   column_to_left(j) := j;
 21  	 END LOOP;
 22  	 FOR i IN 1.. v_length_of_source LOOP
 23  	   current_column(0) := i;
 24  	   FOR j IN 1 .. v_length_of_target LOOP
 25  	     IF SUBSTR (p_source_string, i, 1) =
 26  		SUBSTR (p_target_string, j, 1)
 27  	     THEN v_cost := 0;
 28  	     ELSE v_cost := 1;
 29  	     END IF;
 30  	     current_column(j) := LEAST (current_column(j-1) + 1,
 31  					 column_to_left(j) + 1,
 32  					 column_to_left(j-1) + v_cost);
 33  	   END LOOP;
 34  	   FOR j IN 0 .. v_length_of_target  LOOP
 35  	     column_to_left(j) := current_column(j);
 36  	   END LOOP;
 37  	 END LOOP;
 38    END IF;
 39    RETURN current_column(v_length_of_target);
 40  END ld;
 41  /

Function created.

scott@ORA92> column name_to_compare format a30
scott@ORA92> column closest_match format a30
scott@ORA92> select deptno, 'SMYTHE' as name_to_compare, ename as closest_match
  2  from   (select deptno, ename,
  3  		    dense_rank () over
  4  		    (partition by deptno
  5  		     order     by lev_dist) as dr
  6  	     from   (select deptno, ename, ld (ename, 'SMYTHE') as lev_dist
  7  		     from   emp))
  8  where  dr = 1
  9  /

    DEPTNO NAME_TO_COMPARE                CLOSEST_MATCH
---------- ------------------------------ ------------------------------
        10 SMYTHE                         MILLER
        20 SMYTHE                         SMITH
        30 SMYTHE                         MARTIN
        30 SMYTHE                         BLAKE
Re: Functions that returns the difference between strings [message #14322 is a reply to message #14321] Fri, 20 August 2004 10:17 Go to previous messageGo to next message
Art Metzer
Messages: 2480
Registered: December 2002
Senior Member
Barbara,

This aggregate function is so cool! You should write a white paper on it, post it on MetaLink, something....very cool.

Art.
Re: Functions that returns the difference between strings [message #14328 is a reply to message #14322] Fri, 20 August 2004 22:28 Go to previous message
Barbara Boehmer
Messages: 9106
Registered: November 2002
Location: California, USA
Senior Member
Thanks Art. I just figured out that I can also use it like an analytic function too:

scott@ORA92> column closest_match_in_deptno format a23
scott@ORA92> select deptno,
  2  	    'SMYTHE' as name_to_compare,
  3  	    ename,
  4  	    closest_match (match_input (ename, 'SMYTHE')) over
  5  	      (partition by deptno) as closest_match_in_deptno
  6  from   emp
  7  /

    DEPTNO NAME_TO_COMPARE                ENAME      CLOSEST_MATCH_IN_DEPTNO
---------- ------------------------------ ---------- -----------------------
        10 SMYTHE                         CLARK      MILLER
        10 SMYTHE                         KING       MILLER
        10 SMYTHE                         MILLER     MILLER
        20 SMYTHE                         SMITH      SMITH
        20 SMYTHE                         ADAMS      SMITH
        20 SMYTHE                         FORD       SMITH
        20 SMYTHE                         SCOTT      SMITH
        20 SMYTHE                         JONES      SMITH
        30 SMYTHE                         ALLEN      BLAKE
        30 SMYTHE                         BLAKE      BLAKE
        30 SMYTHE                         MARTIN     BLAKE
        30 SMYTHE                         JAMES      BLAKE
        30 SMYTHE                         TURNER     BLAKE
        30 SMYTHE                         WARD       BLAKE

14 rows selected.
Previous Topic: Out param in Stored procedure CLOB and Long
Next Topic: Self-join a GOOD EXPLANATION required
Goto Forum:
  


Current Time: Sat Sep 06 13:16:48 CDT 2025