Home » SQL & PL/SQL » SQL & PL/SQL » Functions that returns the difference between strings
Functions that returns the difference between strings [message #14300] |
Thu, 19 August 2004 06:36  |
Gabi
Messages: 6 Registered: August 2004
|
Junior Member |
|
|
Hello,
I have a problem:
I need some functions or ideeas for functions which can find the differences between two strings, something like strcmp.
For the moment I am making the search char by char and replace but it is very slow.
What I need to do is some function or part which can simplfy the following code:
for i in 1..length(self.str_agg) loop
c_self:=substr(self.str_agg, i, 1);
c_ctx2:=substr(ctx2.str_agg, i, 1);
c_ini:=substr(self.str_ini, i, 1);
if c_self != c_ctx2 then
if c_ctx2 != c_ini then
self.str_agg:=substr(self.STR_AGG, 1, i-1) || c_ctx2 || substr(self.STR_AGG, i+1, length(self.str_agg)-i);
else
self.str_agg:=substr(self.STR_AGG, 1, i-1) || c_self || substr(self.STR_AGG, i+1, length(self.str_agg)-i);
end if;
end if;
end loop;
Thank you,
Bye, Gabriela.
|
|
|
Re: Functions that returns the difference between strings [message #14309 is a reply to message #14300] |
Thu, 19 August 2004 22:05   |
 |
Maaher
Messages: 7065 Registered: December 2001
|
Senior Member |
|
|
strcmp in C++ works as follows, if I'm not mistaken:
it returns -1 if string1 is less than string2, 1 if string2 is less than string1 and 0 if string1 and string2 are equal. In PL/SQL, you could create a function that uses GREATEST:CREATE OR REPLACE Function strcmp(string1 IN VARCHAR2
,string2 IN VARCHAR2
)
RETURN NUMBER
IS
BEGIN
IF String1 = String2
THEN RETURN 0;
ELSIF greatest(string1,string2) = string1
THEN RETURN 1;
ELSIF greatest(string1,string2) = string2
THEN RETURN -1;
ELSE
RETURN NULL;
END IF;
END strcmp;
/
Function created.
SQL> SELECT ename
2 , lag(ename) over ( order by rownum ) previous
3 , strcmp(ename, lag(ename) over ( order by rownum )) cmp12
4 , strcmp(lag(ename) over ( order by rownum ), ename) cmp21
5 , strcmp(ename, ename) cmp11
6* FROM emp
SQL> /
ENAME PREVIOUS CMP12 CMP21 CMP11
---------- ---------- --------- --------- ---------
KING 0
JONES KING -1 1 0
FORD JONES -1 1 0
SMITH FORD 1 -1 0
BLAKE SMITH -1 1 0
ALLEN BLAKE -1 1 0
WARD ALLEN 1 -1 0
MARTIN WARD -1 1 0
CLARK MARTIN -1 1 0
SCOTT CLARK 1 -1 0
TURNER SCOTT 1 -1 0
ADAMS TURNER -1 1 0
JAMES ADAMS 1 -1 0
MILLER JAMES 1 -1 0
The LAG function is an analytic function, don't focus too hard on it. It's just used for the sake of the example. In this case, it takes the ename of the next row (ordered by ROWNUM, no real sort). In the first strcmp column, I compare the ename with the ename of the next row. In the second strcmp column, I do the inverse and in the last I compare ename with itself (should always return 0).
But why not using GREATEST() or LEAST() instead?
HTH,
MHE
MHE
|
|
|
Re: Functions that returns the difference between strings [message #14311 is a reply to message #14300] |
Thu, 19 August 2004 22:47   |
 |
Barbara Boehmer
Messages: 9106 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
Gabriela,
I am not clear about what you are looking for. Can you provide a specific example of the whole problem, rather than just a little piece of it, like some sample data and the results you want based on that data? If you are looking to compare two strings, you might be interested in the Levenshtein distance formula, which provides a numerical distance, which is the number of steps required to convert one string to the other, where each replacement, addition, or deletion of a character constitutes a step. Please see the example below.
Barbara
scott@ORA92> CREATE OR REPLACE FUNCTION ld -- Levenshtein distance
2 (p_source_string IN VARCHAR2,
3 p_target_string IN VARCHAR2)
4 RETURN NUMBER
5 DETERMINISTIC
6 AS
7 v_length_of_source NUMBER := NVL (LENGTH (p_source_string), 0);
8 v_length_of_target NUMBER := NVL (LENGTH (p_target_string), 0);
9 TYPE mytabtype IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
10 column_to_left mytabtype;
11 current_column mytabtype;
12 v_cost NUMBER := 0;
13 BEGIN
14 IF v_length_of_source = 0 THEN
15 RETURN v_length_of_target;
16 ELSIF v_length_of_target = 0 THEN
17 RETURN v_length_of_source;
18 ELSE
19 FOR j IN 0 .. v_length_of_target LOOP
20 column_to_left(j) := j;
21 END LOOP;
22 FOR i IN 1.. v_length_of_source LOOP
23 current_column(0) := i;
24 FOR j IN 1 .. v_length_of_target LOOP
25 IF SUBSTR (p_source_string, i, 1) =
26 SUBSTR (p_target_string, j, 1)
27 THEN v_cost := 0;
28 ELSE v_cost := 1;
29 END IF;
30 current_column(j) := LEAST (current_column(j-1) + 1,
31 column_to_left(j) + 1,
32 column_to_left(j-1) + v_cost);
33 END LOOP;
34 FOR j IN 0 .. v_length_of_target LOOP
35 column_to_left(j) := current_column(j);
36 END LOOP;
37 END LOOP;
38 END IF;
39 RETURN current_column(v_length_of_target);
40 END ld;
41 /
Function created.
scott@ORA92> select ename, ld ('SMYTHE', ename) as lev_dist from emp order by lev_dist
2 /
ENAME LEV_DIST
---------- ----------
SMITH 2
SCOTT 4
MARTIN 5
BLAKE 5
MILLER 5
ALLEN 6
TURNER 6
JAMES 6
FORD 6
ADAMS 6
KING 6
CLARK 6
WARD 6
JONES 6
14 rows selected.
scott@ORA92> -- closest match:
scott@ORA92> select ename, lev_dist from
2 (select ename, ld ('SMYTHE', ename) as lev_dist from emp order by ld ('SMYTHE', ename))
3 where rownum = 1
4 /
ENAME LEV_DIST
---------- ----------
SMITH 2
|
|
|
Re: Functions that returns the difference between strings [message #14320 is a reply to message #14311] |
Fri, 20 August 2004 07:45   |
 |
Barbara Boehmer
Messages: 9106 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
The following takes it one step further and creates a user-defined aggregate function to return the closest match. Where there is a tie, it may return either value.
scott@ORA92> CREATE OR REPLACE FUNCTION ld -- Levenshtein distance
2 (p_source_string IN VARCHAR2,
3 p_target_string IN VARCHAR2)
4 RETURN NUMBER
5 DETERMINISTIC
6 AS
7 v_length_of_source NUMBER := NVL (LENGTH (p_source_string), 0);
8 v_length_of_target NUMBER := NVL (LENGTH (p_target_string), 0);
9 TYPE mytabtype IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
10 column_to_left mytabtype;
11 current_column mytabtype;
12 v_cost NUMBER := 0;
13 BEGIN
14 IF v_length_of_source = 0 THEN
15 RETURN v_length_of_target;
16 ELSIF v_length_of_target = 0 THEN
17 RETURN v_length_of_source;
18 ELSE
19 FOR j IN 0 .. v_length_of_target LOOP
20 column_to_left(j) := j;
21 END LOOP;
22 FOR i IN 1.. v_length_of_source LOOP
23 current_column(0) := i;
24 FOR j IN 1 .. v_length_of_target LOOP
25 IF SUBSTR (p_source_string, i, 1) =
26 SUBSTR (p_target_string, j, 1)
27 THEN v_cost := 0;
28 ELSE v_cost := 1;
29 END IF;
30 current_column(j) := LEAST (current_column(j-1) + 1,
31 column_to_left(j) + 1,
32 column_to_left(j-1) + v_cost);
33 END LOOP;
34 FOR j IN 0 .. v_length_of_target LOOP
35 column_to_left(j) := current_column(j);
36 END LOOP;
37 END LOOP;
38 END IF;
39 RETURN current_column(v_length_of_target);
40 END ld;
41 /
Function created.
scott@ORA92> create or replace type match_input as object
2 (str_value VARCHAR2(4000),
3 str_tocompare VARCHAR2(4000));
4 /
Type created.
scott@ORA92> show errors
No errors.
scott@ORA92> create or replace type match_T as object
2 (str_agg VARCHAR2(4000),
3 lev_dist number,
4 min_val number,
5 static function ODCIAggregateInitialize
6 (sctx IN OUT match_T)
7 return number,
8 member function ODCIAggregateIterate
9 (self IN OUT match_T,
10 value IN match_input)
11 return number,
12 member function ODCIAggregateTerminate
13 (self IN match_T,
14 returnValue OUT varchar2,
15 flags IN number)
16 return number,
17 member function ODCIAggregateMerge
18 (self IN OUT match_T,
19 ctx2 IN match_T)
20 return number);
21 /
Type created.
scott@ORA92> show errors
No errors.
scott@ORA92> create or replace type body match_T is
2 static function ODCIAggregateInitialize
3 (sctx IN OUT match_T)
4 return number
5 is
6 begin
7 sctx := match_T (null, null, null);
8 return ODCIConst.Success;
9 end;
10
11 member function ODCIAggregateIterate
12 (self IN OUT match_T,
13 value IN match_input)
14 return number
15 is
16 begin
17 if self.str_agg is null then
18 self.STR_AGG := value.str_value;
19 self.lev_dist := ld (value.STR_VALUE, value.str_tocompare);
20 self.min_val := ld (value.STR_VALUE, value.str_tocompare);
21 end if;
22
23 if ld (value.str_value, value.str_tocompare)
24 < self.min_val then
25 self.str_agg := value.str_value;
26 self.min_val := ld (value.STR_VALUE, value.str_tocompare);
27 end if;
28 return ODCIConst.Success;
29 end;
30
31 member function ODCIAggregateTerminate
32 (self IN match_T,
33 returnValue OUT varchar2,
34 flags IN number)
35 return number
36 is
37 begin
38 returnValue := STR_AGG;
39 return ODCIConst.Success;
40 end;
41
42 member function ODCIAggregateMerge
43 (self IN OUT match_T,
44 ctx2 IN match_T)
45 return number
46 is
47 c_self varchar2(1);
48 c_ctx2 varchar2(1);
49 c_ini varchar2(1);
50 i integer;
51 begin
52 if ctx2.lev_dist < self.lev_dist then
53 self.str_agg := ctx2.str_agg;
54 end if;
55 return ODCIConst.Success;
56 end;
57 end;
58 /
Type body created.
scott@ORA92> show errors
No errors.
scott@ORA92> create or replace function closest_match
2 (input match_input)
3 return varchar2
4 parallel_enable aggregate using match_t;
5 /
Function created.
scott@ORA92> show errors
No errors.
scott@ORA92> -- closest match:
scott@ORA92> select closest_match (match_input (ename, 'SMYTHE')) from emp
2 /
CLOSEST_MATCH(MATCH_INPUT(ENAME,'SMYTHE'))
----------------------------------------------------------------------------------------------------
SMITH
scott@ORA92> -- closest match in each department:
scott@ORA92> column name_to_compare format a30
scott@ORA92> column closest_match format a30
scott@ORA92> select deptno,
2 'SMYTHE' as name_to_compare,
3 closest_match (match_input (ename, 'SMYTHE')) as closest_match
4 from emp
5 group by deptno
6 /
DEPTNO NAME_TO_COMPARE CLOSEST_MATCH
---------- ------------------------------ ------------------------------
10 SMYTHE MILLER
20 SMYTHE SMITH
30 SMYTHE BLAKE
|
|
|
Re: Functions that returns the difference between strings [message #14321 is a reply to message #14320] |
Fri, 20 August 2004 08:17   |
 |
Barbara Boehmer
Messages: 9106 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
The user-defined aggregate function that I posted for finding the closest match does not return ties and may return either matching value. The following is another way to return the closest match within each department, including ties. You could also use this method to return a limited set of matches, limited by either the dense_rank value or lev_dist value. I am still just guessing at what your ultimate goal may be, so I hope some of this is useful to you.
scott@ORA92> CREATE OR REPLACE FUNCTION ld -- Levenshtein distance
2 (p_source_string IN VARCHAR2,
3 p_target_string IN VARCHAR2)
4 RETURN NUMBER
5 DETERMINISTIC
6 AS
7 v_length_of_source NUMBER := NVL (LENGTH (p_source_string), 0);
8 v_length_of_target NUMBER := NVL (LENGTH (p_target_string), 0);
9 TYPE mytabtype IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
10 column_to_left mytabtype;
11 current_column mytabtype;
12 v_cost NUMBER := 0;
13 BEGIN
14 IF v_length_of_source = 0 THEN
15 RETURN v_length_of_target;
16 ELSIF v_length_of_target = 0 THEN
17 RETURN v_length_of_source;
18 ELSE
19 FOR j IN 0 .. v_length_of_target LOOP
20 column_to_left(j) := j;
21 END LOOP;
22 FOR i IN 1.. v_length_of_source LOOP
23 current_column(0) := i;
24 FOR j IN 1 .. v_length_of_target LOOP
25 IF SUBSTR (p_source_string, i, 1) =
26 SUBSTR (p_target_string, j, 1)
27 THEN v_cost := 0;
28 ELSE v_cost := 1;
29 END IF;
30 current_column(j) := LEAST (current_column(j-1) + 1,
31 column_to_left(j) + 1,
32 column_to_left(j-1) + v_cost);
33 END LOOP;
34 FOR j IN 0 .. v_length_of_target LOOP
35 column_to_left(j) := current_column(j);
36 END LOOP;
37 END LOOP;
38 END IF;
39 RETURN current_column(v_length_of_target);
40 END ld;
41 /
Function created.
scott@ORA92> column name_to_compare format a30
scott@ORA92> column closest_match format a30
scott@ORA92> select deptno, 'SMYTHE' as name_to_compare, ename as closest_match
2 from (select deptno, ename,
3 dense_rank () over
4 (partition by deptno
5 order by lev_dist) as dr
6 from (select deptno, ename, ld (ename, 'SMYTHE') as lev_dist
7 from emp))
8 where dr = 1
9 /
DEPTNO NAME_TO_COMPARE CLOSEST_MATCH
---------- ------------------------------ ------------------------------
10 SMYTHE MILLER
20 SMYTHE SMITH
30 SMYTHE MARTIN
30 SMYTHE BLAKE
|
|
|
|
Re: Functions that returns the difference between strings [message #14328 is a reply to message #14322] |
Fri, 20 August 2004 22:28  |
 |
Barbara Boehmer
Messages: 9106 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
Thanks Art. I just figured out that I can also use it like an analytic function too:
scott@ORA92> column closest_match_in_deptno format a23
scott@ORA92> select deptno,
2 'SMYTHE' as name_to_compare,
3 ename,
4 closest_match (match_input (ename, 'SMYTHE')) over
5 (partition by deptno) as closest_match_in_deptno
6 from emp
7 /
DEPTNO NAME_TO_COMPARE ENAME CLOSEST_MATCH_IN_DEPTNO
---------- ------------------------------ ---------- -----------------------
10 SMYTHE CLARK MILLER
10 SMYTHE KING MILLER
10 SMYTHE MILLER MILLER
20 SMYTHE SMITH SMITH
20 SMYTHE ADAMS SMITH
20 SMYTHE FORD SMITH
20 SMYTHE SCOTT SMITH
20 SMYTHE JONES SMITH
30 SMYTHE ALLEN BLAKE
30 SMYTHE BLAKE BLAKE
30 SMYTHE MARTIN BLAKE
30 SMYTHE JAMES BLAKE
30 SMYTHE TURNER BLAKE
30 SMYTHE WARD BLAKE
14 rows selected.
|
|
|
Goto Forum:
Current Time: Sat Sep 06 13:16:48 CDT 2025
|