Home » SQL & PL/SQL » SQL & PL/SQL » tough requirement string comparison (oracle 10g)
tough requirement string comparison [message #312448] |
Tue, 08 April 2008 21:36  |
srinivas.k2005
Messages: 404 Registered: August 2006
|
Senior Member |
|
|
Hi,
I have requirement wherein i need to compare two strings (with mulitple words) and it should return the %(percentage) of comparison.
e.g. "oracle infotech" and "infotech oracle" are 100% match
Do we have any oracle builtins to compare ?
Please let me know if anyone has any idea on above requirement.
Thanks,
Srinivas
[Updated on: Tue, 08 April 2008 21:36] Report message to a moderator
|
|
|
Re: tough requirement string comparison [message #312460 is a reply to message #312448] |
Tue, 08 April 2008 23:44   |
 |
Barbara Boehmer
Messages: 9104 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
Are you just going to be comparing two strings at a time or pairs of strings from columns in a table or compare one string to each value in a column or what? There are various comparison methods.
If your strings are stored in a table you can create Oracle text indexes on them and compare using contains clauses that returns a score.
There is a utl_match.edit_distance_similarity function that returns a percentage of similarity, but it may not be the formula that you are looking for and the order of words counts, so 'oracle infotech' and 'infotech oracle' would not be a 100% match.
How is your percentage defined? If you consider it as the number of matching words divided by the number of distinct words times 100, then you can create your own function something like the example below. It puts the words from each string into a collection then selects the intersection and the union of those two sets, in order to calculate the percentage.
SCOTT@orcl_11g> CREATE OR REPLACE TYPE varchar2_tt AS TABLE OF VARCHAR2 (2000);
2 /
Type created.
SCOTT@orcl_11g> CREATE OR REPLACE FUNCTION percentage_of_matching_words
2 (p_string1 IN VARCHAR2,
3 p_string2 IN VARCHAR2)
4 RETURN NUMBER
5 AS
6 v_string1 VARCHAR2 (2000) := p_string1 || ' ';
7 v_string2 VARCHAR2 (2000) := p_string2 || ' ';
8 v_words1 varchar2_tt := varchar2_tt();
9 v_words2 varchar2_tt := varchar2_tt();
10 v_position NUMBER;
11 v_percent NUMBER;
12 BEGIN
13 LOOP
14 EXIT WHEN v_string1 IS NULL;
15 v_position := INSTR (v_string1, ' ' );
16 v_words1.EXTEND;
17 v_words1 (v_words1.COUNT) := LTRIM (RTRIM (SUBSTR (v_string1, 1, v_position - 1)));
18 v_string1 := SUBSTR (v_string1, v_position + 1);
19 END LOOP;
20 LOOP
21 EXIT WHEN v_string2 IS NULL;
22 v_position := INSTR (v_string2, ' ' );
23 v_words2.EXTEND;
24 v_words2 (v_words2.COUNT) := LTRIM (RTRIM (SUBSTR (v_string1, 1, v_position - 1)));
25 v_string2 := SUBSTR (v_string2, v_position + 1);
26 END LOOP;
27 WITH s1 AS (SELECT * FROM TABLE (CAST (in_list (p_string1) AS varchar2_tt))),
28 s2 AS (SELECT * FROM TABLE (CAST (in_list (p_string2) AS varchar2_tt)))
29 SELECT (i.cnt / u.cnt) * 100
30 INTO v_percent
31 FROM (SELECT COUNT (*) cnt FROM (SELECT * FROM s1 INTERSECT SELECT * FROM s2)) i,
32 (SELECT COUNT (*) cnt FROM (SELECT * FROM s1 UNION SELECT * FROM s2)) u;
33 RETURN v_percent;
34 END percentage_of_matching_words;
35 /
Function created.
SCOTT@orcl_11g> SHOW ERRORS
No errors.
SCOTT@orcl_11g> SELECT percentage_of_matching_words
2 ('oracle infotech',
3 'infotech oracle')
4 FROM DUAL
5 /
PERCENTAGE_OF_MATCHING_WORDS('ORACLEINFOTECH','INFOTECHORACLE')
---------------------------------------------------------------
100
SCOTT@orcl_11g> SELECT percentage_of_matching_words
2 ('Ship the documents to the Feds',
3 'Rip the documents to shreds')
4 FROM DUAL
5 /
PERCENTAGE_OF_MATCHING_WORDS('SHIPTHEDOCUMENTSTOTHEFEDS','RIPTHEDOCUMENTSTOSHRED
--------------------------------------------------------------------------------
42.8571429
SCOTT@orcl_11g>
|
|
|
Re: tough requirement string comparison [message #312494 is a reply to message #312460] |
Wed, 09 April 2008 01:03   |
 |
Michel Cadot
Messages: 68737 Registered: March 2007 Location: Saint-Maur, France, https...
|
Senior Member Account Moderator |
|
|
Same thing in SQL:
SQL> with
2 data as (
3 select 0 id, 'oracle infotech' val from dual
4 union all
5 select 1, 'infotech oracle' from dual
6 union all
7 select 2, 'Ship the documents to the Feds' from dual
8 union all
9 select 3, 'Rip the documents to shreds' from dual
10 ),
11 lines as (
12 select level line
13 from dual
14 connect by level <=
15 ( select max(length(val)-length(replace(val,' ','')))+1 from data )
16 ),
17 words as (
18 select id, val,
19 substr(val,
20 instr(' '||val||' ',' ',1,line),
21 instr(' '||val||' ',' ',1,line+1)-instr(' '||val||' ',' ',1,line)-1
22 ) word
23 from data, lines
24 where line <= length(val)-length(replace(val,' ',''))+1
25 ),
26 nb_words as (
27 select id, count(*) nb_words
28 from ( select trunc(id/2) id, word from words where mod(id,2) = 0
29 union
30 select trunc(id/2) id, word from words where mod(id,2) = 1 )
31 group by id
32 ),
33 common_words as (
34 select id, count(*) nb_common
35 from ( select trunc(id/2) id, word from words where mod(id,2) = 0
36 intersect
37 select trunc(id/2) id, word from words where mod(id,2) = 1 )
38 group by id
39 )
40 select d1.val val1, d2.val val2, 100*nb_common/nb_words pct_match
41 from nb_words, common_words, data d1, data d2
42 where common_words.id = nb_words.id
43 and trunc(d1.id/2) = nb_words.id
44 and mod(d1.id,2) = 0
45 and mod(d2.id,2) = 1
46 and trunc(d2.id/2) = nb_words.id
47 order by nb_words.id
48 /
VAL1 VAL2 PCT_MATCH
------------------------------ ------------------------------ ----------
oracle infotech infotech oracle 100
Ship the documents to the Feds Rip the documents to shreds 42.8571429
2 rows selected.
Regards
Michel
|
|
|
Re: tough requirement string comparison [message #313830 is a reply to message #312448] |
Mon, 14 April 2008 15:05   |
Frank_Zhou
Messages: 5 Registered: February 2008 Location: Braintree , MA
|
Junior Member |
|
|
An alternative SQL for comparing 2 input strings:
SQL> select max(old_str) str1, min(old_str) str2, count(distinct str)/max(str_cnt) *100 as mat_pct
2 FROM
3 (select old_str, str, count(distinct str) over ( ) str_cnt,
4 count(distinct id) over(partition by str) counter
5 from
6 (select id, old_str, doc.extract('/l/text()').getstringVal() str
7 from (select 1 as id, 'Ship the documents to the Feds' old_str from dual
8 union all
9 select 2, 'Rip the documents to shreds' from dual) t,
10 TABLE(xmlSequence(extract(XMLType('<doc><l>'||
11 replace(t.old_str,' ','</l><l>')||'</l></doc>'),'/doc/l'))) doc
12 )
13 )
14 where counter = 2;
STR1 STR2 MAT_PCT
------------------------------ ------------------------------ ----------
Ship the documents to the Feds Rip the documents to shreds 42.8571429
Frank
|
|
|
|
Re: tough requirement string comparison [message #501974 is a reply to message #312460] |
Fri, 01 April 2011 20:05   |
 |
Barbara Boehmer
Messages: 9104 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
I know this is an old thread, but I received an email pointing out an error and some missing code. On line 24 of the percentage_of_matching_words function, v_string1 should be v_string2. Also, I initially used Tom Kyte's in_list function for testing, then added code to the function to do the same thing, but forgot to substitute it for the in_list function. So, I have provided the corrected code and demo below.
SCOTT@orcl_11gR2> CREATE OR REPLACE TYPE varchar2_tt AS TABLE OF VARCHAR2 (2000);
2 /
Type created.
SCOTT@orcl_11gR2> CREATE OR REPLACE FUNCTION percentage_of_matching_words
2 (p_string1 IN VARCHAR2,
3 p_string2 IN VARCHAR2)
4 RETURN NUMBER
5 AS
6 v_string1 VARCHAR2 (2000) := p_string1 || ' ';
7 v_string2 VARCHAR2 (2000) := p_string2 || ' ';
8 v_words1 varchar2_tt := varchar2_tt();
9 v_words2 varchar2_tt := varchar2_tt();
10 v_position NUMBER;
11 v_percent NUMBER;
12 BEGIN
13 LOOP
14 EXIT WHEN v_string1 IS NULL;
15 v_position := INSTR (v_string1, ' ' );
16 v_words1.EXTEND;
17 v_words1 (v_words1.COUNT) := LTRIM (RTRIM (SUBSTR (v_string1, 1, v_position - 1)));
18 v_string1 := SUBSTR (v_string1, v_position + 1);
19 END LOOP;
20 LOOP
21 EXIT WHEN v_string2 IS NULL;
22 v_position := INSTR (v_string2, ' ' );
23 v_words2.EXTEND;
24 v_words2 (v_words2.COUNT) := LTRIM (RTRIM (SUBSTR (v_string2, 1, v_position - 1)));
25 v_string2 := SUBSTR (v_string2, v_position + 1);
26 END LOOP;
27 WITH s1 AS (SELECT * FROM TABLE (CAST (v_words1 AS varchar2_tt))),
28 s2 AS (SELECT * FROM TABLE (CAST (v_words2 AS varchar2_tt)))
29 SELECT (i.cnt / u.cnt) * 100
30 INTO v_percent
31 FROM (SELECT COUNT (*) cnt FROM (SELECT * FROM s1 INTERSECT SELECT * FROM s2)) i,
32 (SELECT COUNT (*) cnt FROM (SELECT * FROM s1 UNION SELECT * FROM s2)) u;
33 RETURN v_percent;
34 END percentage_of_matching_words;
35 /
Function created.
SCOTT@orcl_11gR2> SELECT percentage_of_matching_words
2 ('oracle infotech',
3 'infotech oracle')
4 FROM DUAL
5 /
PERCENTAGE_OF_MATCHING_WORDS('ORACLEINFOTECH','INFOTECHORACLE')
---------------------------------------------------------------
100
1 row selected.
SCOTT@orcl_11gR2> SELECT percentage_of_matching_words
2 ('Ship the documents to the Feds',
3 'Rip the documents to shreds')
4 FROM DUAL
5 /
PERCENTAGE_OF_MATCHING_WORDS('SHIPTHEDOCUMENTSTOTHEFEDS','RIPTHEDOCUMENTSTOSHRED
--------------------------------------------------------------------------------
42.8571429
1 row selected.
SCOTT@orcl_11gR2>
|
|
|
Re: tough requirement string comparison [message #502307 is a reply to message #501974] |
Tue, 05 April 2011 13:57  |
 |
supercox
Messages: 4 Registered: April 2011 Location: NOVA
|
Junior Member |
|
|
Barbara,
Thank you for fixing this and posting it back here.
I have implemented it with my project and it is working wonderfully!
We needed to change the percentage from the long format provided ( 85.71428571428571428571428571428571428571 ) to ( 85.71 ).
This was accomplished by changing v_percent from NUMBER to DECIMAL(12,2) in line 11.
Thank you
Roger
|
|
|
Goto Forum:
Current Time: Sat Feb 15 23:29:33 CST 2025
|