Home » SQL & PL/SQL » SQL & PL/SQL » tough requirement string comparison (oracle 10g)
tough requirement string comparison [message #312448] Tue, 08 April 2008 21:36 Go to next message
srinivas.k2005
Messages: 159
Registered: August 2006
Senior Member
Hi,

I have requirement wherein i need to compare two strings (with mulitple words) and it should return the %(percentage) of comparison.
e.g. "oracle infotech" and "infotech oracle" are 100% match

Do we have any oracle builtins to compare ?

Please let me know if anyone has any idea on above requirement.

Thanks,
Srinivas

[Updated on: Tue, 08 April 2008 21:36]

Re: tough requirement string comparison [message #312460 is a reply to message #312448 ] Tue, 08 April 2008 23:44 Go to previous messageGo to next message
Barbara Boehmer
Messages: 3940
Registered: November 2002
Location: California, USA
Senior Member
Are you just going to be comparing two strings at a time or pairs of strings from columns in a table or compare one string to each value in a column or what? There are various comparison methods.

If your strings are stored in a table you can create Oracle text indexes on them and compare using contains clauses that returns a score.

There is a utl_match.edit_distance_similarity function that returns a percentage of similarity, but it may not be the formula that you are looking for and the order of words counts, so 'oracle infotech' and 'infotech oracle' would not be a 100% match.

How is your percentage defined? If you consider it as the number of matching words divided by the number of distinct words times 100, then you can create your own function something like the example below. It puts the words from each string into a collection then selects the intersection and the union of those two sets, in order to calculate the percentage.

SCOTT@orcl_11g> CREATE OR REPLACE TYPE varchar2_tt AS TABLE OF VARCHAR2 (2000);
  2  /

Type created.

SCOTT@orcl_11g> CREATE OR REPLACE FUNCTION percentage_of_matching_words
  2    (p_string1   IN VARCHAR2,
  3  	p_string2   IN VARCHAR2)
  4    RETURN NUMBER
  5  AS
  6    v_string1	VARCHAR2 (2000) := p_string1 || ' ';
  7    v_string2	VARCHAR2 (2000) := p_string2 || ' ';
  8    v_words1 	varchar2_tt := varchar2_tt();
  9    v_words2 	varchar2_tt := varchar2_tt();
 10    v_position	NUMBER;
 11    v_percent	NUMBER;
 12  BEGIN
 13    LOOP
 14  	 EXIT WHEN v_string1 IS NULL;
 15  	 v_position := INSTR (v_string1, ' ' );
 16  	 v_words1.EXTEND;
 17  	 v_words1 (v_words1.COUNT) := LTRIM (RTRIM (SUBSTR (v_string1, 1, v_position - 1)));
 18  	 v_string1 := SUBSTR (v_string1, v_position + 1);
 19    END LOOP;
 20    LOOP
 21  	 EXIT WHEN v_string2 IS NULL;
 22  	 v_position := INSTR (v_string2, ' ' );
 23  	 v_words2.EXTEND;
 24  	 v_words2 (v_words2.COUNT) := LTRIM (RTRIM (SUBSTR (v_string1, 1, v_position - 1)));
 25  	 v_string2 := SUBSTR (v_string2, v_position + 1);
 26    END LOOP;
 27    WITH s1 AS (SELECT * FROM TABLE (CAST (in_list (p_string1) AS varchar2_tt))),
 28  	    s2 AS (SELECT * FROM TABLE (CAST (in_list (p_string2) AS varchar2_tt)))
 29    SELECT (i.cnt / u.cnt) * 100
 30    INTO   v_percent
 31    FROM   (SELECT COUNT (*) cnt FROM (SELECT * FROM s1 INTERSECT SELECT * FROM s2)) i,
 32  	      (SELECT COUNT (*) cnt FROM (SELECT * FROM s1 UNION SELECT * FROM s2)) u;
 33    RETURN v_percent;
 34  END percentage_of_matching_words;
 35  /

Function created.

SCOTT@orcl_11g> SHOW ERRORS
No errors.
SCOTT@orcl_11g> SELECT percentage_of_matching_words
  2  	      ('oracle infotech',
  3  	       'infotech oracle')
  4  FROM DUAL
  5  /

PERCENTAGE_OF_MATCHING_WORDS('ORACLEINFOTECH','INFOTECHORACLE')
---------------------------------------------------------------
                                                            100

SCOTT@orcl_11g> SELECT percentage_of_matching_words
  2  	      ('Ship the documents to the Feds',
  3  	       'Rip the documents to shreds')
  4  FROM   DUAL
  5  /

PERCENTAGE_OF_MATCHING_WORDS('SHIPTHEDOCUMENTSTOTHEFEDS','RIPTHEDOCUMENTSTOSHRED
--------------------------------------------------------------------------------
                                                                      42.8571429

SCOTT@orcl_11g> 


Re: tough requirement string comparison [message #312494 is a reply to message #312460 ] Wed, 09 April 2008 01:03 Go to previous messageGo to next message
Michel Cadot
Messages: 15244
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Same thing in SQL:
SQL> with
  2    data as (
  3      select 0 id, 'oracle infotech' val from dual
  4      union all
  5      select 1, 'infotech oracle' from dual
  6      union all
  7      select 2, 'Ship the documents to the Feds' from dual
  8      union all
  9      select 3, 'Rip the documents to shreds' from dual
 10    ),
 11    lines as (
 12      select level line 
 13      from dual 
 14      connect by level <= 
 15        ( select max(length(val)-length(replace(val,' ','')))+1 from data )
 16    ),
 17    words as (
 18      select id, val,
 19             substr(val,
 20                    instr(' '||val||' ',' ',1,line),
 21                    instr(' '||val||' ',' ',1,line+1)-instr(' '||val||' ',' ',1,line)-1
 22                   ) word
 23      from data, lines
 24      where line <= length(val)-length(replace(val,' ',''))+1
 25    ),
 26    nb_words as (
 27      select id, count(*) nb_words
 28      from ( select trunc(id/2) id, word from words where mod(id,2) = 0
 29             union
 30             select trunc(id/2) id, word from words where mod(id,2) = 1 )
 31      group by id
 32    ),
 33    common_words as (
 34      select id, count(*) nb_common
 35      from ( select trunc(id/2) id, word from words where mod(id,2) = 0
 36             intersect
 37             select trunc(id/2) id, word from words where mod(id,2) = 1 )
 38      group by id
 39    )
 40  select d1.val val1, d2.val val2, 100*nb_common/nb_words pct_match
 41  from nb_words, common_words, data d1, data d2
 42  where common_words.id = nb_words.id
 43    and trunc(d1.id/2) = nb_words.id
 44    and mod(d1.id,2) = 0
 45    and mod(d2.id,2) = 1
 46    and trunc(d2.id/2) = nb_words.id
 47  order by nb_words.id
 48  /
VAL1                           VAL2                            PCT_MATCH
------------------------------ ------------------------------ ----------
oracle infotech                infotech oracle                       100
Ship the documents to the Feds Rip the documents to shreds    42.8571429

2 rows selected.

Regards
Michel
Re: tough requirement string comparison [message #313830 is a reply to message #312448 ] Mon, 14 April 2008 15:05 Go to previous messageGo to next message
Frank_Zhou
Messages: 5
Registered: February 2008
Location: Braintree , MA
Junior Member
An alternative SQL for comparing 2 input strings:

SQL> select max(old_str) str1, min(old_str) str2, count(distinct str)/max(str_cnt) *100 as mat_pct
  2  FROM
  3  (select old_str, str, count(distinct str) over ( ) str_cnt,
  4   count(distinct id) over(partition by str) counter
  5  from 
  6   (select id, old_str, doc.extract('/l/text()').getstringVal()  str  
  7    from (select 1 as id, 'Ship the documents to the Feds' old_str from dual
  8          union all
  9          select 2, 'Rip the documents to shreds' from dual) t,
 10     TABLE(xmlSequence(extract(XMLType('<doc><l>'||
 11          replace(t.old_str,' ','</l><l>')||'</l></doc>'),'/doc/l'))) doc
 12   )
 13  )
 14  where counter = 2;

STR1                             STR2                              MAT_PCT
------------------------------   ------------------------------ ----------
Ship the documents to the Feds   Rip the documents to shreds    42.8571429


Frank
Re: tough requirement string comparison [message #313949 is a reply to message #312448 ] Tue, 15 April 2008 05:30 Go to previous message
srinivas.k2005
Messages: 159
Registered: August 2006
Senior Member
Thanks For all your reply..
Previous Topic:Update Showing different results??
Next Topic:RAW datatype to sysdate (merged)
Goto Forum:
  


Current Time: Sat May 17 05:16:26 CDT 2008

Total time taken to generate the page: 0.05625 seconds
.:: Forum Home :: Site Home :: Wiki Home :: Contact :: Privacy ::.