Home » SQL & PL/SQL » SQL & PL/SQL » Join help needed
Join help needed [message #578154] Mon, 25 February 2013 00:51 Go to next message
srikanth_d5
Messages: 39
Registered: June 2012
Location: hyderabad
Member

Hi experts

I'm using join condition on VARCHAR2 datatype in that there were typo mistakes with that join condition is not working how to overcome this problem

for example:-

Message=Meesage
Climate=Cllmate
Weather=Whether

Kindly help me

Thanks and Regards
Re: Join help needed [message #578160 is a reply to message #578154] Mon, 25 February 2013 01:04 Go to previous messageGo to next message
Littlefoot
Messages: 19648
Registered: June 2005
Location: Croatia, Europe
Senior Member
Account Moderator
Fix typos?

Try with SOUNDEX (although I'm not very optimistic about it).

I don't use Oracle Text, but someone (like Barbara) might suggest a really useful approach. Review some of her messages in our Text and interMedia forum, maybe you'll get the idea.
Re: Join help needed [message #578162 is a reply to message #578154] Mon, 25 February 2013 01:07 Go to previous messageGo to next message
Michel Cadot
Messages: 59279
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Search in Barbara Boehmer's post in this forum, she posted several on the subject and provided functions on fuzzy search.
You can also install Oracle Text/Context option and use the built-in functions (search for Barbara's posts in "Text & interMedia" forum).

Regards
Michel
Re: Join help needed [message #578163 is a reply to message #578154] Mon, 25 February 2013 01:07 Go to previous messageGo to next message
_jum
Messages: 490
Registered: February 2008
Senior Member
This in general is a difficult problem. You may have a look at utl_match:
WITH cdata AS
(SELECT 'Message' str FROM dual UNION ALL
 SELECT 'Climate'     FROM dual UNION ALL
 SELECT 'Weather'     FROM dual),
     edata AS 
(SELECT 'Meesage' str FROM dual UNION ALL
 SELECT 'Cllmate'     FROM dual UNION ALL
 SELECT 'Whether'     FROM dual)
SELECT cdata.str, edata.str, 
       utl_match.jaro_winkler_similarity (cdata.str, edata.str) jwsm,
       utl_match.edit_distance_similarity(cdata.str, edata.str) edsm
FROM cdata, edata;      

cstr       estr jwsm   edsm
-------------------------------------
Message	Meesage	92	86
Message	Cllmate	52	29
Message	Whether	52	 0
Climate	Meesage	52	29
Climate	Cllmate	92	86
Climate	Whether	52	 0
Weather	Meesage	61	15
Weather	Cllmate	61	15
Weather	Whether	91	72

Re: Join help needed [message #578258 is a reply to message #578163] Tue, 26 February 2013 03:25 Go to previous messageGo to next message
Barbara Boehmer
Messages: 7990
Registered: November 2002
Location: California, USA
Senior Member
Utl_match is effective, but slow on large datasets. You can use Oracle text with FUZZY and the Oracle Text soundex operator "!" to search for similarly spelled and similar sounding matches using an index. You can use such a query that results in a smaller sub-set of your data as an inline view and use utl_match in an outer query to get more accurate comparison scores. This combination gets you both better search speed on large data sets and more accurate ordering. Jaro_winkler is generally better for names and edit distance is generally better for words. Please see the example below.

-- tables and data:
SCOTT@orcl_11gR2> CREATE TABLE cdata
  2    (str  VARCHAR2(15))
  3  /

Table created.

SCOTT@orcl_11gR2> INSERT ALL
  2  INTO cdata VALUES ('Message')
  3  INTO cdata VALUES ('Climate')
  4  INTO cdata VALUES ('Weather')
  5  SELECT * FROM DUAL
  6  /

3 rows created.

SCOTT@orcl_11gR2> CREATE TABLE edata
  2    (str  VARCHAR2(15))
  3  /

Table created.

SCOTT@orcl_11gR2> INSERT ALL
  2  INTO edata VALUES ('Meesage')
  3  INTO edata VALUES ('Cllmate')
  4  INTO edata VALUES ('Whether')
  5  SELECT * FROM DUAL
  6  /

3 rows created.


-- index:
SCOTT@orcl_11gR2> CREATE INDEX cdata_idx ON cdata (str)
  2  INDEXTYPE IS CTXSYS.CONTEXT
  3  /

Index created.


-- query:
SCOTT@orcl_11gR2> SELECT cstr, estr, score,
  2  	    utl_match.edit_distance_similarity (cstr, estr) edsm
  3  FROM   (SELECT cdata.str cstr, edata.str estr, SCORE(1) score
  4  	     FROM   cdata, edata
  5  	     WHERE  CONTAINS
  6  		      (cdata.str,
  7  		       'FUZZY (' || edata.str || ',1,5000,W) OR ' ||
  8  		       '!' || edata.str, 1) > 0)
  9  ORDER  BY edsm DESC, score DESC
 10  /

CSTR            ESTR                 SCORE       EDSM
--------------- --------------- ---------- ----------
Message         Meesage                 35         86
Climate         Cllmate                 31         86
Weather         Whether                  4         72

3 rows selected.

Re: Join help needed [message #578259 is a reply to message #578258] Tue, 26 February 2013 03:39 Go to previous messageGo to next message
Barbara Boehmer
Messages: 7990
Registered: November 2002
Location: California, USA
Senior Member
It looks like the optimizer is smart enough that you don't need to put utl_match in an outer query to get it to use the Oracle Text index to limit the rows first, as shown below.

SCOTT@orcl_11gR2> -- tables and data:
SCOTT@orcl_11gR2> CREATE TABLE cdata
  2    (str  VARCHAR2(15))
  3  /

Table created.

SCOTT@orcl_11gR2> INSERT ALL
  2  INTO cdata VALUES ('Message')
  3  INTO cdata VALUES ('Climate')
  4  INTO cdata VALUES ('Weather')
  5  SELECT * FROM DUAL
  6  /

3 rows created.

SCOTT@orcl_11gR2> INSERT INTO cdata
  2  SELECT DISTINCT SUBSTR (object_name, 1, 15)
  3  FROM   all_objects
  4  /

33306 rows created.

SCOTT@orcl_11gR2> CREATE TABLE edata
  2    (str  VARCHAR2(15))
  3  /

Table created.

SCOTT@orcl_11gR2> INSERT ALL
  2  INTO edata VALUES ('Meesage')
  3  INTO edata VALUES ('Cllmate')
  4  INTO edata VALUES ('Whether')
  5  SELECT * FROM DUAL
  6  /

3 rows created.

SCOTT@orcl_11gR2> -- index:
SCOTT@orcl_11gR2> CREATE INDEX cdata_idx ON cdata (str)
  2  INDEXTYPE IS CTXSYS.CONTEXT
  3  /

Index created.

SCOTT@orcl_11gR2> -- query:
SCOTT@orcl_11gR2> SET AUTOTRACE ON EXPLAIN
SCOTT@orcl_11gR2> SELECT *
  2  FROM   (SELECT cdata.str cstr, edata.str estr, SCORE(1) score,
  3  		    utl_match.edit_distance_similarity (cdata.str, edata.str) edsm
  4  	     FROM   cdata, edata
  5  	     WHERE  CONTAINS
  6  		      (cdata.str,
  7  		       'FUZZY (' || edata.str || ',1,5000,W) OR ' ||
  8  		       '!' || edata.str, 1) > 0
  9  	     ORDER  BY edsm DESC, score DESC)
 10  WHERE  ROWNUM <= 3
 11  /

CSTR            ESTR                 SCORE       EDSM
--------------- --------------- ---------- ----------
Climate         Cllmate                 81         86
Message         Meesage                 79         86
Weather         Whether                 15         72

3 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 111518446

---------------------------------------------------------------------------------------------
| Id  | Operation                       | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |           |     3 |   132 |    29   (4)| 00:00:01 |
|*  1 |  COUNT STOPKEY                  |           |       |       |            |          |
|   2 |   VIEW                          |           |    52 |  2288 |    29   (4)| 00:00:01 |
|*  3 |    SORT ORDER BY STOPKEY        |           |    52 |  1560 |    29   (4)| 00:00:01 |
|   4 |     NESTED LOOPS                |           |    52 |  1560 |    28   (0)| 00:00:01 |
|   5 |      TABLE ACCESS FULL          | EDATA     |     3 |    27 |     3   (0)| 00:00:01 |
|   6 |      TABLE ACCESS BY INDEX ROWID| CDATA     |    17 |   357 |    28   (0)| 00:00:01 |
|*  7 |       DOMAIN INDEX              | CDATA_IDX |       |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=3)
   3 - filter(ROWNUM<=3)
   7 - access("CTXSYS"."CONTAINS"("CDATA"."STR",'FUZZY
              ('||"EDATA"."STR"||',1,5000,W) OR '||'!'||"EDATA"."STR",1)>0)

Note
-----
   - dynamic sampling used for this statement (level=2)

SCOTT@orcl_11gR2>

Re: Join help needed [message #578428 is a reply to message #578259] Thu, 28 February 2013 00:09 Go to previous message
srikanth_d5
Messages: 39
Registered: June 2012
Location: hyderabad
Member
Thank u all

Previous Topic: ORA-00922: missing or invalid option
Next Topic: Conditional Selection of a Field
Goto Forum:
  


Current Time: Tue Sep 30 16:11:05 CDT 2014

Total time taken to generate the page: 0.11399 seconds