| Join help needed [message #578154] |
Mon, 25 February 2013 00:51  |
 |
srikanth_d5
Messages: 39 Registered: June 2012 Location: hyderabad
|
Member |
|
|
Hi experts
I'm using join condition on VARCHAR2 datatype in that there were typo mistakes with that join condition is not working how to overcome this problem
for example:-
Message=Meesage
Climate=Cllmate
Weather=Whether
Kindly help me
Thanks and Regards
|
|
|
|
|
|
|
|
| Re: Join help needed [message #578163 is a reply to message #578154] |
Mon, 25 February 2013 01:07   |
_jum
Messages: 453 Registered: February 2008
|
Senior Member |
|
|
This in general is a difficult problem. You may have a look at utl_match:
WITH cdata AS
(SELECT 'Message' str FROM dual UNION ALL
SELECT 'Climate' FROM dual UNION ALL
SELECT 'Weather' FROM dual),
edata AS
(SELECT 'Meesage' str FROM dual UNION ALL
SELECT 'Cllmate' FROM dual UNION ALL
SELECT 'Whether' FROM dual)
SELECT cdata.str, edata.str,
utl_match.jaro_winkler_similarity (cdata.str, edata.str) jwsm,
utl_match.edit_distance_similarity(cdata.str, edata.str) edsm
FROM cdata, edata;
cstr estr jwsm edsm
-------------------------------------
Message Meesage 92 86
Message Cllmate 52 29
Message Whether 52 0
Climate Meesage 52 29
Climate Cllmate 92 86
Climate Whether 52 0
Weather Meesage 61 15
Weather Cllmate 61 15
Weather Whether 91 72
|
|
|
|
| Re: Join help needed [message #578258 is a reply to message #578163] |
Tue, 26 February 2013 03:25   |
 |
Barbara Boehmer
Messages: 7669 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
Utl_match is effective, but slow on large datasets. You can use Oracle text with FUZZY and the Oracle Text soundex operator "!" to search for similarly spelled and similar sounding matches using an index. You can use such a query that results in a smaller sub-set of your data as an inline view and use utl_match in an outer query to get more accurate comparison scores. This combination gets you both better search speed on large data sets and more accurate ordering. Jaro_winkler is generally better for names and edit distance is generally better for words. Please see the example below.
-- tables and data:
SCOTT@orcl_11gR2> CREATE TABLE cdata
2 (str VARCHAR2(15))
3 /
Table created.
SCOTT@orcl_11gR2> INSERT ALL
2 INTO cdata VALUES ('Message')
3 INTO cdata VALUES ('Climate')
4 INTO cdata VALUES ('Weather')
5 SELECT * FROM DUAL
6 /
3 rows created.
SCOTT@orcl_11gR2> CREATE TABLE edata
2 (str VARCHAR2(15))
3 /
Table created.
SCOTT@orcl_11gR2> INSERT ALL
2 INTO edata VALUES ('Meesage')
3 INTO edata VALUES ('Cllmate')
4 INTO edata VALUES ('Whether')
5 SELECT * FROM DUAL
6 /
3 rows created.
-- index:
SCOTT@orcl_11gR2> CREATE INDEX cdata_idx ON cdata (str)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 /
Index created.
-- query:
SCOTT@orcl_11gR2> SELECT cstr, estr, score,
2 utl_match.edit_distance_similarity (cstr, estr) edsm
3 FROM (SELECT cdata.str cstr, edata.str estr, SCORE(1) score
4 FROM cdata, edata
5 WHERE CONTAINS
6 (cdata.str,
7 'FUZZY (' || edata.str || ',1,5000,W) OR ' ||
8 '!' || edata.str, 1) > 0)
9 ORDER BY edsm DESC, score DESC
10 /
CSTR ESTR SCORE EDSM
--------------- --------------- ---------- ----------
Message Meesage 35 86
Climate Cllmate 31 86
Weather Whether 4 72
3 rows selected.
|
|
|
|
| Re: Join help needed [message #578259 is a reply to message #578258] |
Tue, 26 February 2013 03:39   |
 |
Barbara Boehmer
Messages: 7669 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
It looks like the optimizer is smart enough that you don't need to put utl_match in an outer query to get it to use the Oracle Text index to limit the rows first, as shown below.
SCOTT@orcl_11gR2> -- tables and data:
SCOTT@orcl_11gR2> CREATE TABLE cdata
2 (str VARCHAR2(15))
3 /
Table created.
SCOTT@orcl_11gR2> INSERT ALL
2 INTO cdata VALUES ('Message')
3 INTO cdata VALUES ('Climate')
4 INTO cdata VALUES ('Weather')
5 SELECT * FROM DUAL
6 /
3 rows created.
SCOTT@orcl_11gR2> INSERT INTO cdata
2 SELECT DISTINCT SUBSTR (object_name, 1, 15)
3 FROM all_objects
4 /
33306 rows created.
SCOTT@orcl_11gR2> CREATE TABLE edata
2 (str VARCHAR2(15))
3 /
Table created.
SCOTT@orcl_11gR2> INSERT ALL
2 INTO edata VALUES ('Meesage')
3 INTO edata VALUES ('Cllmate')
4 INTO edata VALUES ('Whether')
5 SELECT * FROM DUAL
6 /
3 rows created.
SCOTT@orcl_11gR2> -- index:
SCOTT@orcl_11gR2> CREATE INDEX cdata_idx ON cdata (str)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 /
Index created.
SCOTT@orcl_11gR2> -- query:
SCOTT@orcl_11gR2> SET AUTOTRACE ON EXPLAIN
SCOTT@orcl_11gR2> SELECT *
2 FROM (SELECT cdata.str cstr, edata.str estr, SCORE(1) score,
3 utl_match.edit_distance_similarity (cdata.str, edata.str) edsm
4 FROM cdata, edata
5 WHERE CONTAINS
6 (cdata.str,
7 'FUZZY (' || edata.str || ',1,5000,W) OR ' ||
8 '!' || edata.str, 1) > 0
9 ORDER BY edsm DESC, score DESC)
10 WHERE ROWNUM <= 3
11 /
CSTR ESTR SCORE EDSM
--------------- --------------- ---------- ----------
Climate Cllmate 81 86
Message Meesage 79 86
Weather Whether 15 72
3 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 111518446
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 132 | 29 (4)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 52 | 2288 | 29 (4)| 00:00:01 |
|* 3 | SORT ORDER BY STOPKEY | | 52 | 1560 | 29 (4)| 00:00:01 |
| 4 | NESTED LOOPS | | 52 | 1560 | 28 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | EDATA | 3 | 27 | 3 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| CDATA | 17 | 357 | 28 (0)| 00:00:01 |
|* 7 | DOMAIN INDEX | CDATA_IDX | | | 4 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=3)
3 - filter(ROWNUM<=3)
7 - access("CTXSYS"."CONTAINS"("CDATA"."STR",'FUZZY
('||"EDATA"."STR"||',1,5000,W) OR '||'!'||"EDATA"."STR",1)>0)
Note
-----
- dynamic sampling used for this statement (level=2)
SCOTT@orcl_11gR2>
|
|
|
|
|
|