Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: How to do a "fuzzy" or approximate matching of strings in a SQL where clause

Re: How to do a "fuzzy" or approximate matching of strings in a SQL where clause

From: Barbara Boehmer <baboehme_at_hotmail.com>
Date: 8 Aug 2005 14:09:06 -0700
Message-ID: <1123532934.979790.255840@g49g2000cwa.googlegroups.com>


There are many options, most of which have already been mentioned by others, which can be used individually, or in combination. You can use built-in functions like LIKE and INSTR or you can use Oracle Text, which requires creation of an index and periodic synchronization of the index, but is an excellent combination of efficiency and flexibility, with various search options and scoring that can be used for ordering or further limiting. You can also use the Levenshtein distance (edit distance) formula, which can also be used for ordering and will find wrong spellings and things that the others do not, but tends to be slow on large datasets. The Levenshtein distance formula is best used for ordering and further limiting result sets after other methods have been applied. What combination you choose depends on your exact needs. I have provided a small sampling of some of these methods below for your perusal.

scott_at_ORA92> -- test data:
scott_at_ORA92> SELECT * FROM your_table
  2 /

YOUR_COLUMN



SQL server help request
Re: Re: SQL Server help request - followup FW: forwarded for your attention: SQL Server help request Help request SQL Server
Help requested regarding SQL Server
Requesting for some Help with SQL Server SQL Server Help Required
SQL Server Help needed
Some other request for help with something else

9 rows selected.

scott_at_ORA92>
scott_at_ORA92> -- like (built-in function):
scott_at_ORA92> SELECT * FROM your_table
  2  WHERE  UPPER (your_column) LIKE '%SQL SERVER%'
  3 AND UPPER (your_column) LIKE '%HELP%'   4 AND UPPER (your_column) LIKE '%REQUEST%'   5 /

YOUR_COLUMN



SQL server help request
Re: Re: SQL Server help request - followup FW: forwarded for your attention: SQL Server help request Help request SQL Server
Help requested regarding SQL Server
Requesting for some Help with SQL Server

6 rows selected.

scott_at_ORA92>
scott_at_ORA92> -- instr (built-in function):
scott_at_ORA92> SELECT * FROM your_table
  2  WHERE  INSTR (UPPER (your_column), 'SQL SERVER') > 0
  3 AND INSTR (UPPER (your_column), 'HELP') > 0   4 AND INSTR (UPPER (your_column), 'REQUEST') > 0   5 /

YOUR_COLUMN



SQL server help request
Re: Re: SQL Server help request - followup FW: forwarded for your attention: SQL Server help request Help request SQL Server
Help requested regarding SQL Server
Requesting for some Help with SQL Server

6 rows selected.

scott_at_ORA92>
scott_at_ORA92> -- context:
scott_at_ORA92> -- probably best combination of performance and
flexibility
scott_at_ORA92> -- the following only demonstrates a few options
scott_at_ORA92> -- requires creation of index:
scott_at_ORA92> CREATE INDEX your_index

  2 ON your_table (your_column)
  3 INDEXTYPE IS CTXSYS.CONTEXT
  4 /

Index created.

scott_at_ORA92> -- requires periodic synchronization of index
scott_at_ORA92> -- which can be scheduled to run automatically:
scott_at_ORA92> EXECUTE ctx_ddl.sync_index ('YOUR_INDEX')

PL/SQL procedure successfully completed.

scott_at_ORA92> -- set define off so ampersand can be used:
scott_at_ORA92> SET DEFINE OFF
scott_at_ORA92> SELECT your_column, SCORE (1) FROM your_table
  2 WHERE CONTAINS (your_column, 'SQL & SERVER & HELP & REQUEST', 1) > 0
  3 ORDER BY SCORE (1) DESC
  4 /
YOUR_COLUMN                                                    SCORE(1)

------------------------------------------------------------ ----------
Help request SQL Server 3 FW: forwarded for your attention: SQL Server help request 3 Re: Re: SQL Server help request - followup 3 SQL server help request 3

scott_at_ORA92> -- with fuzzy matching (old syntax, newer has more options):
scott_at_ORA92> SELECT your_column, SCORE (1) FROM your_table   2 WHERE CONTAINS (your_column, '?SQL & ?SERVER & ?HELP & ?REQUEST', 1) > 0
  3 ORDER BY SCORE (1) DESC
  4 /

YOUR_COLUMN                                                    SCORE(1)

------------------------------------------------------------ ----------
Help requested regarding SQL Server 3 Help request SQL Server 3 FW: forwarded for your attention: SQL Server help request 3 Re: Re: SQL Server help request - followup 3 SQL server help request 3 scott_at_ORA92>

scott_at_ORA92> -- Levenshtein distance:
scott_at_ORA92> -- (requires creation of user-defined function by just running the code below)
scott_at_ORA92> -- finds wrong spellings and things that others don't, but tends to be slow on large datasets
scott_at_ORA92> -- is best used in combination with other methods to order / further limit a small result set
scott_at_ORA92> CREATE OR REPLACE FUNCTION ld -- Levenshtein distance   2 (p_source_string IN VARCHAR2,
  3  	p_target_string   IN VARCHAR2)
  4    RETURN		     NUMBER

  5 DETERMINISTIC
  6 AS
  7 v_length_of_source NUMBER := NVL (LENGTH (p_source_string), 0);
  8 v_length_of_target NUMBER := NVL (LENGTH (p_target_string), 0);
  9    TYPE mytabtype IS     TABLE OF NUMBER INDEX BY BINARY_INTEGER;
 10    column_to_left	     mytabtype;
 11    current_column	     mytabtype;
 12    v_cost		     NUMBER := 0;

 13 BEGIN
 14 IF v_length_of_source = 0 THEN
 15 RETURN v_length_of_target;
 16 ELSIF v_length_of_target = 0 THEN  17 RETURN v_length_of_source;
 18 ELSE
 19  	 FOR j IN 0 .. v_length_of_target LOOP
 20  	   column_to_left(j) := j;
 21  	 END LOOP;
 22  	 FOR i IN 1.. v_length_of_source LOOP
 23  	   current_column(0) := i;
 24  	   FOR j IN 1 .. v_length_of_target LOOP
 25  	     IF SUBSTR (p_source_string, i, 1) =
 26  		SUBSTR (p_target_string, j, 1)
 27  	     THEN v_cost := 0;
 28  	     ELSE v_cost := 1;
 29  	     END IF;
 30  	     current_column(j) := LEAST (current_column(j-1) + 1,
 31  					 column_to_left(j) + 1,
 32  					 column_to_left(j-1) + v_cost);
 33  	   END LOOP;
 34  	   FOR j IN 0 .. v_length_of_target  LOOP
 35  	     column_to_left(j) := current_column(j);
 36  	   END LOOP;
 37  	 END LOOP;

 38 END IF;
 39 RETURN current_column(v_length_of_target);  40 END ld;
 41 /

Function created.

scott_at_ORA92> SELECT your_column,
  2 ld (UPPER (your_column), 'SQL SERVER HELP REQUEST') lev_dist   3 FROM your_table
  4 ORDER BY lev_dist
  5 /

YOUR_COLUMN                                                    LEV_DIST

------------------------------------------------------------ ----------
SQL server help request 0 SQL Server Help Required 4 SQL Server Help needed 5 Help request SQL Server 16 Re: Re: SQL Server help request - followup 21 Help requested regarding SQL Server 26 Requesting for some Help with SQL Server 29 Some other request for help with something else 32 FW: forwarded for your attention: SQL Server help request 36

9 rows selected.

scott_at_ORA92> -- limit by distance:
scott_at_ORA92> SELECT *
  2 FROM your_table
  3 WHERE ld (UPPER (your_column), 'SQL SERVER HELP REQUEST') <= 5   4 /

YOUR_COLUMN



SQL server help request
SQL Server Help Required
SQL Server Help needed

scott_at_ORA92> -- limit by number of rows: scott_at_ORA92> SELECT your_column
  2 FROM (SELECT your_column,

  3  		    ld (UPPER (your_column), 'SQL SERVER HELP REQUEST') lev_dist
  4  	     FROM   your_table
  5  	     ORDER  BY lev_dist)

  6 WHERE ROWNUM <= 8
  7 /

YOUR_COLUMN



SQL server help request
SQL Server Help Required
SQL Server Help needed
Help request SQL Server
Re: Re: SQL Server help request - followup Help requested regarding SQL Server
Requesting for some Help with SQL Server Some other request for help with something else

8 rows selected.

scott_at_ORA92> -- limit by distance or rows: scott_at_ORA92> SELECT your_column
  2 FROM (SELECT your_column,

  3  		    ld (UPPER (your_column), 'SQL SERVER HELP REQUEST') lev_dist
  4  	     FROM   your_table
  5  	     ORDER  BY lev_dist)
  6  WHERE  ROWNUM <= 8
  7  OR     lev_dist <= 5

  8 /

YOUR_COLUMN



SQL server help request
SQL Server Help Required
SQL Server Help needed
Help request SQL Server
Re: Re: SQL Server help request - followup Help requested regarding SQL Server
Requesting for some Help with SQL Server Some other request for help with something else

8 rows selected.

scott_at_ORA92> -- limit by distance and rows: scott_at_ORA92> SELECT your_column
  2 FROM (SELECT your_column,

  3  		    ld (UPPER (your_column), 'SQL SERVER HELP REQUEST') lev_dist
  4  	     FROM   your_table
  5  	     WHERE  ld (UPPER (your_column), 'SQL SERVER HELP REQUEST')
<= 5
  6  	     ORDER  BY lev_dist)

  7 WHERE ROWNUM <= 8
  8 /

YOUR_COLUMN



SQL server help request
SQL Server Help Required
SQL Server Help needed
scott_at_ORA92>
scott_at_ORA92> -- example of combination of methods:
scott_at_ORA92> SELECT * FROM your_table

  2 WHERE UPPER (your_column) LIKE '%SQL SERVER%'   3 ORDER BY ld (UPPER (your_column), 'SQL SERVER HELP REQUEST')   4 /

YOUR_COLUMN



SQL server help request
SQL Server Help Required
SQL Server Help needed
Help request SQL Server
Re: Re: SQL Server help request - followup Help requested regarding SQL Server
Requesting for some Help with SQL Server FW: forwarded for your attention: SQL Server help request

8 rows selected. Received on Mon Aug 08 2005 - 16:09:06 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US