Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> Re: How to do a "fuzzy" or approximate matching of strings in a SQL where clause
There are many options, most of which have already been mentioned by
others, which can be used individually, or in combination. You can use
built-in functions like LIKE and INSTR or you can use Oracle Text,
which requires creation of an index and periodic synchronization of the
index, but is an excellent combination of efficiency and flexibility,
with various search options and scoring that can be used for ordering
or further limiting. You can also use the Levenshtein distance (edit
distance) formula, which can also be used for ordering and will find
wrong spellings and things that the others do not, but tends to be slow
on large datasets. The Levenshtein distance formula is best used for
ordering and further limiting result sets after other methods have been
applied. What combination you choose depends on your exact needs. I
have provided a small sampling of some of these methods below for your
perusal.
scott_at_ORA92> -- test data:
scott_at_ORA92> SELECT * FROM your_table
2 /
YOUR_COLUMN
9 rows selected.
scott_at_ORA92> scott_at_ORA92> -- like (built-in function): scott_at_ORA92> SELECT * FROM your_table 2 WHERE UPPER (your_column) LIKE '%SQL SERVER%'3 AND UPPER (your_column) LIKE '%HELP%' 4 AND UPPER (your_column) LIKE '%REQUEST%' 5 /
YOUR_COLUMN
6 rows selected.
scott_at_ORA92> scott_at_ORA92> -- instr (built-in function): scott_at_ORA92> SELECT * FROM your_table 2 WHERE INSTR (UPPER (your_column), 'SQL SERVER') > 03 AND INSTR (UPPER (your_column), 'HELP') > 0 4 AND INSTR (UPPER (your_column), 'REQUEST') > 0 5 /
YOUR_COLUMN
6 rows selected.
scott_at_ORA92> scott_at_ORA92> -- context: scott_at_ORA92> -- probably best combination of performance andflexibility
scott_at_ORA92> -- the following only demonstrates a few options scott_at_ORA92> -- requires creation of index: scott_at_ORA92> CREATE INDEX your_index
Index created.
scott_at_ORA92> -- requires periodic synchronization of index scott_at_ORA92> -- which can be scheduled to run automatically: scott_at_ORA92> EXECUTE ctx_ddl.sync_index ('YOUR_INDEX')
PL/SQL procedure successfully completed.
scott_at_ORA92> -- set define off so ampersand can be used: scott_at_ORA92> SET DEFINE OFF scott_at_ORA92> SELECT your_column, SCORE (1) FROM your_table2 WHERE CONTAINS (your_column, 'SQL & SERVER & HELP & REQUEST', 1) > 0
YOUR_COLUMN SCORE(1)
------------------------------------------------------------ ----------
Help request SQL Server 3 FW: forwarded for your attention: SQL Server help request 3 Re: Re: SQL Server help request - followup 3 SQL server help request 3
scott_at_ORA92> -- with fuzzy matching (old syntax, newer has more
options):
scott_at_ORA92> SELECT your_column, SCORE (1) FROM your_table
2 WHERE CONTAINS (your_column, '?SQL & ?SERVER & ?HELP & ?REQUEST',
1) > 0
3 ORDER BY SCORE (1) DESC
4 /
YOUR_COLUMN SCORE(1)
------------------------------------------------------------ ----------
Help requested regarding SQL Server 3 Help request SQL Server 3 FW: forwarded for your attention: SQL Server help request 3 Re: Re: SQL Server help request - followup 3 SQL server help request 3 scott_at_ORA92>
3 p_target_string IN VARCHAR2) 4 RETURN NUMBER
9 TYPE mytabtype IS TABLE OF NUMBER INDEX BY BINARY_INTEGER; 10 column_to_left mytabtype; 11 current_column mytabtype; 12 v_cost NUMBER := 0;
19 FOR j IN 0 .. v_length_of_target LOOP 20 column_to_left(j) := j; 21 END LOOP; 22 FOR i IN 1.. v_length_of_source LOOP 23 current_column(0) := i; 24 FOR j IN 1 .. v_length_of_target LOOP 25 IF SUBSTR (p_source_string, i, 1) = 26 SUBSTR (p_target_string, j, 1) 27 THEN v_cost := 0; 28 ELSE v_cost := 1; 29 END IF; 30 current_column(j) := LEAST (current_column(j-1) + 1, 31 column_to_left(j) + 1, 32 column_to_left(j-1) + v_cost); 33 END LOOP; 34 FOR j IN 0 .. v_length_of_target LOOP 35 column_to_left(j) := current_column(j); 36 END LOOP; 37 END LOOP;
Function created.
scott_at_ORA92> SELECT your_column,
2 ld (UPPER (your_column), 'SQL SERVER HELP REQUEST') lev_dist
3 FROM your_table
4 ORDER BY lev_dist
5 /
YOUR_COLUMN LEV_DIST
------------------------------------------------------------ ----------
SQL server help request 0 SQL Server Help Required 4 SQL Server Help needed 5 Help request SQL Server 16 Re: Re: SQL Server help request - followup 21 Help requested regarding SQL Server 26 Requesting for some Help with SQL Server 29 Some other request for help with something else 32 FW: forwarded for your attention: SQL Server help request 36
9 rows selected.
scott_at_ORA92> -- limit by distance:
scott_at_ORA92> SELECT *
2 FROM your_table
3 WHERE ld (UPPER (your_column), 'SQL SERVER HELP REQUEST') <= 5
4 /
YOUR_COLUMN
scott_at_ORA92> -- limit by number of rows:
scott_at_ORA92> SELECT your_column
2 FROM (SELECT your_column,
3 ld (UPPER (your_column), 'SQL SERVER HELP REQUEST') lev_dist 4 FROM your_table 5 ORDER BY lev_dist)
YOUR_COLUMN
8 rows selected.
scott_at_ORA92> -- limit by distance or rows:
scott_at_ORA92> SELECT your_column
2 FROM (SELECT your_column,
3 ld (UPPER (your_column), 'SQL SERVER HELP REQUEST') lev_dist 4 FROM your_table 5 ORDER BY lev_dist) 6 WHERE ROWNUM <= 8 7 OR lev_dist <= 5
YOUR_COLUMN
8 rows selected.
scott_at_ORA92> -- limit by distance and rows:
scott_at_ORA92> SELECT your_column
2 FROM (SELECT your_column,
3 ld (UPPER (your_column), 'SQL SERVER HELP REQUEST') lev_dist 4 FROM your_table 5 WHERE ld (UPPER (your_column), 'SQL SERVER HELP REQUEST') <= 5 6 ORDER BY lev_dist)
YOUR_COLUMN
scott_at_ORA92> scott_at_ORA92> -- example of combination of methods: scott_at_ORA92> SELECT * FROM your_table
YOUR_COLUMN
8 rows selected. Received on Mon Aug 08 2005 - 15:55:59 CDT