Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: Fuzzy search

Re: Fuzzy search

From: makbo <makbo_at_pacbell.net>
Date: Thu, 22 Jan 2004 16:47:37 GMT
Message-ID: <J8TPb.14671$gX6.10305@newssvr27.news.prodigy.com>

Al Reid wrote:

> "Daniel Morgan" <damorgan_at_x.washington.edu> wrote in message news:1074784505.698129_at_yasure...
> 

>>Al Reid wrote:

[...]
>>>They want to be able to retrieve the record if the type in any of the following:
>>>
>>>A. B. Corp
>>>A.B. Corp
>>>AB Corp
>>>A.B Corp, etc.
>>>

[...]
>>
>>SELECT ...
>>FROM ...
>>WHERE some_column LIKE 'A%B%Corp%';
>>
> 
> 
> Would that not then return a record for Customer Name = 'Allen Bradley Corporation' in addition?  If so, it is not selective or
> fuzzy enough.
> 

[...]

The fuzziness is in what the user types as a selection criteria. I'm not sure how you could take the sample inputs and determine that they all need to be reduced to 'A%B%Corp%' in the WHERE clause--what if the user types "AB Corporation"? If you could do this, there wouldn't be a problem, you'd just map all possible user inputs to the actual customer name.

A common solution is to show the customer a pick list of possible matches based on the keywords they type. (But you've probably already thought of this...) You could also set up a way to "teach" your application to recognize fuzzy user criteria by capturing user entries over time in a table and mapping them to the customers. Or, just have all your users learn how to use wildcards in their queries (ha ha).

Another thought is to pre-create a unique "LIKE" matching string for every customer in a new column. Then you could query

SELECT...
FROM...
WHERE upper(:user_input) LIKE match_column

This could be a problem if you have many customers, each requiring manual creation of a suitable match string. Not sure about performance either...

Oracle Text product has some "fuzzy" matching features, from what I read.

Lastly, if you have access to Metalink, read notes 154880.1 and 119426.1 about how to add regular expression capability to Oracle via external procedures. Not a miracle solution, but a more powerful tool, anyway.

Excerpt:

"Regular expressions (henceforth noted re) are a powerful way for doing pattern matching against strings. One can think of them as wildcards on steroids. They allow to analyze strings or to represent entire families of strings through an intuitive and easy to remember notation.

"Oracle only supports both _ and % wildcards. While quite convenient, the range of what they can express is very restricted. Regular expressions take over where wildcards stop and bring pattern matching to a much higher level of power and flexibility."

--Mark Bole Received on Thu Jan 22 2004 - 10:47:37 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US