Oracle FAQ Your Portal to the Oracle Knowledge Grid
Re: Fuzzy search

From: <ctcgag_at_hotmail.com>
Date: 22 Jan 2004 20:04:57 GMT
Message-ID: <20040122150457.752$Go@newsreader.com>


"Al Reid" <areidjr_at_nospamhotmail.com> wrote:
> > > > >
> > > > > They want to be able to retrieve the record if they type in any of
> > > > > the following:
> > > > >
> > > > > A. B. Corp
> > > > > A.B. Corp
> > > > > AB Corp
> > > > > A.B Corp, etc.
> >
> > uh, that's pretty fuzzy. What happened to the "C" in "A B C Corp"? Do
> > they want to also find "NBC", "ABC", and "CBS"? If they want "George
> > Washington" but they accidentally spell it "Thomas Jefferson", do they
> > want you to magically correct that, also?
> >
>
> Sorry, my bad. I meant the entry in the database is 'A B Corp'
> I guess I was a little frustrated when I posted this.

Ah, that may be much less fuzzy, then. How about a function-based index (FBI) on a canonicalization function that removes all non-letter characters and converts the remaining letters to upper case? All of the variants above would then simply become "ABCORP". Of course, you'd still have to handle (or forbid) situations where the name, after transformation, is non-unique.
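
As a rough illustration of the canonicalization idea, here is a minimal Python sketch. The function name canonical_name is made up for this example, and it assumes "letter" means an ASCII letter. In Oracle the same logic would have to live in a deterministic function so that a function-based index could be built on it; this sketch only shows the transformation itself.

```python
import re


def canonical_name(name: str) -> str:
    """Drop every character that is not an ASCII letter, then upper-case the rest.

    Hypothetical helper for illustration only; treating only A-Z/a-z as
    letters is an assumption, not something stated in the thread.
    """
    return re.sub(r"[^A-Za-z]", "", name).upper()


# The stored name plus the variants users might type.
variants = ["A B Corp", "A. B. Corp", "A.B. Corp", "AB Corp", "A.B Corp"]

# Every variant collapses to the same lookup key.
keys = {canonical_name(v) for v in variants}
print(keys)  # {'ABCORP'}
```

The flip side is visible in the same sketch: two genuinely different customers entered as "AB Corp" and "A.B. Corp" would also share the key "ABCORP", which is the non-uniqueness case that has to be handled or forbidden.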

>
> > > > > I currently use SPs to retrieve the records from a VB program.
> > > > > Is there something I could add to the SP to provide this
> > > > > functionality without severely affecting performance?
> >
> > Its effect on performance would depend on how large the customer table
> > is. For some systems, doing FTS of the customer table 10 times a minute
> > would have no meaningful impact. For others, it would be fatal.
> > Strictly speaking, it may not have to do a FTS (for example, if you
> > always insist that at least the first letter is not fuzzy), but I
> > think that's a good estimate to use for performance impact.
> >
>
> There are currently 626000 customers in the table.

If using a canonicalization function is good enough for them, then the size doesn't really matter (except to the extent that name collisions occur). If they want something fancier, like finding the minimum Levenshtein edit distance, then with a table that size you have your work cut out for you.
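
To show why the edit-distance route is more work, here is a minimal Python sketch of the standard dynamic-programming Levenshtein distance. The names levenshtein and closest are made up for this example, and the brute-force scan in closest is an assumption about the simplest possible approach, not a recommendation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions
    needed to turn a into b (classic two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def closest(query: str, names: list[str]) -> str:
    """Hypothetical brute-force search: compares the query with every name."""
    return min(names, key=lambda n: levenshtein(query, n))


print(levenshtein("ABCORP", "ABCCORP"))              # 1
print(closest("ABCORP", ["ABCCORP", "NBC", "CBS"]))  # ABCCORP
```

Unlike the canonicalized key, the distance depends on both strings, so it cannot be precomputed into an ordinary index; a naive search like closest has to touch every one of the 626000 names for each query.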

Xho

Received on Thu Jan 22 2004 - 14:04:57 CST
