Oracle FAQ Your Portal to the Oracle Knowledge Grid
Re: Fuzzy search

From: Jim Kennedy <kennedy-downwithspammersfamily_at_attbi.net>
Date: Sat, 24 Jan 2004 16:34:38 GMT
Message-ID: <y8xQb.109445$Rc4.772411@attbi_s54>


What about Soundex? It would ignore the periods. Worth a shot.
Jim

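To make the Soundex suggestion concrete, here is a minimal Python sketch of the classic American Soundex rules. It is an illustration only: the function name `soundex` is made up for this example, and it is assumed (not verified here) that the database's built-in behaves the same way on punctuation and spaces.

```python
def soundex(name: str) -> str:
    """Classic American Soundex: first letter plus three digits."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    # Keep only the letters A-Z, so periods and spaces are ignored.
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeat across a vowel counts
    return (result + "000")[:4]


for variant in ("A. B. Corp", "A.B. Corp", "AB Corp", "A.B Corp", "A B Corp"):
    print(variant, soundex(variant))
```

Under these rules every variant above maps to the same code, A126. The code is coarse, though: an unrelated name such as "Apex Corp" also maps to A126, so a Soundex match would probably need a second filtering step.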
"Al Reid" <areidjr_at_reidHyphenhome.com> wrote in message news:F%WPb.16288$LM4.10392_at_nwrdny03.gnilink.net...
> <ctcgag_at_hotmail.com> wrote in message
> news:20040122150457.752$Go_at_newsreader.com...
> > "Al Reid" <areidjr_at_nospamhotmail.com> wrote:
> > > > > > >
> > > > > > > They want to be able to retrieve the record if they type in any
> > > > > > > of the following:
> > > > > > >
> > > > > > > A. B. Corp
> > > > > > > A.B. Corp
> > > > > > > AB Corp
> > > > > > > A.B Corp, etc.
> > > >
> > > > Uh, that's pretty fuzzy. What happened to the "C" in "A B C Corp"?
> > > > Do they want to also find "NBC", "ABC", and "CBS"? If they want
> > > > "George Washington" but they accidentally spell it "Thomas
> > > > Jefferson", do they want you to magically correct that, also?
> > > >
> > >
> > > Sorry, my bad. I meant the entry in the database is 'A B Corp'
> > > I guess I was a little frustrated when I posted this.
> >
> > Ah, that may be much less fuzzy, then. How about an FBI (function-based
> > index) using a canonicalization function which removes all non-letter
> > characters (and converts the rest to upper case while it is at it)? Of
> > course, you'd still have to handle (or forbid) situations where the name
> > (after transformation) is non-unique. Then all the above would simply
> > become "ABCORP".
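The canonicalization idea can be sketched outside the database in a few lines of Python. This is only an illustration of the transformation and of the collision check; the names `canonical_name` and `find_collisions` are hypothetical, and in practice the same logic would have to live in a deterministic database function for a function-based index to use it.

```python
import re
from collections import defaultdict


def canonical_name(name: str) -> str:
    """Drop every character that is not a letter A-Z, then upper-case."""
    return re.sub(r"[^A-Za-z]", "", name).upper()


def find_collisions(names):
    """Group distinct stored names that share one canonical form."""
    groups = defaultdict(set)
    for name in names:
        groups[canonical_name(name)].add(name)
    # Only keys with more than one distinct stored name are collisions.
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


for typed in ("A. B. Corp", "A.B. Corp", "AB Corp", "A.B Corp"):
    # Each typed variant matches the stored row 'A B Corp'.
    print(typed, canonical_name(typed) == canonical_name("A B Corp"))

print(find_collisions(["A B Corp", "AB Corp.", "Acme Inc"]))
```

The lookup then compares the canonical form of the search text with the canonical form of the stored name, which is an equality test that an index can serve. The collision report shows which stored names would become indistinguishable and need to be handled or forbidden.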
> >
> > >
> > > > > > > I currently use SPs to retrieve the records from a VB program.
> > > > > > > Is there something I could add to the SP to provide this
> > > > > > > functionality without severely affecting performance?
> > > >
> > > > Its effect on performance would depend on how large the customer
> > > > table is. For some systems, doing an FTS (full table scan) of the
> > > > customer table 10 times a minute would have no meaningful impact. For
> > > > others, it would be fatal. Strictly speaking, it may not have to do
> > > > an FTS (for example, if you always insist that at least the first
> > > > letter is not fuzzy), but I think that's a good estimate to use for
> > > > performance impact.
> > > >
> > >
> > > There are currently 626000 customers in the table.
> >
> > If using a canonicalization function is good enough for them,
> > then the size doesn't really matter (except to the extent that name
> > collisions occur). If they want something fancier, like finding the
> > minimum Levenshtein edit distance, then with a table that size you have
> > your work cut out for you.
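For comparison, the "fancier" option mentioned above can be sketched as follows. This is a plain dynamic-programming Levenshtein distance in Python plus a naive nearest-name search; the helper name `closest` and the sample data are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def closest(query: str, names):
    """Naive search: compute the distance to every candidate name."""
    return min(names, key=lambda name: levenshtein(query.upper(), name.upper()))


print(levenshtein("kitten", "sitting"))
print(closest("A.B Corp", ["A B Corp", "NBC", "CBS"]))
```

The search in `closest` touches every candidate, so each lookup costs one distance computation per row. That is the reason a table of this size makes the approach hard work, whereas the canonicalized equality lookup can be served by an index.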
> >
> >
>
> Thanks, I will explore the canonicalization function and see where it leads.
>
> >
> > Xho
> >
Received on Sat Jan 24 2004 - 10:34:38 CST
