Oracle FAQ Your Portal to the Oracle Knowledge Grid

Home -> Community -> Usenet -> comp.databases.theory -> finding duplicate records with typo's

finding duplicate records with typos

From: tom <tomschuring_at_gmail.com>
Date: Sun, 05 Aug 2007 18:13:57 -0700
Message-ID: <1186362837.693799.303390@x35g2000prf.googlegroups.com>


Hello,

Can someone tell me (or point me in the right direction) what the right way is to find duplicates in dirty data (caused by typos)?

Is there something like a 'hashing' or 'rating' of text that gives you a number you can compare, so that similar strings get similar numbers?

For example:

hash("hello") => 4323
hash("helo")  => 4334
hash("tree")  => 7326

I'm not sure which direction I should look in. This is just an idea I had, but any ideas are very welcome.
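
To make the idea concrete, here is a minimal sketch in Python of two well-known techniques that roughly match the 'hashing' and 'rating' notions above. Neither is named in the post; they are swapped in purely as illustrations. Soundex is a phonetic key that maps similar-sounding words to the same short code, and Levenshtein edit distance counts the single-character edits needed to turn one string into another. The function names `soundex` and `edit_distance` are hypothetical helpers written for this example, and the numbers in the post (4323, 4334, 7326) are not reproduced by either.

```python
def soundex(word: str) -> str:
    """Classic American Soundex: first letter followed by three digits."""
    # Build the letter -> digit table used by Soundex.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    word = "".join(ch for ch in word.upper() if ch.isalpha())
    if not word:
        return ""

    result = word[0]
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeat after a vowel counts
    return (result + "000")[:4]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    for w in ("hello", "helo", "tree"):
        print(w, soundex(w))
    print(edit_distance("hello", "helo"))  # 1: one deleted "l"
```

With this sketch, "hello" and "helo" share the key H400 while "tree" gets T600, so the key can be used to group candidate duplicates, and the edit distance can then rank the candidates within a group. A phonetic key only catches typos that preserve the sound and the first letter, so it is a starting point rather than a complete answer.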

Thanks,
tom

Received on Sun Aug 05 2007 - 20:13:57 CDT
