finding duplicate records with typos
hello,
can someone tell me (or point me in the right direction) what the right way is to find duplicates in dirty data (caused by typos)?
is there something like a 'hashing' or 'rating' of text that gives you a number you can compare, so that similar strings get similar numbers?
for example:
hash("hello") => 4323
hash("helo") => 4334
hash("tree") => 7326
i'm not sure what direction i should look in; this is just an idea that i had, but any ideas are very welcome.
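to make the idea a bit more concrete, here is a rough sketch of one such "hash", assuming a Soundex-style phonetic key is acceptable. the language (Python) and the name `phonetic_key` are just assumptions for illustration, not a recommendation. unlike the numbers above, the key is a short code that is meant to come out equal for similar spellings, not merely close.

```python
# Letter-to-digit table used by Soundex: similar-sounding consonants share a digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def phonetic_key(word):
    """Return a 4-character Soundex-style key (hypothetical helper name)."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    key = word[0].upper()            # the first letter is kept as-is
    prev = CODES.get(word[0], "")    # digit of the previous coded letter
    for ch in word[1:]:
        digit = CODES.get(ch, "")    # vowels, h, w, y have no digit
        if digit and digit != prev:  # skip repeats of the same digit
            key += digit
        if ch not in "hw":           # h and w do not break a run of repeats
            prev = digit
    return (key + "000")[:4]         # pad with zeros / truncate to 4 characters


print(phonetic_key("hello"))  # H400
print(phonetic_key("helo"))   # H400
print(phonetic_key("tree"))   # T600
```

a key like this only catches some typos (a dropped double letter, a swapped vowel). a transposition such as "hlelo" already gives a different key, so it would probably have to be combined with something else.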
thanks,
tom
Received on Sun Aug 05 2007 - 20:13:57 CDT