Newsgroup: comp.databases.theory
Subject: Re: finding duplicate records with typo's
tom wrote:
> hello,
>
> Can someone tell me (or point me in the right direction) what the
> right way is to find duplicates in dirty data (caused by typos)?
>
> Is there something like a 'hashing' or 'rating' of text that will give
> you a number that you can compare?
>
> for example
>
> hash( "hello") => 4323
> hash( "helo") => 4334
> hash("tree") => 7326
>
> I'm not sure what direction I should look in; this is just an idea
> that I had, but any ideas are very welcome.
>
> thanks,
> tom
>
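If you are looking for duplicates, I assume you want to detect the similarity between "hello" and "helo". The function usually used for that is Soundex, which encodes a word by how it sounds (its first letter followed by three digits), so that similar spellings tend to get the same code.
Received on Sun Aug 05 2007 - 20:25:00 CDT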
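To make the Soundex suggestion concrete, here is a minimal Python sketch of the classic American Soundex rules, assuming plain ASCII words. The helper names `soundex` and `group_by_soundex` are hypothetical illustrations, not something from the original post. Applied to the words from the question, it gives "hello" and "helo" the same key (H400) and "tree" a different one (T600), which is the comparable "hash" the question was after.

```python
from collections import defaultdict

# Hypothetical helper table: American Soundex digit for each consonant.
# Vowels, y, h and w have no digit.
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of word (hypothetical helper)."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    # Start from the first letter's own code, so a following letter with
    # the same code is not emitted again (e.g. "Pfister" -> P236).
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code:
            if code != prev:  # collapse runs of the same digit
                result += code
            prev = code
        elif ch not in "hw":
            # A vowel (or y) separates letters, so a repeated digit after it
            # is coded again; h and w are skipped without resetting prev.
            prev = ""
        if len(result) == 4:
            break
    return result.ljust(4, "0")  # pad short codes with zeros


def group_by_soundex(values):
    """Group values by Soundex code and keep only keys shared by 2+ values."""
    groups = defaultdict(list)
    for value in values:
        groups[soundex(value)].append(value)
    return {key: vals for key, vals in groups.items() if len(vals) > 1}


if __name__ == "__main__":
    print(soundex("hello"), soundex("helo"), soundex("tree"))
    print(group_by_soundex(["hello", "helo", "tree"]))
```

Because the code always keeps the first letter, a typo in the first character (e.g. "jello", J400) will not be matched with "hello". Soundex groups candidates that sound alike; it does not catch every typo. Oracle also provides a built-in SOUNDEX SQL function, so the same grouping can be done inside the database.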