Re: Selecting SIMILAR, not the same records (PROBABLE) duplicates

From: Frank van Bortel <frank.van.bortel_at_gmail.com>
Date: Wed, 06 Sep 2006 09:44:54 +0200
Message-ID: <edltvu$2s3$1@news2.zwoll1.ov.home.nl>

joel garry wrote:

> kroger wrote:
>>> kroger wrote:

>>>> Hi,
>>>>
>>>> I've been struggling with that for two days now...
>>>> There is a simple solution for finding duplicates - with GROUP BY and
>>>> HAVING COUNT(*)>1 but it is not enough in my case...
>>>>
>>>> For the example table as follows:
>>>>
>>>> id || name
>>>> 1 || aaa
>>>> 2 || aaa xxx
>>>> 3 || aaa
>>>> 4 || aaah
>>>> 5 || bbb
>>>> 6 || bbb p
>>>> 7 || ccc

[snip]

> 
> Maybe I'm still not understanding, but might soundex help?
> http://www.orafaq.com/search/soundex
> 
> jg
> --

The thought crossed my mind, too, but I discarded it. Soundex keeps only the first letter plus up to three digits for the consonants that follow, and vowels are dropped. "aaa" and "aaa xxx" are therefore compared as roughly "a" and "a x", and will never be the same:

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.6.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.6.0 - Production

SQL> select soundex('aaa'), soundex('aaa xxx') from dual;

SOUN SOUN
---- ----
A000 A200

As the OP wanted "aaa" and "aaa xxx" to be reported as possible duplicates, it's not going to help.
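To see where the A000 and A200 codes in the session above come from, here is a minimal Python sketch of the standard American Soundex rules. It keeps the first letter, drops vowels (and Y), ignores H and W, maps consonants to digits, collapses adjacent repeats, and pads or cuts the result to four characters. This is a reimplementation for illustration, not Oracle's code. Skipping non-letters such as the space is an assumption about Oracle's behaviour, though it does reproduce the output shown above.

```python
# Letter-to-digit table of the standard American Soundex rules.
# Vowels, Y, H and W are deliberately absent: they get no digit.
CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(text: str) -> str:
    """Return the 4-character American Soundex code of text.

    Assumption: non-letters (spaces, digits, punctuation) are skipped
    entirely, as if they were not there.
    """
    letters = [c for c in text.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")    # its code still blocks an equal next code
    for c in letters[1:]:
        if c in "HW":
            # H and W get no digit and do not separate equal codes
            continue
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
        # a vowel sets prev to "", so equal codes on both sides repeat
        prev = code
        if len(result) == 4:
            break
    return (result + "000")[:4]         # pad with zeros to four characters


print(soundex("aaa"), soundex("aaa xxx"))
```

Everything after the leading "a" in "aaa" is a vowel, leaving just "A" (padded to A000). In "aaa xxx" only the first "x" contributes a digit (A200). Ids 1 and 2 of the OP's table would therefore fall under different codes, so grouping on soundex(name) would not pair them.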

-- 
Regards,
Frank van Bortel

Top-posting is one way to shut me up...

Received on Wed Sep 06 2006 - 02:44:54 CDT