Re: Selecting SIMILAR, not the same records (PROBABLE) duplicates

From: kroger <kroger_at_vp.pl>
Date: Fri, 15 Sep 2006 23:14:34 +0200
Message-ID: <eef53s$4a6$1@news.onet.pl>

> At least my example shows that this is not true. It only shows up with
> 2 lines of duplicates when both names are equal.

That is a good point though.

What finally solved the case for me is this (adding city as an additional column, for example):

select min(t1.id)
from test t1, test t2
where t2.name like '%' || t1.name || '%'
  and soundex(t2.city) = soundex(t1.city)
  --and some other conditions
  and t2.id != t1.id
group by t1.name, soundex(t1.city)

This way I get the ids of the 'parent' rows for all the other duplicates.

For the table you gave as an example, this query returns rows 1, 5 and 8, which I'm entirely happy with.

Now that I have the ids of the rows that are the 'origins' of the other duplicates, I can get the others easily.
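
For anyone who wants to try the two steps without a database at hand, here is a rough sketch in Python. Everything in it beyond the query itself is my assumption: an in-memory SQLite database stands in for the real one, soundex() is a hand-rolled plain American Soundex registered as an SQL function (the database's own SOUNDEX may differ in corner cases), the sample rows are made up and are not the table from this thread, and the second query is just one possible way of "getting the others".

```python
import sqlite3

def soundex(text):
    """Plain American Soundex; a hand-rolled stand-in for the database's SOUNDEX()."""
    codes = {ch: d for d, group in enumerate(
        ("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1) for ch in group}
    letters = [c for c in (text or "").upper() if "A" <= c <= "Z"]
    if not letters:
        return None
    out, prev = letters[0], codes.get(letters[0])
    for c in letters[1:]:
        if c in "HW":
            continue              # H and W do not separate equal codes
        d = codes.get(c)
        if d is not None and d != prev:
            out += str(d)
        prev = d                  # a vowel resets the previous code
    return (out + "000")[:4]

conn = sqlite3.connect(":memory:")
conn.create_function("soundex", 1, soundex)
conn.execute("create table test (id integer primary key, name text, city text)")
# Made-up sample rows, purely for illustration.
conn.executemany("insert into test values (?, ?, ?)", [
    (1, "Smith", "Warsaw"),
    (2, "John Smith", "Warszaw"),    # contains 'Smith', city sounds the same
    (3, "Smith", "Warsaw"),          # exact duplicate of row 1
    (4, "Kowalski", "Poznan"),
    (5, "Kowalski Jan", "Posnan"),   # contains 'Kowalski', city sounds the same
    (6, "Nowak", "Lodz"),            # no duplicate at all
    (7, "Smith", "Poznan"),          # same name as row 1, but a different city
])

# Step 1: the query from this post, one 'parent' id per group of duplicates.
PARENTS_SQL = """
select min(t1.id) from test t1, test t2
where t2.name like '%' || t1.name || '%'
  and soundex(t2.city) = soundex(t1.city)
  and t2.id != t1.id
group by t1.name, soundex(t1.city)
"""
parents = sorted(row[0] for row in conn.execute(PARENTS_SQL))

# Step 2 (my guess at a follow-up): list the other rows that match each parent.
OTHERS_SQL = f"""
select p.id, d.id from test p, test d
where p.id in ({PARENTS_SQL})
  and d.name like '%' || p.name || '%'
  and soundex(d.city) = soundex(p.city)
  and d.id != p.id
order by p.id, d.id
"""
others = conn.execute(OTHERS_SQL).fetchall()

print(parents)   # parent ids
print(others)    # (parent id, duplicate id) pairs
```

Note that SQLite's LIKE ignores case for ASCII letters by default, so on mixed-case data it may match more rows than a case-sensitive LIKE would.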

Thanks again to all of you for this discussion. It did me a lot of good :)

Another thing is that the query is quite slow when going through a table with 500k records.
But here I'm going to play with the reverse index function, as someone suggested in the other post.

Best regards,
Kroger