Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.misc -> Re: Selecting SIMILAR, not the same records (PROBABLE) duplicates

Re: Selecting SIMILAR, not the same records (PROBABLE) duplicates

From: joel garry <joel-garry_at_home.com>
Date: 15 Sep 2006 14:50:15 -0700
Message-ID: <1158357015.612523.208560@p79g2000cwp.googlegroups.com>

kroger wrote:
> >
> > At least my example shows that this is not true. It only shows up with
> > 2 lines of duplicates when both names are equal.
>
> That is a good point though.
>
> What finally solved the case for me, is (adding city as an additional column
> for example):
>
> select min(t1.id) from test t1, test t2
> where t2.name like '%' || t1.name || '%'
> and soundex(t2.city)=soundex(t1.city)
> --and some other conditions
> and t2.id != t1.id group by t1.name, soundex(t1.city)
>
> This way I'm able to receive the ids of 'parents' for all other duplicates.
>
> Referring to the table you gave as an example, this query returns rows 1, 5
> and 8, and that I'm entirely happy with.
>
> Now, having the ids of rows that are kind of 'origins' for other
> duplicates, I can get the others easily..
>
> Thanks again to all of you for all this discussion..
> It did really good to me :)
>
> Another thing is that the query is quite slow when going through a table
> with 500k records..
> But here I'm going to play with reverse index function as someone suggested
> in the other post..

Probably all those soundex conversions. Maybe a function based index on soundex(city)?

Thanks for this update, it looks like the key to this is to not only compare the name, but to compare another attribute to winnow the possible names to compare. Then make that other comparison fast. I'm glad I posted something about soundex.

jg

--
@home.com is bogus.
"Small things like making two fatal errors sometimes cancel each other
out and make for something that works!!!" - Tom Kyte
Received on Fri Sep 15 2006 - 16:50:15 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US