Re: Selecting SIMILAR, not the same records (PROBABLE) duplicates

From: kroger <kroger_at_vp.pl>
Date: Sat, 16 Sep 2006 10:03:18 +0200
Message-ID: <eegb48$b0a$1@news.onet.pl>

> Probably all those soundex conversions. Maybe a function based index
> on soundex(city)?

Soundex doesn't have that big an impact here. The heaviest part is
where t2.name like '%' || t1.name || '%'

Replacing it with
where t2.name = t1.name

reduced the execution time from several minutes to 10 seconds...
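
For anyone who wants to see the difference between the two predicates, here is a minimal sketch. It uses SQLite through Python instead of Oracle so that it runs anywhere. The table "people", its columns, the index name and the sample rows are all made up for illustration and are not the real schema. With the leading '%' every row has to be compared against every other row, while the equality join can be answered from an ordinary index on name.

```python
import sqlite3

# Hypothetical table and data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE INDEX people_name_idx ON people (name);
    INSERT INTO people (id, name, city) VALUES
        (1, 'Smith', 'Warsaw'),
        (2, 'Smith', 'Warszawa'),
        (3, 'John Smith', 'Warsaw'),
        (4, 'Kowalski', 'Krakow');
""")

# Substring match: t1.name may appear anywhere inside t2.name.
# The leading '%' means a plain index on name cannot narrow the search.
like_pairs = conn.execute("""
    SELECT t1.id, t2.id
    FROM people t1 JOIN people t2 ON t1.id <> t2.id
    WHERE t2.name LIKE '%' || t1.name || '%'
    ORDER BY t1.id, t2.id
""").fetchall()

# Exact match: finds fewer candidates, but can use the index on name.
eq_pairs = conn.execute("""
    SELECT t1.id, t2.id
    FROM people t1 JOIN people t2 ON t1.id <> t2.id
    WHERE t2.name = t1.name
    ORDER BY t1.id, t2.id
""").fetchall()

print("LIKE pairs:", like_pairs)
print("=    pairs:", eq_pairs)
```

Note that the equality version no longer pairs 'Smith' with 'John Smith', so the speedup comes at the price of missing names that are only contained in one another.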

It's clear some reverse index is necessary here.

> Thanks for this update, it looks like the key to this is to not only
> compare the name, but to compare another attribute to winnow the
> possible names to compare. Then make that other comparison fast. I'm
> glad I posted something about soundex.

The other attributes may help, but still, the biggest thing was querying for duplicate names.
Anyway, looks quite simple now :)
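
A rough sketch of the combined idea (exact name match, then soundex on another attribute to winnow the candidates), again in SQLite through Python rather than Oracle. The table, columns and rows are hypothetical. SQLite does not always ship with a soundex function, so the sketch registers its own Python version of the classic American Soundex rules; I have not checked that it agrees with Oracle's SOUNDEX on every input.

```python
import sqlite3

def soundex(word):
    """Classic 4-character American Soundex (illustrative implementation)."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit
    letters = [c for c in (word or "").upper() if "A" <= c <= "Z"]
    if not letters:
        return None
    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # vowels reset prev, so repeated codes around them count
    return (result + "000")[:4]

conn = sqlite3.connect(":memory:")
conn.create_function("soundex", 1, soundex)
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO people (id, name, city) VALUES
        (1, 'Smith', 'Warsaw'),
        (2, 'Smith', 'Warszawa'),
        (4, 'Kowalski', 'Krakow'),
        (5, 'Kowalski', 'Gdansk');
""")

# Same name, and cities that sound alike: probable duplicates.
pairs = conn.execute("""
    SELECT t1.id, t2.id
    FROM people t1 JOIN people t2 ON t1.id < t2.id
    WHERE t2.name = t1.name
      AND soundex(t2.city) = soundex(t1.city)
    ORDER BY t1.id, t2.id
""").fetchall()

print(pairs)
```

Here the two Kowalski rows drop out because their cities do not sound alike, while the two Smith rows stay as a probable duplicate.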

And thanks for the soundex, that will be useful too!

Kroger