Re: Selecting SIMILAR, not the same records (PROBABLE) duplicates

From: Andy Hassall <andy_at_andyh.co.uk>
Date: Tue, 05 Sep 2006 21:33:33 +0100
Message-ID: <8knrf2h4hsb817gvu0v8k7obqd1e036bg5@4ax.com>

On Tue, 5 Sep 2006 21:05:15 +0200, "kroger" <kroger_at_vp.pl> wrote:

>I've been struggling with that for two days now...
>There is a simple solution for finding duplicates - with GROUP BY and HAVING
>COUNT(*)>1 but it is not enough in my case...
>
>For the example table as follows:
>
>id || name
>1 || aaa
>2 || aaa xxx
>3 || aaa
>4 || aaah
>5 || bbb
>6 || bbb p
>7 || ccc
>
>I would need to see the following:
>
>rowId || duplicatedRowId
>1 2
>1 3
>1 4
>5 6
>
>This means, that I need to build a query that for a given row selects
>others, whose beginning of the name is similar...
>
>In fact, something like that:
>
>select t1.id as rowId, t2.id as duplicateRowId from test t1, test t2 where
>t1.name like t2.name||'%' and t1.id<>t2.id
>
>would do... but in such case I end up with all id-pairs (meaning I get 5 is
>a duplicate of 6 and 6 is a duplicate of 5)

Then surely all you need to do is remove half of the condition; t1.id > t2.id (or reverse if so desired).

(p.s. rowid is not a good alias - clashes with a fairly important keyword)

-- 
Andy Hassall :: andy@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool

Received on Tue Sep 05 2006 - 15:33:33 CDT