Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.misc -> Re: Selecting SIMILAR, not the same records (PROBABLE) duplicates

Re: Selecting SIMILAR, not the same records (PROBABLE) duplicates

From: DA Morgan <damorgan_at_psoug.org>
Date: Tue, 05 Sep 2006 12:48:16 -0700
Message-ID: <1157485696.641344@bubbleator.drizzle.com>


kroger wrote:
> Hi,
>
> I've been struggling with that for two days now...
> There is a simple solution for finding duplicates - with GROUP BY and HAVING
> COUNT(*)>1 but it is not enough in my case...
>
> For the example table as follows:
>
> id || name
> 1 || aaa
> 2 || aaa xxx
> 3 || aaa
> 4 || aaah
> 5 || bbb
> 6 || bbb p
> 7 || ccc
>
> I would need to see the following:
>
> rowId || duplicatedRowId
> 1 2
> 1 3
> 1 4
> 5 6
>
> This means, that I need to build a query that for a given row selects
> others, whose beginning of the name is similar...
>
> In fact, something like that:
>
> select t1.id as rowId, t2.id as duplicateRowId from test t1, test t2 where
> t1.name like t2.name||'%' and t1.id<>t2.id
>
> would do... but in such case I end up with all id-pairs (meaning I get 5 is
> a duplicate of 6 and 6 is a duplicate of 5)
>
> I'm neither Oracle or SQL expert... Is there any easy way for obtaining what
> I desired? I feel I'm getting dumb with that..
> Or do I need to go into cursors etc?
> I'm also looking for efficient solution for that issue, since the source
> table to lookup is pretty big... 1-2M records...
>
> Thanks in advance,
> Best regards,
>
> Kroger

If you think ID1 = ID2 and ID1=ID4 you are not looking for duplicates. It is no wonder no one's proposed solution worked.

My immediate thought is that you have two choices. One is regular expressions if you are in 10g or a PL/SQL procedure.

Without knowing the business rules it is impossible to tell what would be required.

But by any normal definition the normal the only duplicates are ID1 and ID3 and I'd solve that problem by creating a unique constraint and dumping violations to an error table.

-- 
Puget Sound Oracle Users Group
Received on Tue Sep 05 2006 - 14:48:16 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US