Re: Retrieval of duplicates, possible??

From: Philip Lijnzaad <lijnzaad_at_ebi.ac.uk>
Date: 2000/05/25
Message-ID: <u7snv7yqsc.fsf_at_o2-3.ebi.ac.uk>#1/1


On Wed, 24 May 2000 18:26:48 GMT,
"Al" == nbirvine <nbirvine_at_my-deja.com> writes:

Al> Hello,
Al> I have two DEEP tables (>3 millions rows each) called POLICY and
Al> COMMISSION_POLICY and in order to fix a data problem, I need to
Al> identify all the duplicate policy numbers in both tables.

Al> By doing SELECT POLICY NUMBERS from POLICY UNION SELECT POLICY NUMBERS
Al> from COMMISSION_POLICY, I will create a 3rd table containing the result Al> set common to both tables, thereby EXCLUDING duplicates.
Al> I want to do the opposite: the result of my query should contain
Al> nothing but duplicate policy numbers found in both tables. Logically,
Al> it seems that I would need to use a MINUS operator of some sort. 

No: both MINUS and UNION are proper set operations, so they will exclude any duplicates. So { a, a, b, c } - { a, b, c} = { b, c}, which is not what you want.

Al> FYI, I am using MS SQLServer 7.0.

Let's hope it can do this:

SELECT *
FROM Policy
WHERE policy_number IN

  (SELECT policy_number FROM Policy GROUP BY policy_number HAVING COUNT(*) > 1);  

This selects all the duplicate rows of table Policy; similar trick goes for the other one. To get rid of duplicates, you'll have to fall back on vendor specific stuff; usally, rows have an 'invisible' row identifier that you can use to distinguish between the two. If two rows are really identical, following will do the trick:

DELETE From Policy as P
WHERE P.rowid > (SELECT MIN(Q.rowid)

                 FROM Policy as Q
                 WHERE Q.policy_number = P.policy_number)
  -- AND further criteria.
                   

'rowid' is the vendor(Oracle) specific 'invisible' row identifier. Don't forget to set up the appropriate primary key or uniqueness constraints when you're done.

Cheers,

                                                                      Philip

-- 
/dev/brain:  character special (53/0)
-----------------------------------------------------------------------------
Philip Lijnzaad, lijnzaad_at_ebi.ac.uk \ European Bioinformatics Institute,rm A2-24

+44 (0)1223 49 4639 / Wellcome Trust Genome Campus, Hinxton
+44 (0)1223 49 4468 (fax) \ Cambridgeshire CB10 1SD, GREAT BRITAIN
PGP fingerprint: E1 03 BF 80 94 61 B6 FC 50 3D 1F 64 40 75 FB 53
Received on Thu May 25 2000 - 00:00:00 CEST

Original text of this message