Re: Retrieval of duplicates, possible??
Date: 2000/05/25
Message-ID: <u7snv7yqsc.fsf_at_o2-3.ebi.ac.uk>#1/1
On Wed, 24 May 2000 18:26:48 GMT,
"Al" == nbirvine <nbirvine_at_my-deja.com> writes:
Al> Hello,
Al> I have two DEEP tables (>3 millions rows each) called POLICY and
Al> COMMISSION_POLICY and in order to fix a data problem, I need to
Al> identify all the duplicate policy numbers in both tables.
Al> By doing SELECT POLICY NUMBERS from POLICY UNION SELECT POLICY NUMBERS
Al> from COMMISSION_POLICY, I will create a 3rd table containing the result
Al> set common to both tables, thereby EXCLUDING duplicates.
Al> I want to do the opposite: the result of my query should contain Al> nothing but duplicate policy numbers found in both tables. Logically, Al> it seems that I would need to use a MINUS operator of some sort.
No: both MINUS and UNION are proper set operations, so they will exclude any duplicates. So { a, a, b, c } - { a, b, c} = { b, c}, which is not what you want.
Al> FYI, I am using MS SQLServer 7.0.
Let's hope it can do this:
SELECT *
FROM Policy
WHERE policy_number IN
(SELECT policy_number FROM Policy GROUP BY policy_number HAVING COUNT(*) > 1);
This selects all the duplicate rows of table Policy; similar trick goes for the other one. To get rid of duplicates, you'll have to fall back on vendor specific stuff; usally, rows have an 'invisible' row identifier that you can use to distinguish between the two. If two rows are really identical, following will do the trick:
DELETE From Policy as P
WHERE P.rowid > (SELECT MIN(Q.rowid)
FROM Policy as Q WHERE Q.policy_number = P.policy_number) -- AND further criteria.
'rowid' is the vendor(Oracle) specific 'invisible' row identifier. Don't forget to set up the appropriate primary key or uniqueness constraints when you're done.
Cheers,
Philip
-- /dev/brain: character special (53/0) ----------------------------------------------------------------------------- Philip Lijnzaad, lijnzaad_at_ebi.ac.uk \ European Bioinformatics Institute,rm A2-24Received on Thu May 25 2000 - 00:00:00 CEST
+44 (0)1223 49 4639 / Wellcome Trust Genome Campus, Hinxton
+44 (0)1223 49 4468 (fax) \ Cambridgeshire CB10 1SD, GREAT BRITAIN
PGP fingerprint: E1 03 BF 80 94 61 B6 FC 50 3D 1F 64 40 75 FB 53