Re: Deleting all duplicate rows.

From: Martin Douglas <dafis_at_ppp.test.jccbi.gov>
Date: 1996/11/10
Message-ID: <NEWTNews.847658943.20561.dafis_at_ppp.test.jccbi.gov>#1/1


In article <5650dq$al3_at_uwm.edu>, <gidwani_at_alpha1.csd.uwm.edu> writes:
> Path: news.ou.edu!news.nodak.edu!netnews1.nwnet.net!netnews.nwnet.net!arclight.uoregon.edu!nntp.primenet.com!uwm.edu!alpha1.csd.uwm.edu!gidwani
> From: gidwani_at_alpha1.csd.uwm.edu (Vinay K Gidwani)
> Newsgroups: comp.databases.oracle
> Subject: Deleting all duplicate rows.
> Date: 10 Nov 1996 16:38:18 GMT
> Organization: University of Wisconsin - Milwaukee, Computing Services
 Division
> Lines: 32
> Distribution: world
> Message-ID: <5650dq$al3_at_uwm.edu>
> NNTP-Posting-Host: 129.89.169.1
> Keywords: Deleting all duplicate rows in a table
>
> I am trying to delete all duplicate rows in a table. The duplicates are to
> be determined on a few keys. The way I am thinking of implementing it
> now is
>
> Insert into temp_table as
> Select key1, key2, key3 from
> table
> group by key1, key2, key3
> having count(*) > 1
>
> Run a cursor through temp_table and delete all rows having
> keys key1, key2, key3 in table.
>
> This seems an overly complicated solution. Any suggestions on an
> easier way to do this.
>
> I have just started to work with Oracle and I would appreciate any
> suggestions on any good books. I am referring to George Koch
> Orcale THe Complete Reference which seems to be quite a
> comprehensive book.
>
> Thanks in Advance.
>
> Vinay.
>
>
>
>
>
>
> --
> email : gidwani_at_alpha1.csd.uwm.edu
>
You should probably create a UNIQUE Index over those key elements to prevent duplicated from being added to the system. But first, look up the topic "EXCEPTIONS" in you manuals and find the oracle script that build an exceptions table. I can't remember the script, probably something like PROCEXCP.SQL. After you create this table then...  

If you add the line EXCEPTIONS INTO EXCEPTIONS at the bottom of the index build statement, then ALL rowid's for duplicate rows are put into the exceptions table.

Because ALL rows that have duplicate records are listed by row_id in the exceptions table, You need to be carefull. You only want to delete one of the duplicates from your table. I built a package that read the EXCPETIONS table and the did the following:

CURSOR FIND_DUPS IS SELECT KEY1, KEY2, KEY3, KEY4 FROM table_name 
  WHERE ROW_ID IN (SELECT ROW_ID FROM EXCEPTIONS WHERE TABLE_NAME
   = 'table_name');

DUP_ROW = FIND_DUPS%ROW_TYPE; CURSOR DUP_ROW IS
  SELECT COUNT(*) FROM table_name A WHERE

  1. KEY1 = DUP_ROW.KEY1,
  2. KEY2 = DUP_ROW.KEY1,
  3. KEY3 = DUP_ROW.KEY1,
  4. KEY4 = DUP_ROW.KEY1; DUP_CNT = DUP_ROW%TYPE;
BEGIN
   OPEN FIND_DUPS;
   LOOP;
     FETCH FIND_DUPS INTO DUP_ROW;
     EXIT WHEN FIND_DUP%NOTFOUND;
     OPEN DUP_ROW;
     FETCH DUP_ROW INTO DUP_CNT;
       IF DUP_CNT > 1 THEN
         DELETE FROM table_name WHERE ROWID IN (
           SELECT ROW_ID FROM EXCEPTIONS
           WHERE TABLE_NAME = 'table_name');
       END IF;
     CLOSE DUP_ROW;

   END LOOP;
END procedure_name;

This is a delete by rowid, and will run fast.

This may seem more complicated, but this can be run anytime you think you may have duplicates. Just remember to truncate the exceptions table after you are finished.

Note the differences between ROWID AND ROW_ID. I can never remember which is the actual rowid system element associated with a given table and row. But if you describe the exceptions table, if the element is ROW_ID then my script above is correct. other wise just switch ROWID and ROW_ID in the above statement to detele records.

You can also 'select' the rows, 'select count(*)' the rows. Received on Sun Nov 10 1996 - 00:00:00 CET

Original text of this message