Re: Need alternate way of eliminating duplicates from 93K table

From: Robert Walters <bobwal_at_infocom.com>
Date: 1996/04/08
Message-ID: <4kc15f$30j_at_mips.infocom.com>#1/1


jp_at_vllyoak.resun.com (Jeff Perry) wrote:
>
> Here's my situation; a user discovered their table
> with 93K rows has duplicated data, and they want
> the duplicates deleted.
>
> First, I attempted a correlated subquery as follows:
> delete from ciim86_oeorder_history A
> where rowid>
> (select min(rowid)
> from ciim86_oeorder_history
> where field1 = A.field1
> and field2 = A.field2
> );
> Thirty six (36) hours later, this query was still
> running when the system was rebooted.
>
> Next, wishing to avoid the correlated subquery method,
> I created a duplicate table with a unique constraint.
> Using PL/SQL, I built a loop to insert only
> unique records from the original table into this
> new temporary table, HOWEVER, I cannot figure out
> a way to resume the insert where it left off when
> it encounters its first duplicate record.
>
> If you know of some alternate method of removing these
> duplicate records, please share with me your ideas.
>
> TIA
>
>
>
>
> --
> Jeff Perry
> jp%vllyoak.resun.com%blkhole_at_sdd.hp.com

OTTOMH declare

  cursor dups is
  select key1, key2, key3, key4, . . . .   from table_name
  group by key1, key2, key3, key4, . . . .   having count(*) >= 2;

  save_doops dups%rowtype;

  any_rowid varchar2(100);

begin
  for doops in dups loop
    begin

      if save_doops is null or
         doops <> save_doops then
        save_doops := doops;
        select rowid
        into   any_rowid 
        from   table_name 
        where  doops.key1 = key1
        and    doops.key2 = key2
        and    doops.key3 = key3
        and    doops.key4 = key4 ....
      end if;
    exception
      when too_many_rows then
        delete from table_name where rowid = any_rowid;
        commit;

    end;
  end loop;
end;

Other than the inadvertant syntax and declaration errors, this should delete ONE of the duplicate rows in the table. If there are more than 2 duplicates, just run it again.

Of course, you can add goodies like reporting on what is getting nuked, changing the placement of the commit, and using all columns vs just key columns, but the general algorithm remains constant.

Bob Walters

Florists' Mutual Insurance Co                     bobwal_at_infocom.com

"Code typed in a browser should always be questioned!" Received on Mon Apr 08 1996 - 00:00:00 CEST

Original text of this message