Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: De-duplication of large table

Re: De-duplication of large table

From: Suresh Bhat <suresh.bhat_at_mitchell-energy.com>
Date: Thu, 18 Nov 1999 22:29:30 GMT
Message-ID: <01bf3224$be658940$a504fa80@mndnet>


Hi,

I had a similar problem with 11 million records with about 4 million duplicates/ triplicates and so on.

I wrote this PL/SQL to take care of it month by month, you will have to do it by range of recnum like
lets say increment by 50000 and you will have to use the loop 1000 times rather than 80.

Later !!!

Suresh Bhat
Oracleguru
www.oracleguru.net

create or replace procedure delete_duplicates

    is
BEGIN
/*

   STEPS:

     --
     1. Insert DISTINCT rows into TABLE_NAME_TEMP from TABLE_NAME,
        which this procedure accomplishes.
     --
     2. Then in SQL*Plus either RENAME TABLE_NAME to TABLE_NAME_TEMP
        or DROP it.
     --
        Rename TABLE_NAME_TEMP to TABLE_NAME and create a unique index
        on TAG_DATE, TAG_NAME to avoid the duplicate insertion in future.
*/
DECLARE
   min_tag_date date;
--

   cursor curs_min_tag_date

       is
   select min(tag_date) <===== this will be min(recno)      from table_name;
--

BEGIN
/*

   Insert DISTINCT records from table TABLE_NAME into    TABLE_NAME_TEMP.
*/

   BEGIN

      open    curs_min_tag_date;
      fetch   curs_min_tag_date
       into   min_tag_date;                   <======min_tag_date will be
your min_recnum
      close   curs_min_tag_date;

--
dbms_output.put_line ('Min TAG_DATE from table TABLE_NAME: ' ||to_char(min_tag_date,'dd-MON-yy hh24:mi:ss') );
--
if min_tag_date is not null then /* This will take care of duplicates for up to 80 days. */ for j in 1..80 loop <========= Change this to 1000 but test it with say 10 or 100 insert into table_name_temp select distinct * from table_name where tag_date between trunc(min_tag_date ) <===This will become between min_recnum and trunc(min_tag_date + 1); <=== and min_recnum + 50000 etc. dbms_output.put_line ('Loop: '||j||', Inserting records for day: ' ||to_char(min_tag_date,'dd-MON-yy hh24:mi:ss') ); commit; min_tag_date := min_tag_date + 1; <====== will be min_recnum := min_renum + 50000 end loop; end if; EXCEPTION when others then null;

   END;
END;
END;
/    

Dave <none_at_nowhere.nothere.oops> wrote in article <811m34$na5$1_at_lure.pipex.net>...

> I have a 43,000,000 row table with about 225,000
> duplicate records.
> What is the easiest way to remove the duplicates.
> I want to remove one of every duplicate record
> from the table.
> 
> Dave
> 
> 
> 
> 
Received on Thu Nov 18 1999 - 16:29:30 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US