Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> Re: De-duplication of large table
Hi,
I had a similar problem with 11 million records with about 4 million duplicates/ triplicates and so on.
I wrote this PL/SQL to take care of it month by month, you will have to do
it by range of recnum like
lets say increment by 50000 and you will have to use the loop 1000 times
rather than 80.
Later !!!
Suresh Bhat
Oracleguru
www.oracleguru.net
create or replace procedure delete_duplicates
is
BEGIN
/*
STEPS:
-- 1. Insert DISTINCT rows into TABLE_NAME_TEMP from TABLE_NAME, which this procedure accomplishes. -- 2. Then in SQL*Plus either RENAME TABLE_NAME to TABLE_NAME_TEMP or DROP it. -- Rename TABLE_NAME_TEMP to TABLE_NAME and create a unique index on TAG_DATE, TAG_NAME to avoid the duplicate insertion in future.*/
cursor curs_min_tag_date
is
select min(tag_date) <===== this will be min(recno)
from table_name;
--
BEGIN
/*
Insert DISTINCT records from table TABLE_NAME into
TABLE_NAME_TEMP.
*/
BEGIN
open curs_min_tag_date; fetch curs_min_tag_date into min_tag_date; <======min_tag_date will be your min_recnum close curs_min_tag_date;
--
dbms_output.put_line ('Min TAG_DATE from table TABLE_NAME: ' ||to_char(min_tag_date,'dd-MON-yy hh24:mi:ss') );
--
if min_tag_date is not null then /* This will take care of duplicates for up to 80 days. */ for j in 1..80 loop <========= Change this to 1000 but test it with say 10 or 100 insert into table_name_temp select distinct * from table_name where tag_date between trunc(min_tag_date ) <===This will become between min_recnum and trunc(min_tag_date + 1); <=== and min_recnum + 50000 etc. dbms_output.put_line ('Loop: '||j||', Inserting records for day: ' ||to_char(min_tag_date,'dd-MON-yy hh24:mi:ss') ); commit; min_tag_date := min_tag_date + 1; <====== will be min_recnum := min_renum + 50000 end loop; end if; EXCEPTION when others then null;
Dave <none_at_nowhere.nothere.oops> wrote in article <811m34$na5$1_at_lure.pipex.net>...
> I have a 43,000,000 row table with about 225,000 > duplicate records. > What is the easiest way to remove the duplicates. > I want to remove one of every duplicate record > from the table. > > Dave > > > >Received on Thu Nov 18 1999 - 16:29:30 CST