Re: De-duplication of large table

From: Alan Shein <alanshein_at_erols.com>
Date: Thu, 18 Nov 1999 16:05:07 -0500
Message-ID: <811pl3$7e8$1@autumn.news.rcn.net>

It can also be done with a correlated subquery. Also, keep in mind that there is no "first" occurrence as (theoretically) the records are not stored in any particular order. Anyway, you only need to delete duplicate records, not the first, second, or any other sequenced duplicate.

Are you sure there are duplicate recnums? Are the index coulmns unique? Basically, how are you defining duplicate records? It sounds like you do not have a Primary Key.

Dave <none_at_nowhere.nothere.oops> wrote in message news:811ojn$pnn$1_at_lure.pipex.net...
> I am a novice to Oracle. I did think of that but
> was unsure because of the size of the thing.
> 43 million records, 407 char record length.
> Takes up about 7 gb of space plus two index
> columns taking up about 2gb.
>
> The Column I want to de-duplicate on contains
> a recnum (Varchar2) which starts at 1 and goes to
> about 45,000,000. There are some gaps and
> some duplicates. I can probably generate
> a list of the duplicate numbers but am unsure
> how I would go about deleting the first occurrence
> of a duplicate record in the table using SQL.
>
> Any suggestions would be much appreciated.
> Dave
>
>
>
>
> Alan Shein <alanshein_at_erols.com> wrote in message
> news:811n6u$p4u$1_at_autumn.news.rcn.net...
> > There are many ways, depending on other factors, but the easiest way is
to
> >
> > CREATE TABLE newtable AS SELECT DISTINCT * FROM oldtable;
> >
> > then drop the oldtable and rename the newtable to the oldtable's name.
> >
> > There are some disadvantages to doing it this way, but, as I said, it
> > depends on what you ultimately need to do.
> >
> > Dave <none_at_nowhere.nothere.oops> wrote in message
> > news:811m34$na5$1_at_lure.pipex.net...
> > > I have a 43,000,000 row table with about 225,000
> > > duplicate records.
> > > What is the easiest way to remove the duplicates.
> > > I want to remove one of every duplicate record
> > > from the table.
> > >
> > > Dave
> > >
> > >
> > >
> >
> >
>
>
Received on Thu Nov 18 1999 - 15:05:07 CST