Re: De-duplication of large table

From: Dave <none_at_nowhere.nothere.oops>
Date: Thu, 18 Nov 1999 22:07:56 -0000
Message-ID: <811tbb$14f$1@lure.pipex.net>

You are correct in saying that there is no Primary key. The column I want to De-dupe Will be the primary key eventually. Don't be confused by recnum, it's just a number that was on the data before it was loaded into Oracle. The index is non-unique. I want to de-dupe the column and make it a unique index and Primary key. Due to circumstances it was loaded into oracle as a non-unique field and later discovered that there was duplicates.
For my problem just assume it contains data which needs to be unique, but at the moment it's not.

Dave
dave_at_scsdirect.co.uk

Alan Shein <alanshein_at_erols.com> wrote in message news:811pl3$7e8$1_at_autumn.news.rcn.net...
> It can also be done with a correlated subquery. Also, keep in mind that
> there is no "first" occurrence as (theoretically) the records are not
stored
> in any particular order. Anyway, you only need to delete duplicate
records,
> not the first, second, or any other sequenced duplicate.
>
> Are you sure there are duplicate recnums? Are the index coulmns unique?
> Basically, how are you defining duplicate records? It sounds like you do
not
> have a Primary Key.
>
>
> Dave <none_at_nowhere.nothere.oops> wrote in message
> news:811ojn$pnn$1_at_lure.pipex.net...
> > I am a novice to Oracle. I did think of that but
> > was unsure because of the size of the thing.
> > 43 million records, 407 char record length.
> > Takes up about 7 gb of space plus two index
> > columns taking up about 2gb.
> >
> > The Column I want to de-duplicate on contains
> > a recnum (Varchar2) which starts at 1 and goes to
> > about 45,000,000. There are some gaps and
> > some duplicates. I can probably generate
> > a list of the duplicate numbers but am unsure
> > how I would go about deleting the first occurrence
> > of a duplicate record in the table using SQL.
> >
> > Any suggestions would be much appreciated.
> > Dave
> >
> >
> >
> >
> > Alan Shein <alanshein_at_erols.com> wrote in message
> > news:811n6u$p4u$1_at_autumn.news.rcn.net...
> > > There are many ways, depending on other factors, but the easiest way
is
> to
> > >
> > > CREATE TABLE newtable AS SELECT DISTINCT * FROM oldtable;
> > >
> > > then drop the oldtable and rename the newtable to the oldtable's name.
> > >
> > > There are some disadvantages to doing it this way, but, as I said, it
> > > depends on what you ultimately need to do.
> > >
> > > Dave <none_at_nowhere.nothere.oops> wrote in message
> > > news:811m34$na5$1_at_lure.pipex.net...
> > > > I have a 43,000,000 row table with about 225,000
> > > > duplicate records.
> > > > What is the easiest way to remove the duplicates.
> > > > I want to remove one of every duplicate record
> > > > from the table.
> > > >
> > > > Dave
> > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>
Received on Thu Nov 18 1999 - 16:07:56 CST