Re: removing duplicates rows

From: Roland Schaar <rschaar_at_gosch.com>
Date: Sat, 03 Jul 1999 20:46:51 +0200
Message-ID: <377E5A9B.58569607_at_gosch.com>


Have you tried this?

I assume that there is a primary key column named ID.

delete from xxxx
where ID in

    (select min(id)

     from xxxx
     group by col1, col2, col3, ....
     having count(*)=2);

If there is not primary key column try this: delete from xxxx
where rowid in

    (select min(rowid)

     from xxxx
     group by col1, col2, col3, ....
     having count(*)=2);

roland

CJJITP wrote:

> <<
> Piotrek wrote:
>
> > CJJITP wrote:
> >
> > > <<From: "al" <asalvado_at_ev1.net>
> > > Date: Thu, 24 June 1999 08:39 PM EDT
> > > Message-id: <gjAc3.5747$d5.725031_at_news21b.ispnews.com>
> > >
> > > If a tables rows are accidently duplicated, how do you remove
> duplicates?
> > >
> > > 1st you can count the rows, and then use this count in the
> following SQL:
> > > DELETE: FROM table
> > > WHERE rownum < ( (row_count/2) + 1);
> > >
> > > This works only if you duplicated the rows all at once. What's the
> proper
> > > way of doing it?
> > >
> > > Thanks.
> > > >>
> > >
> > > You have not stated the table size or the number of dups. The
> DELETE
> statement
> > > after a fashion would work, but may create some serious
> performance
> problems.
> > > The most direct way would be something like:
> > >
> > > create table new_tbl_nm
> > > .... storage parameters ....
> > > unrecoverable parallel(degree 3)
> > > as
> > > select /*+ parallel(a,3) */
> > > distinct
> > > *
> > > from old_tbl_nm a
> > > /
> > > rename old_tbl_nm to old_tbl_nm_backup
> > > /
> > > rename new_tbl_nm to old_tbl_nm
> > > /
> > >
> > > After you are sure that everything is OK, then and only then:
> > >
> > > DROP TABLE
> > >
> > > Cecil J. Jones
> > > Chief Technologist
> > > Information Technology Paradigms, Incorporated
> > > Data Warehouse, Data Marts and
> > > Decision Enabling Technology Consultants
> > > (415) 454-9424 -- Fax (415) 454-9489
> > > E-Mail: cjjitp_at_aol.com
> >
> > is it good for table with ~ 20 mln records ?
>
> or another question: how to remove records with duplicated some
> columns values
> (not
> all - * ) ?
>
> >>
>
> I need a specific example for the latter question.
>
> As to the earlier question about ~20 million row table. The solution
> will
> work just fine. If you have an environment that will let you work
> with table
> that size, your are OK. If the Oracle instance support Parallel Query
> (as
> shown in the sample code) you may be the performance will be even
> better.
>
> The solution does not create a redo log, the DELETE solution does
> create redo
> log entries, the more dup rows the bigger the log and greater the load
> on the
> whole system.
>
> As to triggers and other constrains that might be on the table -- they
> can be
> moved for the old table to the new one before the old table is
> dropped.
>
> Best of success!
>
> Cecil J. Jones
> Chief Technologist
> Information Technology Paradigms, Incorporated
> Data Warehouse, Data Marts and
> Decision Enabling Technology Consultants
> (415) 454-9424 -- Fax (415) 454-9489
> E-Mail: cjjitp_at_aol.com
Received on Sat Jul 03 1999 - 20:46:51 CEST

Original text of this message