Re: removing duplicates rows

From: CJJITP <cjjitp_at_aol.com>
Date: 26 Jun 1999 03:12:36 GMT
Message-ID: <19990625231236.07912.00000471_at_ng-cg1.aol.com>


<<
Piotrek wrote:

> CJJITP wrote:
>
> > <<From: "al" <asalvado_at_ev1.net>
> > Date: Thu, 24 June 1999 08:39 PM EDT
> > Message-id: <gjAc3.5747$d5.725031_at_news21b.ispnews.com>
> >
> > If a tables rows are accidently duplicated, how do you remove duplicates?
> >
> > 1st you can count the rows, and then use this count in the following SQL:
> > DELETE: FROM table
> > WHERE rownum < ( (row_count/2) + 1);
> >
> > This works only if you duplicated the rows all at once. What's the proper
> > way of doing it?
> >
> > Thanks.
> > >>
> >
> > You have not stated the table size or the number of dups. The DELETE
statement
> > after a fashion would work, but may create some serious performance
problems.
> > The most direct way would be something like:
> >
> > create table new_tbl_nm
> > .... storage parameters ....
> > unrecoverable parallel(degree 3)
> > as
> > select /*+ parallel(a,3) */
> > distinct
> > *
> > from old_tbl_nm a
> > /
> > rename old_tbl_nm to old_tbl_nm_backup
> > /
> > rename new_tbl_nm to old_tbl_nm
> > /
> >
> > After you are sure that everything is OK, then and only then:
> >
> > DROP TABLE
> >
> > Cecil J. Jones
> > Chief Technologist
> > Information Technology Paradigms, Incorporated
> > Data Warehouse, Data Marts and
> > Decision Enabling Technology Consultants
> > (415) 454-9424 -- Fax (415) 454-9489
> > E-Mail: cjjitp_at_aol.com
>
> is it good for table with ~ 20 mln records ?

or another question: how to remove records with duplicated some columns values (not
all - * ) ?

>>

I need a specific example for the latter question.

As to the earlier question about ~20 million row table. The solution will work just fine. If you have an environment that will let you work with table that size, your are OK. If the Oracle instance support Parallel Query (as shown in the sample code) you may be the performance will be even better.

The solution does not create a redo log, the DELETE solution does create redo log entries, the more dup rows the bigger the log and greater the load on the whole system.

As to triggers and other constrains that might be on the table -- they can be moved for the old table to the new one before the old table is dropped.

Best of success!

Cecil J. Jones
Chief Technologist
Information Technology Paradigms, Incorporated Data Warehouse, Data Marts and
Decision Enabling Technology Consultants (415) 454-9424 -- Fax (415) 454-9489
E-Mail: cjjitp_at_aol.com Received on Sat Jun 26 1999 - 05:12:36 CEST

Original text of this message