Return-Path: <oracle-l-bounce@freelists.org>
Delivered-To: 2-oracle-l@orafaq.com
Received: (qmail 4930 invoked from network); 12 Dec 2006 19:44:08 -0600
Received: from freelists-180.iquest.net (HELO turing.freelists.org) (206.53.239.180)
  by 69.64.49.119 with SMTP; 12 Dec 2006 19:44:08 -0600
Received: from localhost (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id AA6E156842E;
 Tue, 12 Dec 2006 20:43:13 -0500 (EST)
Received: from turing.freelists.org ([127.0.0.1])
 by localhost (turing [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id 08472-05; Tue, 12 Dec 2006 20:43:13 -0500 (EST)
Received: from turing (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 2091F56416A;
 Tue, 12 Dec 2006 20:43:13 -0500 (EST)
Received: with ECARTIS (v1.0.0; list oracle-l); Tue, 12 Dec 2006 20:41:54 -0500 (EST)
Received: from localhost (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id D914F567600
 for <oracle-l@freelists.org>; Tue, 12 Dec 2006 20:41:53 -0500 (EST)
Received: from turing.freelists.org ([127.0.0.1])
 by localhost (turing [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id 08238-02 for <oracle-l@freelists.org>;
 Tue, 12 Dec 2006 20:41:53 -0500 (EST)
Received: from piccollo.p6m7g8.net (c66-236-219-70.ip.panth.com [66.236.219.70])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id A7DEE561027
 for <oracle-l@freelists.org>; Tue, 12 Dec 2006 20:41:53 -0500 (EST)
Received: from piccollo.p6m7g8.net (localhost [127.0.0.1])
 by piccollo.p6m7g8.net (8.13.6/8.13.6) with ESMTP id kBD1ghvk016413
 (version=TLSv1/SSLv3 cipher=DHE-DSS-AES256-SHA bits=256 verify=NO);
 Tue, 12 Dec 2006 20:42:44 -0500 (EST)
 (envelope-from tboss@piccollo.p6m7g8.net)
Received: (from tboss@localhost)
 by piccollo.p6m7g8.net (8.13.6/8.13.6/Submit) id kBD1gcXB016411;
 Tue, 12 Dec 2006 20:42:38 -0500 (EST)
 (envelope-from tboss)
Message-Id: <200612130142.kBD1gcXB016411@piccollo.p6m7g8.net>
Subject: Re: de-dup process
To: ebadi01@yahoo.com
Date: Tue, 12 Dec 2006 20:42:38 -0500 (EST)
Cc: oracle-l@freelists.org
In-Reply-To: <841893.87492.qm@web31202.mail.mud.yahoo.com>
From: tboss@bossconsulting.com
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Virus-Status: Clean
X-Spam-Status: No, score=-3.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00,
 DK_POLICY_SIGNSOME,NO_REAL_NAME,SPF_HELO_PASS,SPF_PASS autolearn=ham 
 version=3.1.7
X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on piccollo.p6m7g8.net
X-archive-position: 43101
X-ecartis-version: Ecartis v1.0.0
Sender: oracle-l-bounce@freelists.org
Errors-to: oracle-l-bounce@freelists.org
X-original-sender: tboss@bossconsulting.com
Precedence: normal
Reply-to: tboss@bossconsulting.com
List-help: <mailto:ecartis@freelists.org?Subject=help>
List-unsubscribe: <oracle-l-request@freelists.org?Subject=unsubscribe>
List-software: Ecartis version 1.0.0
List-Id: oracle-l <oracle-l.freelists.org>
X-List-ID: oracle-l <oracle-l.freelists.org>
List-subscribe: <oracle-l-request@freelists.org?Subject=subscribe>
List-owner: <mailto:steve.adams@ixora.com.au>
List-post: <mailto:oracle-l@freelists.org>
List-archive: <http://www.freelists.org/archives/oracle-l>
X-list: oracle-l
X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at avenirtech.net

From asktom, the best way I've found is to use Tom's little code snippet below:

delete from table your_huge_table
where rowid in
  (select rid
        from
        (select rowid rid,
                     row_number() over
                        (partition by varchar_that_defines_duplicates
                         order by rowid ) rn
   from your_huge_table
  )
where rn <> 1
)
/

It will get multiple duplicate rows, and works far faster than any not exists, minus,
or cursor-based solution.

A few other options exist for you if you can do them that may be faster
1. create table as select distinct; probably faster than doing any sort of deleting.

2. Alter table mytab enable constraint PK exceptions into exceptions;
Better way; much faster for large tables, lets you audit the 
duplicate rows by examining exceptions table.  (you must run 
$ORACLE_HOME/rdbms/admin/utlexcpt.sql before doing this).  
Con: the exceptions table will contain BOTH duplicate rows in
the source table ... you'll have to delete them manually.

3. Use unix.  Perhaps the purest fastest way is to use unix sort/unique commands:
a. sqlload data out or select out delimited
b. sort filename | uniq > new file
c. sqlload back in.

only a viable option if your table is "thin" and only has a few columns.

hope this helps, todd

> 
> We have a huge table (> 160 million rows) which has about 20 million duplicate rows that we need to delete.  What is the most efficient way to do this as we will need to do this daily?
>   A single varchar2(30) column is used to identified duplicates.  We could possibly have > 2 rows of duplicates.
>    
>   We are doing direct path load so no unique key indexes can be put on the table to take care of the duplicates.
>    
>   Platform: Oracle 10G RAC (2 node) on Solaris 10.
>    
--
http://www.freelists.org/webpage/oracle-l


