Re: de-dup process

From: Mladen Gogala <mgogala_at_verizon.net>
Date: Wed, 13 Dec 2006 09:05:04 -0500
Message-id: <1166018704l.3920l.0l@medo.noip.com>

On 12/12/2006 08:42:38 PM, tboss_at_bossconsulting.com wrote:
> >From asktom, the best way I've found is to use Tom's little code snippet below:
>
> delete from table your_huge_table
> where rowid in
> (select rid
> from
> (select rowid rid,
> row_number() over
> (partition by varchar_that_defines_duplicates
> order by rowid ) rn
> from your_huge_table
> )
> where rn <> 1
> )
> /

Good luck with that if cardinality is +60M rows. I would also add a condition like WHERE ROWID in (SELECT ROW_ID from EXCEPTIONS) to your query. You only need to populate exceptions table with duplicates and if those duplicates are a small percentage of the total number of records, your task will be done two order of magnitude faster then without the exceptions table.

-- 
Mladen Gogala
http://www.mladen-gogala.com

--
http://www.freelists.org/webpage/oracle-l

Received on Wed Dec 13 2006 - 08:05:04 CST