Re: "Seek and destroy" duplicate entries

From: William Robertson <williamr2019_at_googlemail.com>
Date: 16 Oct 2006 08:41:25 -0700
Message-ID: <1161013285.883309.276240@m73g2000cwd.googlegroups.com>

On Oct 16, 12:36 pm, "Alex" <dead.man.walk..._at_gmx.de> wrote:
> Hello NG,
>
> considering the following table-structure (import-table):
>
> Material_Nr || Alternative_Hierarchy
> -------------||--------------------
> A || 10xalpha
> A || 10xalpha
> A || 10xalpha
> A || 10xalpha
> B || 20xgamma
> B || 20xgamma
> B || 20xgamma
> B || 20xgamma
> ...
>
> The data I've to import isn't normalized. Therefore, the original
> datasets are inflated to nearly 140.000.
> While I'm able to delete about 130.000 datasets by killing those, who
> have NULL as "Alternative_Hierarchy", I've still got >8.000 datasets
> left.
>
> But a lot of these are still duplicates!
> Having only the mentioned columns, I failed to apply the method of
> S-A-D using the MIN()/MAX() functions.
>
> Can anyone provide me with a solution of killing the duplicates (except
> one) ?
> Any help will be appreciated!
>
> Alex Sauer

One other standard dedupe would be:

DELETE import_table
WHERE ROWID IN
( SELECT LEAD(ROWID) OVER (PARTITION BY material_nr, alternative_hierarchy ORDER BY NULL)

FROM import_table );

Whether it would be more efficient to save the non-duplicates and truncate the table etc will depend on your data. Received on Mon Oct 16 2006 - 10:41:25 CDT