Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
![]() |
![]() |
Home -> Community -> Usenet -> c.d.o.misc -> Re: "Seek and destroy" duplicate entries
Charles Hooper schrieb:
> Alex wrote:
> > Charles Hooper schrieb:
> >
> > > Alex wrote:
> > > > Hello NG,
> > > >
> > > > considering the following table-structure (import-table):
> > > >
> > > > Material_Nr || Alternative_Hierarchy
> > > > -------------||--------------------
> > > > A || 10xalpha
> > > > A || 10xalpha
> > > > A || 10xalpha
> > > > A || 10xalpha
> > > > B || 20xgamma
> > > > B || 20xgamma
> > > > B || 20xgamma
> > > > B || 20xgamma
> > > > ...
> > > >
> > > > The data I've to import isn't normalized. Therefore, the original
> > > > datasets are inflated to nearly 140.000.
> > > > While I'm able to delete about 130.000 datasets by killing those, who
> > > > have NULL as "Alternative_Hierarchy", I've still got >8.000 datasets
> > > > left.
> > > >
> > > > But a lot of these are still duplicates!
> > > > Having only the mentioned columns, I failed to apply the method of
> > > > S-A-D using the MIN()/MAX() functions.
> > > >
> > > > Can anyone provide me with a solution of killing the duplicates (except
> > > > one) ?
> > > > Any help will be appreciated!
> > > >
> > > >
> > > > Alex Sauer
> > >
> > > The following should retrieve one row per MATERIAL_NR,
> > > ALTERNATIVE_HIERARCHY combination:
> > > SELECT
> > > MATERIAL_NR,
> > > ALTERNATIVE_HIERARCHY,
> > > MIN(ROWID) ROW_ID
> > > FROM
> > > MY_TABLE
> > > GROUP BY
> > > MATERIAL_NR,
> > > ALTERNATIVE_HIERARCHY;
> > >
> > > Using the above as a starting point:
> > > SELECT
> > > MATERIAL_NR,
> > > ALTERNATIVE_HIERARCHY,
> > > ROWID ROW_ID
> > > FROM
> > > MY_TABLE
> > > MINUS
> > > SELECT
> > > MATERIAL_NR,
> > > ALTERNATIVE_HIERARCHY,
> > > MIN(ROWID) ROW_ID
> > > FROM
> > > MY_TABLE
> > > GROUP BY
> > > MATERIAL_NR,
> > > ALTERNATIVE_HIERARCHY;
> > >
> > > The above SQL statement retrieves the MATERIAL_NR,
> > > ALTERNATIVE_HIERARCHY of all rows, and then eliminates all of those
> > > that were contained in the original query. Now to make use of the
> > > results:
> > > DELETE FROM
> > > MY_TABLE
> > > WHERE
> > > (MATERIAL_NR,
> > > ALTERNATIVE_HIERARCHY,
> > > ROWID)
> > > IN
> > > (SELECT
> > > MATERIAL_NR,
> > > ALTERNATIVE_HIERARCHY,
> > > ROWID ROW_ID
> > > FROM
> > > MY_TABLE
> > > MINUS
> > > SELECT
> > > MATERIAL_NR,
> > > ALTERNATIVE_HIERARCHY,
> > > MIN(ROWID) ROW_ID
> > > FROM
> > > MY_TABLE
> > > GROUP BY
> > > MATERIAL_NR,
> > > ALTERNATIVE_HIERARCHY);
> > >
> > > Obviously, test the results before executing the DELETE.
> > >
> > > Charles Hooper
> > > PC Support Specialist
> > > K&M Machine-Fabricating, Inc.
> >
> > Thank you! I'll try that solution.
> > Can you tell me which one is speedier? "SELECT... MINUS SELECT ..." or
> > "SELECT ... WHERE ... NOT IN (SELECT...)"?
> >
> >
> > THX
> > Alex Sauer
>
>
>
>
Thanks again!
Alex Sauer Received on Mon Oct 16 2006 - 08:20:12 CDT
![]() |
![]() |