Re: Question about deleting table duplicates

From: Russ Bagley <bigfatface_at_hotmail.com>
Date: 19 Sep 2003 06:17:40 -0700
Message-ID: <b863dacd.0309190517.500024df_at_posting.google.com>


"ScottH" <fakeaddress_at_newsgroupsonly.com> wrote in message news:<pOKdnbFXV7ane_SiU-KYvQ_at_giganews.com>...
> I was looking for thw SQL to delete dupes from a table, and came across
> this. All who saw it agreed in principle, but I can't quite figure out the
> logic. If we are deleting all rows whose rowid is greater than the least of
> the rowids returned from creating the subset of dupes, couldn't we
> inadvertently delete some non-dupes rows that were created after the last
> dupe ? I mean, any row created after the last dupe would have a greater
> rowid, wouldn't it ?
>
> Here's the SQL:
>
> delete from table_1 a
> where a.rowid >
> (select min(b.rowid)
> from table_1 b
> where b.col_dup_values = a.col_dup_values)

The condition in the subselect (b.col_dup_values = a.col_dup_values) links the two instances (a and b) of table_1 in this SQL. The delete statement, therefore, only affects table_1 with the condition: (b.col_dup_values = a.col_dup_values).

This won't delete all duplicated rows, as such, but any row that is a duplicate of a row that already exists - leaving one row where there were several duplicates. If you wanted to do that the SQL is much simpler (at least, simpler to follow).

DELETE FROM table_1
WHERE col_dup_values = (SELECT col_dup_values

                               ,COUNT(*) 
                         FROM   table_1 
                         WHERE  COUNT(*)>1 
                         GROUP BY col_dup_values)
Received on Fri Sep 19 2003 - 15:17:40 CEST

Original text of this message