Re: Question about deleting table duplicates

From: Mike Nugent <news_at_remove-this.illuminatus.org>
Date: Sat, 20 Sep 2003 11:05:41 GMT
Message-ID: <9wWab.1001399$3C2.22359455_at_news3.calgary.shaw.ca>


Russ Bagley wrote:
> "ScottH" <fakeaddress_at_newsgroupsonly.com> wrote in message news:<pOKdnbFXV7ane_SiU-KYvQ@giganews.com>...
>

>>I was looking for thw SQL to delete dupes from a table, and came across
>>this.  All who saw it agreed in principle, but I can't quite figure out the
>>logic.  If we are deleting all rows whose rowid is greater than the least of
>>the rowids returned from creating the subset of dupes, couldn't we
>>inadvertently delete some non-dupes rows that were created after the last
>>dupe ?  I mean, any row created after the last dupe would have a greater
>>rowid, wouldn't it ?
>>
>>Here's the SQL:
>>
>>delete from table_1 a
>>where a.rowid >
>>   (select min(b.rowid)
>>    from table_1 b
>>    where b.col_dup_values = a.col_dup_values)

>
>
> The condition in the subselect (b.col_dup_values = a.col_dup_values)
> links the two instances (a and b) of table_1 in this SQL. The delete
> statement, therefore, only affects table_1 with the condition:
> (b.col_dup_values = a.col_dup_values).
>
> This won't delete all duplicated rows, as such, but any row that is a
> duplicate of a row that already exists - leaving one row where there
> were several duplicates. If you wanted to do that the SQL is much
> simpler (at least, simpler to follow).
>
> DELETE FROM table_1
> WHERE col_dup_values = (SELECT col_dup_values
> ,COUNT(*)
> FROM table_1
> WHERE COUNT(*)>1
> GROUP BY col_dup_values)

Hmm. This isn't how I saw it.

Pretend data set

rowid col

1     1
2     2
3     1
4     4
5     1
6     6

Ok so the sub query said:
select min(b.rowid)
from table_1 b
where b.col_dup_values = a.col_dup_values

That would return 1, right? The min row where a.col = b.col.

So plug that into the original query:

delete from table_1 a
where a.rowid >

    (select min(b.rowid)

     from table_1 b
     where b.col_dup_values = a.col_dup_values)

You get:

delete from table_1 a
where a.rowid > (1)

Or am I missing something?

-- 
Mike Nugent
Programmer/Author/DBA/Admin
In search of employment, email for credentials
news_at_remove-this.illuminatus.org
Received on Sat Sep 20 2003 - 13:05:41 CEST

Original text of this message