Re: Question about deleting table duplicates
From: Mike Nugent <news_at_remove-this.illuminatus.org>
Date: Sat, 20 Sep 2003 11:05:41 GMT
Message-ID: <9wWab.1001399$3C2.22359455_at_news3.calgary.shaw.ca>
>
>
> The condition in the subselect (b.col_dup_values = a.col_dup_values)
> links the two instances (a and b) of table_1 in this SQL. The delete
> statement, therefore, only affects table_1 with the condition:
> (b.col_dup_values = a.col_dup_values).
>
> This won't delete all duplicated rows, as such, but any row that is a
> duplicate of a row that already exists - leaving one row where there
> were several duplicates. If you wanted to do that the SQL is much
> simpler (at least, simpler to follow).
>
> DELETE FROM table_1
> WHERE col_dup_values = (SELECT col_dup_values
> ,COUNT(*)
> FROM table_1
> WHERE COUNT(*)>1
> GROUP BY col_dup_values)
Date: Sat, 20 Sep 2003 11:05:41 GMT
Message-ID: <9wWab.1001399$3C2.22359455_at_news3.calgary.shaw.ca>
Russ Bagley wrote:
> "ScottH" <fakeaddress_at_newsgroupsonly.com> wrote in message news:<pOKdnbFXV7ane_SiU-KYvQ@giganews.com>...
>
>>I was looking for thw SQL to delete dupes from a table, and came across >>this. All who saw it agreed in principle, but I can't quite figure out the >>logic. If we are deleting all rows whose rowid is greater than the least of >>the rowids returned from creating the subset of dupes, couldn't we >>inadvertently delete some non-dupes rows that were created after the last >>dupe ? I mean, any row created after the last dupe would have a greater >>rowid, wouldn't it ? >> >>Here's the SQL: >> >>delete from table_1 a >>where a.rowid > >> (select min(b.rowid) >> from table_1 b >> where b.col_dup_values = a.col_dup_values)
>
>
> The condition in the subselect (b.col_dup_values = a.col_dup_values)
> links the two instances (a and b) of table_1 in this SQL. The delete
> statement, therefore, only affects table_1 with the condition:
> (b.col_dup_values = a.col_dup_values).
>
> This won't delete all duplicated rows, as such, but any row that is a
> duplicate of a row that already exists - leaving one row where there
> were several duplicates. If you wanted to do that the SQL is much
> simpler (at least, simpler to follow).
>
> DELETE FROM table_1
> WHERE col_dup_values = (SELECT col_dup_values
> ,COUNT(*)
> FROM table_1
> WHERE COUNT(*)>1
> GROUP BY col_dup_values)
Hmm. This isn't how I saw it.
Pretend data set
rowid col
1 1 2 2 3 1 4 4 5 1 6 6
Ok so the sub query said:
select min(b.rowid)
from table_1 b
where b.col_dup_values = a.col_dup_values
That would return 1, right? The min row where a.col = b.col.
So plug that into the original query:
delete from table_1 a
where a.rowid >
(select min(b.rowid)
from table_1 b where b.col_dup_values = a.col_dup_values)
You get:
delete from table_1 a
where a.rowid > (1)
Or am I missing something?
-- Mike Nugent Programmer/Author/DBA/Admin In search of employment, email for credentials news_at_remove-this.illuminatus.orgReceived on Sat Sep 20 2003 - 13:05:41 CEST