Home » SQL & PL/SQL » SQL & PL/SQL » A question about eliminate duplicates rows from a table (Oracle 10g r2, Windows XP)
A question about eliminate duplicates rows from a table [message #414815] Thu, 23 July 2009 11:41 Go to next message
cnvegnix
Messages: 21
Registered: June 2008
Junior Member
I have post a revision about "How does one eliminate duplicates rows from a table?" in SQL FAQ. The revision was reverted by Michel Cadot.

Url:
www.orafaq.com/wiki/index.php?title=SQL_FAQ&direction=next&oldid=11698#How_does_one_eliminate_duplicates_rows_from_a_table.3F

Diffiences between revisions:
www.orafaq.com/wiki/index.php?title=SQL_FAQ&diff=11699&oldid=11698

My method is:

delete from my_table t1   
where  ROWID NOT in (select MIN(ROWID) from my_table t2   
GROUP BY key_value1, key_value2); 


IF key_value1 IS NULL OR key_value2 IS NULL, this method still works correctly.

What's wrong about my method?
Re: A question about eliminate duplicates rows from a table [message #414817 is a reply to message #414815] Thu, 23 July 2009 11:58 Go to previous messageGo to next message
BlackSwan
Messages: 25036
Registered: January 2009
Location: SoCal
Senior Member
>What's wrong about my method?
Please post test case that validates your proposed SQL statement

DELETE FROM my_table t1
WHERE       rowid NOT IN (SELECT   Min(rowid)
                          FROM     my_table t2
                          GROUP BY key_value1,
                                   key_value2); 
Re: A question about eliminate duplicates rows from a table [message #414818 is a reply to message #414815] Thu, 23 July 2009 12:06 Go to previous messageGo to next message
cnvegnix
Messages: 21
Registered: June 2008
Junior Member
Test data:

CREATE TABLE my_table(key_value1 number, key_value2 number);
INSERT INTO my_table VALUES(1,2);
INSERT INTO my_table VALUES(1,2);
INSERT INTO my_table VALUES(3,null);
INSERT INTO my_table VALUES(3,null);
INSERT INTO my_table VALUES(null,4);
INSERT INTO my_table VALUES(null,4);
INSERT INTO my_table VALUES(null,null);
INSERT INTO my_table VALUES(null,null);
commit;

select * from my_table;


Execute:
delete from my_table t1
where  ROWID NOT in (select MIN(ROWID) from my_table t2
GROUP BY key_value1, key_value2);

select * from my_table;


Re: A question about eliminate duplicates rows from a table [message #414819 is a reply to message #414815] Thu, 23 July 2009 12:07 Go to previous messageGo to next message
Michel Cadot
Messages: 64103
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Reference: Wiki page SQL FAQ.
Nothing is wrong with your method but it is just a variation of "Method 1" and so is not worth to be added.

Regards
Michel

[Updated on: Thu, 23 July 2009 12:10]

Report message to a moderator

Re: A question about eliminate duplicates rows from a table [message #414828 is a reply to message #414819] Thu, 23 July 2009 13:12 Go to previous messageGo to next message
cnvegnix
Messages: 21
Registered: June 2008
Junior Member
I think that my method is diffierent from SQL FAQ method 1 and better.

Test case:
CREATE TABLE my_table(key_value1 number, key_value2 number);
BEGIN
	-- duplicate values
	INSERT INTO my_table VALUES(1,2);
	INSERT INTO my_table VALUES(1,2);
	INSERT INTO my_table VALUES(3,null);
	INSERT INTO my_table VALUES(3,null);
	INSERT INTO my_table VALUES(null,4);
	INSERT INTO my_table VALUES(null,4);
	INSERT INTO my_table VALUES(null,null);
	INSERT INTO my_table VALUES(null,null);
	-- unique values
	FOR i IN 1 .. 10000 loop
		INSERT INTO my_table VALUES(i,0);
	END loop;
	commit;
end;
/

-- gather table statistics
EXEC dbms_stats.gather_table_stats(user,'MY_TABLE');

-- my method
delete from my_table t1
where  ROWID NOT in (select MIN(ROWID) from my_table t2
GROUP BY key_value1, key_value2);

rollback;

-- SQL FAQ method 1
DELETE FROM my_table A WHERE ROWID > (
SELECT min(rowid) FROM my_table B
WHERE (A.key_value1 = B.key_value1 OR (a.key_value1 IS NULL AND b.key_value1 IS null))
AND (A.key_value2 = B.key_value2 OR (a.key_value2 IS NULL AND b.key_value2 IS null)));

rollback;


Execution result:
Quote:

SQL> CREATE TABLE my_table(key_value1 number, key_value2 number);

Table created.

Elapsed: 00:00:00.07
SQL> BEGIN
2 -- duplicate values
3 INSERT INTO my_table VALUES(1,2);
4 INSERT INTO my_table VALUES(1,2);
5 INSERT INTO my_table VALUES(3,null);
6 INSERT INTO my_table VALUES(3,null);
7 INSERT INTO my_table VALUES(null,4);
8 INSERT INTO my_table VALUES(null,4);
9 INSERT INTO my_table VALUES(null,null);
10 INSERT INTO my_table VALUES(null,null);
11 -- unique values
12 FOR i IN 1 .. 10000 loop
13 INSERT INTO my_table VALUES(i,0);
14 END loop;
15 commit;
16 end;
17 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.57
SQL>
SQL> -- gather table statistics
SQL> EXEC dbms_stats.gather_table_stats(user,'MY_TABLE');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.37
SQL>
SQL> -- my method
SQL> delete from my_table t1
2 where ROWID NOT in (select MIN(ROWID) from my_table t2
3 GROUP BY key_value1, key_value2);

4 rows deleted.

Elapsed: 00:00:00.06
SQL>
SQL> rollback;

Rollback complete.

Elapsed: 00:00:00.04
SQL>
SQL> -- SQL FAQ method 1
SQL> DELETE FROM my_table A WHERE ROWID > (
2 SELECT min(rowid) FROM my_table B
3 WHERE (A.key_value1 = B.key_value1 OR (a.key_value1 IS NULL AND b.key_value1 IS null))
4 AND (A.key_value2 = B.key_value2 OR (a.key_value2 IS NULL AND b.key_value2 IS null)));

4 rows deleted.

Elapsed: 00:00:18.43
SQL>
SQL> rollback;

Rollback complete.

Elapsed: 00:00:00.04
SQL>


My method spent 00.06 second on executing the task, but SQL FAQ method 1 spent 18.43 seconds on the same the task.

So I think that my method is better.
Re: A question about eliminate duplicates rows from a table [message #414835 is a reply to message #414828] Thu, 23 July 2009 14:14 Go to previous messageGo to next message
Michel Cadot
Messages: 64103
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
No one is better, it depends on the case and no test case can prove something other one can prove the opposite in this matter.

Nevertheless I will reinsert your method as it shows a way to use set.

Regards
Michel
Re: A question about eliminate duplicates rows from a table [message #414839 is a reply to message #414835] Thu, 23 July 2009 14:56 Go to previous messageGo to next message
ThomasG
Messages: 3189
Registered: April 2005
Location: Heilbronn, Germany
Senior Member
Well, from an execution standpoint, his query only queries the table once to get all the ROWIDs that need to be kept, while the other method queries the ROWID that need to be kept row by row.

So his method uses less I/O.

It might use more memory, though.

My gut feeling is that the FAQ solution 1 might be better for large tables with few duplicates, but the more duplicates there are the better his approach would be in comparison.

Re: A question about eliminate duplicates rows from a table [message #414840 is a reply to message #414839] Thu, 23 July 2009 15:00 Go to previous messageGo to next message
Michel Cadot
Messages: 64103
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
And it also depends on indexes.
And the fact columns are logical key values that is can't be null or not.
And so on.
So no definitive answer, this is why there are several methods. Wink

Regards
Michel

[Updated on: Thu, 23 July 2009 15:01]

Report message to a moderator

Re: A question about eliminate duplicates rows from a table [message #414885 is a reply to message #414840] Thu, 23 July 2009 23:28 Go to previous messageGo to next message
Frank
Messages: 7880
Registered: March 2000
Senior Member
But if there are cases where his solution is better, then there would be no reason to remove it.
Re: A question about eliminate duplicates rows from a table [message #414958 is a reply to message #414815] Fri, 24 July 2009 02:42 Go to previous messageGo to next message
cnvegnix
Messages: 21
Registered: June 2008
Junior Member
Another test case, SMALL DISTINCT VALUES:
CREATE TABLE my_table(key_value1 number, key_value2 number);
BEGIN
	-- duplicate values
	FOR i IN 1 .. 1251 loop
		INSERT INTO my_table VALUES(1,2);
		INSERT INTO my_table VALUES(1,2);
		INSERT INTO my_table VALUES(3,null);
		INSERT INTO my_table VALUES(3,null);
		INSERT INTO my_table VALUES(null,4);
		INSERT INTO my_table VALUES(null,4);
		INSERT INTO my_table VALUES(null,null);
		INSERT INTO my_table VALUES(null,null);
	END loop;
	commit;
end;
/

-- create index ix_my_table on my_table(key_value1,key_value2);
-- create index ix_my_table on my_table(key_value1,key_value2);

-- gather table statistics
EXEC dbms_stats.gather_table_stats(user,'MY_TABLE');

-- my method
delete from my_table t1
where  ROWID NOT in (select MIN(ROWID) from my_table t2
GROUP BY key_value1, key_value2);

rollback;

-- SQL FAQ method 1
DELETE FROM my_table A WHERE ROWID > (
SELECT min(rowid) FROM my_table B
WHERE (A.key_value1 = B.key_value1 OR (a.key_value1 IS NULL AND b.key_value1 IS null))
AND (A.key_value2 = B.key_value2 OR (a.key_value2 IS NULL AND b.key_value2 IS null)));

rollback;


Execution result:
Quote:
SQL> BEGIN
2 -- duplicate values
3 FOR i IN 1 .. 1251 loop
4 INSERT INTO my_table VALUES(1,2);
5 INSERT INTO my_table VALUES(1,2);
6 INSERT INTO my_table VALUES(3,null);
7 INSERT INTO my_table VALUES(3,null);
8 INSERT INTO my_table VALUES(null,4);
9 INSERT INTO my_table VALUES(null,4);
10 INSERT INTO my_table VALUES(null,null);
11 INSERT INTO my_table VALUES(null,null);
12 END loop;
13 commit;
14 end;
15 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.10
SQL>
SQL> -- create index ix_my_table on my_table(key_value1,key_value2);
SQL> -- create index ix_my_table on my_table(key_value1,key_value2);
SQL>
SQL> -- gather table statistics
SQL> EXEC dbms_stats.gather_table_stats(user,'MY_TABLE');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.73
SQL>
SQL> -- my method
SQL> delete from my_table t1
2 where ROWID NOT in (select MIN(ROWID) from my_table t2
3 GROUP BY key_value1, key_value2);

10004 rows deleted.

Elapsed: 00:00:00.86
SQL>
SQL> rollback;

Rollback complete.

Elapsed: 00:00:00.84
SQL>
SQL> -- SQL FAQ method 1
SQL> DELETE FROM my_table A WHERE ROWID > (
2 SELECT min(rowid) FROM my_table B
3 WHERE (A.key_value1 = B.key_value1 OR (a.key_value1 IS NULL AND b.key_value1 IS null))
4 AND (A.key_value2 = B.key_value2 OR (a.key_value2 IS NULL AND b.key_value2 IS null)));

10004 rows deleted.

Elapsed: 00:00:00.87
SQL>
SQL> rollback;

Rollback complete.

Elapsed: 00:00:00.82
SQL>

Re: A question about eliminate duplicates rows from a table [message #415121 is a reply to message #414815] Sat, 25 July 2009 01:13 Go to previous message
cnvegnix
Messages: 21
Registered: June 2008
Junior Member
SQL uses "IN" not "NOT IN":

DELETE FROM my_table WHERE rowid in
(
	SELECT rid FROM 
	(
		SELECT ROWID rid, 
			row_number() over (PARTITION BY key_value1, key_value2 ORDER BY rowid) rnum
		FROM my_table
	)
	WHERE rnum>1
);
Previous Topic: problem in sending email from UTL_SMTP pkg
Next Topic: update tablea based on data from tableb
Goto Forum:
  


Current Time: Sun Dec 04 00:15:06 CST 2016

Total time taken to generate the page: 0.07803 seconds