Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.misc -> Re: Removing duplicated rows revisited - I'm feeln' mighty lame

Re: Removing duplicated rows revisited - I'm feeln' mighty lame

From: James <james_lorenzen_at_allianzlife.com>
Date: Thu, 18 Mar 1999 20:32:39 GMT
Message-ID: <7crnt7$7bs$1@nnrp1.dejanews.com>


In article <36ee69fc.4304658_at_usenet.acw.vcu.edu>,   bgwillia_at_vcu.edu (Boyce G. Williams, Jr.) wrote:
> Hi,
>
> I found a table where there is a lot of duplicated telephone calls in
> it. To be more precise: 34,000 duplicate calls from a table containing
> 1.5 million rows. Some were posted the same day as its duplicate,
>

 [snip]
> '--------------------------------------------------------------------'
>

Boyce, if you can use PL*SQL the duplicate deletion can be accomplished without creating a separate table.

The following is a piece of code that I've used to remove large duplicate sets. I use a commit every 10,000 rows deleted. If that is not a concern, you could pull the commit logic.

Please note that the last line of the where clause is a not equal, this will leave the desired row on the table. You also need to "set server output on size 1000000" to have the "DBMS_OUTPUT.PUT_LINE" functioning.

DECLARE CURSOR dup_csr IS SELECT person_id , date_called , origin_phone ,  destination_phone , duration_minutes , duration_seconds , MIN(date_posted) AS date_posted FROM call_table GROUP BY person_id,

date_called, origin_phone, destination_phone ,	duration_minutes,
duration_seconds  HAVING Count(*)>1  ;	delete_cnt  NUMBER := 0 ; 
delete_commit NUMBER := 0 ; BEGIN DBMS_OUTPUT.PUT_LINE('Start at ' || TO_CHAR(SYSDATE,'hh24:mi:ss')) ; FOR dup_rec IN dup_csr LOOP DELETE FROM call_table WHERE person_id = dup_rec.person_id AND date_called = dup_rec.date_called AND origin_phone = dup_rec.origin_phone AND destination_phone = dup_rec.destination_phone AND duration_minutes = dup_rec.duration_minutes AND duration_seconds = dup_rec.duration_seconds AND date_posted <> dup_rec.date_posted; delete_cnt := delete_cnt +
SQL%ROWCOUNT ;	-- this is the rows deleted  delete_commit := delete_commit +
SQL%ROWCOUNT ;	IF delete_commit >= 10000  THEN  COMMIT ; 
DBMS_OUTPUT.PUT_LINE('Commit taken at ' || TO_CHAR(SYSDATE,'hh24:mi:ss') || ' rows = ' || delete_commit || ' so far ' || delete_cnt || ' rows deleted') ; delete_commit := 0 ; END IF ; END LOOP ; DBMS_OUTPUT.PUT_LINE('End at' || TO_CHAR(SYSDATE,'hh24:mi:ss') || ' ' || delete_cnt || ' rows deleted') ; END ; -----------== Posted via Deja News, The Discussion Network ==---------- http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own Received on Thu Mar 18 1999 - 14:32:39 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US