Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.misc -> Re: Removing duplicated rows revisited - I'm feeln' mighty lame

Re: Removing duplicated rows revisited - I'm feeln' mighty lame

From: <mark.powell_at_eds.com>
Date: Thu, 18 Mar 1999 21:30:41 GMT
Message-ID: <7crra1$aia$1@nnrp1.dejanews.com>


I was involved in Airlines travel late yesterday so forgive me if I miss the mark: 1) If I remember right an exists clause fires its subquery once for every row returned in the parent query so I suspect that your subquery is being fired once for every row in the call_table . The explain output would show if this is how Oracle is doing the delete. Since you have already extracted one copy of the rows you want to delete I think opening the temp table as a cursor in a pl/sql block would allow you to issue a delete once for each of your 34,000 captured rows which would save Oracle a ton of work. You could possibly get similiar results by turing the exists into a in clause on your temp_table if you do not want to use pl/sql, but I think that is the best bet. 2) You might consider building an index on every column in your temp table select. This should allow Oracle to use an index to perform the group by, but the time to build the index plus the actual build may not save you anything over the current actual build if this is a one-time job, and you do not want to keep a large multi-column index around just for once or twice a year clean up. But, you should still consider it, if there are a lot of other columns in your call_table.

In article <36ee69fc.4304658_at_usenet.acw.vcu.edu>,   bgwillia_at_vcu.edu (Boyce G. Williams, Jr.) wrote:
> >
> I found a table where there is a lot of duplicated telephone calls in
> it. To be more precise: 34,000 duplicate calls from a table containing
> 1.5 million rows. Some were posted the same day as its duplicate,
> others were posted on different days. I wrote a series of queries to
> extract the first posted call of a duplicated set, delete all the
> duplicated calls, then reinsert the first call back in. My concern is
> the actual delete step: it uses a correlated subquery. Previous
> experience with our machine (a RS/6000 running Oracle) showed me it
> has a hard time working with correlated subqueries of any kind. At
> one point I ran a SELECT correlated subquery for six hours without it
> ever completing. When I broke up the query into two, the inner query
> being a VIEW and the outer query being a SELECT based on the VIEW, it
> took two hours and completed. Go figure.
>
> I'm presenting what I'm planning to do in the hopes someone may give
> advice on improving it. I'm mainly concerned about the DELETE step
> since every source I've found on deleting rows from one table based on
> another uses the correlated subquery method. Maybe someone knows a
> better way...
>
> --
> -- create a table where the first call of a duplicated set is
> -- stored based on the earliest posted date
> --
> DROP TABLE TEMP_TABLE
> CREATE TABLE TEMP_TABLE AS
> SELECT DISTINCT
> person_id ,
> date_called ,
> origin_phone ,
> destination_phone ,
> duration_minutes ,
> duration_seconds ,
> MIN(date_posted) AS date_posted
> FROM CALL_TABLE
> GROUP BY person_id ,
> date_called ,
> origin_phone ,
> destination_phone ,
> duration_minutes ,
> duration_seconds
> HAVING Count(*)>1 ;
> --
> -- Delete all duplicated calls from a table
> --
> DELETE FROM CALL_TABLE
> WHERE EXISTS
> ( SELECT *
> FROM CALL_TABLE Tmp
> GROUP BY Tmp.person_id ,
> Tmp.date_called ,
> Tmp.destination_phone ,
> Tmp.origin_phone ,
> Tmp.duration_minutes ,
> Tmp.duration_seconds
> HAVING Count(*)>1
> And Tmp.person_id = person_id
> And Tmp.date_called = date_called
> And Tmp.destination_phone = destination_phone
> And Tmp.origin_phone = origin_phone
> And Tmp.duration_minutes = duration_minutes
> And Tmp.duration_seconds = duration_seconds ) ;
> --
> -- Reinsert the first posted record
> --
> INSERT INTO CALL_TABLE
> ( person_id ,
> date_called ,
> origin_phone ,
> destination_phone ,
> duration_minutes ,
> duration_seconds ,
> date_posted )
> SELECT person_id ,
> date_called ,
> origin_phone ,
> destination_phone ,
> duration_minutes ,
> duration_seconds ,
> date_posted
> FROM TEMP_TABLE ;
> --
> -- end of procedure
> --
>
> Thanks in advance,
> Boyce G. Williams, Jr.
> Boyce G. Williams, Jr.
>
> .--------------------------------------------------------------------.
> | "People should have two virtues: purpose- the courage to envisage |
> | and pursue valued goals uninhibited by the defeat of infantile |
> | fantasies, by guilt and the failing fear punishment; and wisdom- a|
> | detached concern with life itself, in the face of death itself." |
> | Norman F. Dixon|
> '--------------------------------------------------------------------'
>

Mark D. Powell -- The only advice that counts is the advice that  you follow so follow your own advice --

-----------== Posted via Deja News, The Discussion Network ==---------- http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own Received on Thu Mar 18 1999 - 15:30:41 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US