Re: sql to delete rows from joining a table to itself

From: Shakespeare <whatsin_at_xs4all.nl>
Date: Sat, 04 Jul 2009 09:32:06 +0200
Message-ID: <4a4f0652$0$198$e4fe514c_at_news.xs4all.nl>



Mark D Powell schreef:
> On Jun 30, 11:00 am, Shakespeare <what..._at_xs4all.nl> wrote:

>> Mark D Powell schreef:
>>
>>
>>
>>
>>
>>> On Jun 29, 8:03 am, Ed Prochak <edproc..._at_gmail.com> wrote:
>>>> On Jun 28, 7:51 pm, Jeff Calico <jeffcal..._at_gmail.com> wrote:
>>>>> Hello everyone
>>>>> I am trying to delete rows from a massive data table that are "almost"
>>>>> duplicates. I can do a nice query
>>>>> to find them when I join the table with itself, but I can't figure out
>>>>> how to form a delete statement out
>>>>> of this.
>>>>> Query:
>>>>> SELECT a.*
>>>>> FROM Trafficdata a, Trafficdata b
>>>>> WHERE a.id = b.id
>>>>> AND a.date = b.date
>>>>> AND a.flag = 'Q'
>>>>> AND b.flag = 'Q'
>>>>> AND a.dstop = b.dstop
>>>>> AND (a.batch != b.batch OR (a.batch = b.batch AND a.pos_marker <
>>>>> b.pos_marker) );
>>>>> One idea I tried after searching the web was
>>>>> DELETE FROM ( <above query> ); but that gives
>>>>> ORA-01752 "Cannot delete from view without exactly one key preserved
>>>>> table"
>>>>> Any help would be greatly appreciated!
>>>>> Jeff C.
>>>> Do you have a primary key on this Trafficdata table?
>>>> Assuming your results are the rows your really want to delete then a
>>>> delete of the form:
>>>> DELETE FROM Trafficdata WHERE pk IN (SELECT a.pk from ...) ;
>>>> I suggest you do this AFTER doing a backup.
>>>> Ed- Hide quoted text -
>>>> - Show quoted text -
>>> Assuming you want to keep one copy of each row the following will work
>>> delete from table_name
>>> where (key_list, rowid) in
>>> ( select keys, rowid from table_name
>>> minus
>>> select keys, min(rowid) from table_name
>>> group by keys
>>> )
>>> /
>>> HTH -- Mark D Powell --
>> Looks like you are skipping the condition that the record with the
>> largest pos_marker should be preserved, not the lowest rowid.
>> Furthermore, if a.batch != b.batch, both records would be deleted in the
>> original query (if it worked).
>>
>> And is there no mutating table problem here? Won't this lead to
>> 'snapshot too old' problems with large tables? (I'm not sure about that)
>>
>> Shakespeare- Hide quoted text -
>>
>> - Show quoted text -
> 
> The delete is a generic template.  It would be very easy to modify it
> to keep the max whatever instead of the min rowid.
> 
> Mutating tables happen only when a trigger is involved and you attempt
> to modify the base table from the trigger.  

Right, missed that part.
But how about the snapshot too old issue? I think to remember that a 'delete from a where exists (select .. from a)' with large tables can cause that error.. or was that only with use of cursors?

>There is no mention of a
> trigger; however there is mention of a view error.  The delete should
> probably be aimed at one or more of the base tables in the view.

Yes. Because there is not exactly one key preserved table, it can not determined from which table the row should be deleted (eventhough they are actually the same one)

> Remove the extra rows from the base table then the view will display
> the right results since the data has been fixed.  Either that or the
> duplcate data should exist and it is the view definition which needs
> fixing to filter out what is being called duplicate.  I have seen both
> issues over my career.
> 
> HTH -- Mark D Powell --
> 
> 
> 

Shakespeare Received on Sat Jul 04 2009 - 02:32:06 CDT

Original text of this message