Re: sql to delete rows from joining a table to itself

From: Jeff Calico <jeffcalico_at_gmail.com>
Date: Wed, 8 Jul 2009 11:28:17 -0700 (PDT)
Message-ID: <8f2c21ad-a42e-4784-92f4-d4cf8d3b8e49_at_m3g2000pri.googlegroups.com>



On Jun 29, 11:55 pm, Peter Nilsson <ai..._at_acay.com.au> wrote:
> On Jun 29, 9:51 am, Jeff Calico <jeffcal..._at_gmail.com> wrote:
>
> > Hello everyone
>
> > I am trying to delete rows from a massive data table that
> > are "almost" duplicates. I can do a nice query to find
> > them when I join the table with itself, but I can't
> > figure out how to form a delete statement out of this.
>
> > Query:
>
> > SELECT a.*
> > FROM Trafficdata a, Trafficdata b
> > WHERE a.id = b.id
> > AND a.date = b.date
>
> Sic?
>
> > AND a.flag = 'Q'
> > AND b.flag = 'Q'
> > AND a.dstop = b.dstop
> > AND (a.batch != b.batch OR (a.batch = b.batch AND
> > a.pos_marker < b.pos_marker) );
>
> Note that if there are rows with the same (id, date,
> flag = 'Q', dstop) tuple, but different batch numbers,
> then you'll get every row for that tuple.
>
> Is that really what you want? Or do you only want to
> remove all but the top pos_markers for every (id,
> date, flag = 'Q', dstop, batch) tuple? Or something
> else...?
>
> Note also that this will potentially give you multiple
> copies of some rows from a.
>
> > One idea I tried after searching the web was
> > DELETE FROM ( <above query> ); but that gives
> > ORA-01752 "Cannot delete from view without exactly
> > one key preserved table"
>
> Hint: A better select query might be...
>
>   select a.*
>     from trafficdata a
>    where a.flag = 'Q'
>      and exists
>            (select 1
>               from trafficdata b
>              where b.id = a.id
>                and b.date = a.date
>                and b.flag = a.flag
>                and b.dstop = a.dstop
>                and (   b.batch != a.batch
>                     or b.batch  = a.batch
>                          and b.pos_marker > a.pos_marker));
>
> --
> Peter

Thanks for your reply, Peter (and thanks to everyone else who responded!). Sorry for the delay in getting back to this thread, but as you may have experienced yourselves, an issue that is really important to management one day can be replaced by another hot potato issue the next day.

You are correct that the original query was flawed, as I saw by creating a small table and using rowids to pinpoint exactly which rows were being returned. I made the query work by selecting all duplicate rows whose rowid is greater than the min(rowid) for that set of data. It works great on a small table, but on 30+ million rows it churns away for many hours and isn't practical in the production environment.
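For anyone following along, the min(rowid) approach can be sketched with SQLite (which also exposes a rowid), standing in for Oracle here. The table and column names below are a simplified stand-in for the real Trafficdata schema, not its actual definition; the point is just the shape of the correlated DELETE.

```python
# Sketch of the min(rowid) de-duplication described above, using SQLite
# as a stand-in for Oracle. Column set is an assumption for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE trafficdata
               (id INTEGER, date TEXT, flag TEXT, dstop TEXT,
                batch INTEGER, pos_marker INTEGER)""")
rows = [
    (1, "2009-07-08", "Q", "A", 10, 5),  # three near-duplicates: same id/date/flag/dstop
    (1, "2009-07-08", "Q", "A", 11, 7),
    (1, "2009-07-08", "Q", "A", 10, 9),
    (2, "2009-07-08", "Q", "B", 10, 1),  # unique row, should survive
]
cur.executemany("INSERT INTO trafficdata VALUES (?,?,?,?,?,?)", rows)

# Delete every 'Q' row whose rowid is greater than the minimum rowid
# of its (id, date, flag, dstop) group, keeping one row per group.
cur.execute("""
    DELETE FROM trafficdata
     WHERE flag = 'Q'
       AND rowid > (SELECT MIN(b.rowid)
                      FROM trafficdata b
                     WHERE b.id    = trafficdata.id
                       AND b.date  = trafficdata.date
                       AND b.flag  = trafficdata.flag
                       AND b.dstop = trafficdata.dstop)
""")
conn.commit()
print(cur.execute("SELECT COUNT(*) FROM trafficdata").fetchone()[0])  # 2
```

Note the correlated subquery runs once per candidate row, which is consistent with the poor scaling seen on 30+ million rows.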

--Jeff