Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> Re: SQL: Working with huge tables of chronological data
On Apr 5, 9:02 am, "John" <acide.ascorbi..._at_gmail.com> wrote:
> Thanks for your answer,
>
> Here are the technical details and the query I've been using so far.
>
> TableA is ~100 millions row and contains (timestamp, evtA)
> TableB is ~30 millions row and contains (timestamp, evtB)
>
> The following query took ~60h (on a private but quite slow server) to
> compute. ~1h is what I'm aiming to.
>
> select TA1_B.evtA, TA2.evtA
> from
> (
> select TA1.evtA, TA1.timestamp timeA1, TB.evtB, min(TB.timestamp)
> min_timeB
> from tableA TA1 left outer join tableB TB on (TA1.timestamp <
> TB.timestamp)
> group by TA1.evtA, TA1.timestamp, TB.evtB
> ) TA1_B,
> tableA TA2
> where
> TA1_B.timeA1 < TA2.timestamp
> and (TA2.timestamp < TA1_B.min_timeB or TA1_B.min_timeB is null)
> and TA1_B.evtA <> TA2.evtA;
>
> Thanks!
>
> John
>
> On Apr 5, 10:51 am, "Ana C. Dent" <anaced..._at_hotmail.com> wrote:
>
>
>
> > "John" <acide.ascorbi..._at_gmail.com> wrote in news:1175783243.167337.195580
> > @n59g2000hsh.googlegroups.com:
>
> > > Hi,
>
> > > I'm trying to figure out an efficient way to search for the non
> > > existence of events in chronological data with SQL. The goal (detailed
> > > below) seems pretty simple but so far it looks like it's tricky to do
> > > with Oracle. Here's my problem:
>
> > > I'm working with 2 simple but huge tables each recording a different
> > > kind of event associated with a timestamp. For instance:
>
> > > Table A
> > > (03:50pm, A1)
> > > (03:55pm, A2)
> > > (03:58pm, A3)
>
> > > Table B
> > > (03:51pm, B1)
> > > (04:00pm, B2)
>
> > > I'm looking for all the chronological sequences (Ax, Ay) where no B
> > > event is present between Ax and Ay. In this example, the result would
>
> > I do not understand the logic that would lead the answer above;
> > since none of the timestamps in Table A match any timestamp in Table B.
>
> > > be (A2, A3).
>
> > > I've been searching actively for an efficient solution for this
> > > problem and I couldn't find any fast enough. Do you have any idea?
>
> > How fast is fast enough?
> > How do we know what you tried & deemed unacceptable?
>
>
I'm too stupid to do this in SQL. I'd select out the timestamp, data and a tag for which table, from the two tables into two files, send both files through unix sort, awk for the pattern, and then back into the db with sqlloader or maybe as an external table. This often is faster because you don't upset Oracle with too-large sorting within the db, and you just blast sequentially through the data rather than trying to lag values.
Please don't top-post.
jg
-- @home.com is bogus. "Oh boy, we need to work on the I/O on this test system, I swear there are gerbils in that server running back and forth with some floppies in their mouths transferring the data between disks. " - Herod TReceived on Thu Apr 05 2007 - 16:39:08 CDT