Re: Working with huge tables of chronological data

From: charely <nospam_at_skynet.be>
Date: Mon, 9 Apr 2007 13:34:36 +0200
Message-ID: <461a24bd$0$13860$ba620e4c@news.skynet.be>

"Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> schreef in bericht news:FN2dnelHbc44iYfbnZ2dnUVZ8tmhnZ2d_at_bt.com...
>
> "John" <acide.ascorbique_at_gmail.com> wrote in message
> news:1175871313.425881.201500_at_n76g2000hsh.googlegroups.com...
>>
>> Chronology
>> (03:50pm, A1)
>> (03:51pm, B1)
>> (03:55pm, A2)
>> (03:58pm, A3)
>> (03:59pm, A4)
>> (04:00pm, B2)
>>
>> I'm looking for all the sequences of events A in the chronology with
>> no B event in the middle. The results would be:
>> (A2, A3) ; (A3, A4) and also (A2, A4) even if A2 and A4 are not
>> consecutive.
>>
>
> I may have missed know if anyone has come
>
> From the above, it looks as if you need the
> following query (in principle). The hints assume
> an index on ta(timed) and tb(timed)
>
> select
> /*+ ordered use_nl(ta2) index(ta1) index(ta2) */
> ta1.timed, ta1.event,
> ta2.timed, ta2.event
> from
> ta ta1,
> ta ta2
> where
> ta2.timed > ta1.timed
> and ta2.timed < (
> select /*+ index(tb) no_unnest */
> min(tb.timed)
> from tb
> where tb.timed > ta1.timed
> )
> order by
> ta1.timed
> ;
>
> 1) You need to join TA to itself because a single row in
> TA can produce a result set of unknown size.
>
> 2) The condition for generating a row is that another row
> in TA should have a larger time stamp, which does not
> exceed the next occurrence from TB.
>
> 3) I haven't yet considered problems relating to non-unique
> values for timestamp, and timestamps in TA matching
> timestamps in TB
>
> If you can create indexes on the (timed) column, then
> the plan for the query above gets as close to the
> simple COBOL merge (described by other posters)
> in action and performance as a non-procedural statement
> could.
>
> If you can't create the necessary indexes, then the
> code probably has to sort TA twice, and TB once
> which means that for large data sets it probably
> won't be able to match the non-procedural option.
>
> Execution plan (9i and 10g)
> -------------------------------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes |
> Cost |
> -------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | 10090 | 216K|
> 204K|
> | 1 | TABLE ACCESS BY INDEX ROWID | TA | 5 | 55 |
> 101 |
> | 2 | NESTED LOOPS | | 10090 | 216K|
> 204K|
> | 3 | TABLE ACCESS BY INDEX ROWID | TA | 2009 | 22099 |
> 2007 |
> | 4 | INDEX FULL SCAN | TA_U1 | 2009 | | 7
> |
> |* 5 | INDEX RANGE SCAN | TA_U1 | 100 | | 1
> |
> | 6 | SORT AGGREGATE | | 1 | 7 | |
> | 7 | FIRST ROW | | 1 | 7 | 1
> |
> |* 8 | INDEX RANGE SCAN (MIN/MAX)| TB_U1 | 3 | | 1
> |
> -------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 5 - access("TA2"."TIMED">"SYS_ALIAS_1"."TIMED" AND "TA2"."TIMED"<
> (SELECT
> /*+ NO_UNNEST INDEX("TB") */ MIN("TB"."TIMED") FROM "TB" "TB"
> WHERE
> "TB"."TIMED">:B1))
> 8 - access("TB"."TIMED">:B1)
>
> The plan walks the index on TA in order fetching rows.
>
> For each row it gets index entries from TA where the timestamp
> exceeds the current timestamp - but because the subquery to
> TB is used as part of the access predicate you know that the
> range scan on the index is the minimum range scan. Because the
> subquery on the index on TB uses range scan (min/max) this
> too will be minimal.
>
> The costs appear as they do simply because of the amount
> of data I happened to create in the table over and above the
> sample set shown.
>
> --
> Regards
>
> Jonathan Lewis
> http://jonathanlewis.wordpress.com
>
> Author: Cost Based Oracle: Fundamentals
> http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html
>
> The Co-operative Oracle Users' FAQ
> http://www.jlcomp.demon.co.uk/faq/ind_faq.html
>
>

This is basically the same query and execution as I suggested in an other part of this thread.
One issue however , you will be missing all rows combinations in ta that are later than any
event in tb , that is where "select min(tb.timed) from tb where tb.timed > ta1.timed"
returns null. This can be remedied by using nvl and some arbitrary large date:
nvl((select min(tb.timed) from tb where tb.timed > ta1.timed") , sysdate + 1)
- assuming there is are no future events in ta. Received on Mon Apr 09 2007 - 06:34:36 CDT