Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> Re: SQL: Working with huge tables of chronological data
Charles Hooper schrieb:
> On Apr 5, 3:50 pm, "John" <acide.ascorbi..._at_gmail.com> wrote:
>>> Is this the real problem or a simplification? >> It's a simplification but not that much. The real problem involves >> user_ids but this part can be skipped here. >> >>> Is this something that will be run once or repeatedly? >> Only once. >> >>> Is it possible for the same time to be in A and B? >> No, A and B are completely different data. >> >>> Is it possible to have a B before an A beginning the sequence? >>> Is it possible for there to be multiple Bs between As? >> Yes everything is possible, A events and B events happen >> independently. >> >> Thanks for being interested in my problem! >> >> John
I suggest, pure SQL solution could be like this:
create table a as
with t as (
select 'A' event,dbms_random.value(1,31) + trunc(sysdate,'MM') edate
from dual
connect by level <= 60 )
select * from t;
create table b as
with t as (select 'B' event,dbms_random.value(1,31) +
trunc(sysdate,'MM') edate
from dual
connect by level <= 15)
select * from t;
WITH u as ( SELECT * FROM a UNION ALL SELECT * FROM b ORDER BY 2)
, t as (SELECT u.*, row_number() over(order by edate) id, -- to enumerateall events
decode(lag(event) over (order by edate), event, 0, 1) start_of_group FROM u ORDER BY edate ) , t1 as ( SELECT id,event,edate,sum(start_of_group) over(order by edate) group_id FROM t) , t2 as (SELECT id, event, edate, group_id, count(*) over(partition by group_id) cnt_in_group, row_number() over(partition by group_id order by edate) row_num FROM t1 ) , t3 as (SELECT connect_by_root(id) first_event, connect_by_root(edate) first_date, id last_event, edate last_date FROM t2 WHERE cnt_in_group >1 AND event ='A' connect by prior row_num=row_num-1 AND prior group_id=group_id )
last_event;
However, considering the data volumes ( 1e8 for A Events and 3e7 for B Events), this could result in worst case ( all B events are chronologically after or before A events) in 1e8!/2!(1e8-2)! permutations, what nearly equals 1e16 rows, in best case, if all B events are evenly distributed among A events, this will result in approximately 1e7 rows, assuming the distribution is somewhere in the middle - still would expect a lot of data in result set - i would second the Charles suggestion to process programmatically, possibly dividing the source data in chunks.
Best regards
Maxim Received on Thu Apr 05 2007 - 18:44:50 CDT