| Oracle FAQ | Your Portal to the Oracle Knowledge Grid | |
Home -> Community -> Usenet -> c.d.o.server -> Re: SQL: Working with huge tables of chronological data
Charles Hooper schrieb:
> On Apr 5, 3:50 pm, "John" <acide.ascorbi..._at_gmail.com> wrote:
>>> Is this the real problem or a simplification? >> It's a simplification but not that much. The real problem involves >> user_ids but this part can be skipped here. >> >>> Is this something that will be run once or repeatedly? >> Only once. >> >>> Is it possible for the same time to be in A and B? >> No, A and B are completely different data. >> >>> Is it possible to have a B before an A beginning the sequence? >>> Is it possible for there to be multiple Bs between As? >> Yes everything is possible, A events and B events happen >> independently. >> >> Thanks for being interested in my problem! >> >> John
I suggest, pure SQL solution could be like this:
create table a as
with t as (
select 'A' event,dbms_random.value(1,31) + trunc(sysdate,'MM') edate
from dual
connect by level <= 60 )
select * from t;
create table b as
with t as (select 'B' event,dbms_random.value(1,31) +
trunc(sysdate,'MM') edate
from dual
connect by level <= 15)
select * from t;
WITH u as ( SELECT * FROM a UNION ALL SELECT * FROM b ORDER BY 2)
,
t as
(SELECT u.*,
row_number() over(order by edate) id, -- to enumerate
all events
decode(lag(event) over (order by edate), event, 0, 1)
start_of_group
FROM u
ORDER BY edate
)
,
t1 as ( SELECT id,event,edate,sum(start_of_group) over(order by
edate) group_id FROM t)
,
t2 as
(SELECT id,
event,
edate,
group_id,
count(*) over(partition by group_id) cnt_in_group,
row_number() over(partition by group_id order by edate)
row_num
FROM t1
)
,
t3 as
(SELECT connect_by_root(id) first_event,
connect_by_root(edate) first_date,
id last_event,
edate last_date
FROM t2
WHERE cnt_in_group >1
AND event ='A' connect by prior row_num=row_num-1
AND prior group_id=group_id
)
last_event;
However, considering the data volumes ( 1e8 for A Events and 3e7 for B Events), this could result in worst case ( all B events are chronologically after or before A events) in 1e8!/2!(1e8-2)! permutations, what nearly equals 1e16 rows, in best case, if all B events are evenly distributed among A events, this will result in approximately 1e7 rows, assuming the distribution is somewhere in the middle - still would expect a lot of data in result set - i would second the Charles suggestion to process programmatically, possibly dividing the source data in chunks.
Best regards
Maxim Received on Thu Apr 05 2007 - 18:44:50 CDT
![]() |
![]() |