Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: SQL: Working with huge tables of chronological data

Re: SQL: Working with huge tables of chronological data

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: 5 Apr 2007 19:51:28 -0700
Message-ID: <1175827888.072988.16340@l77g2000hsb.googlegroups.com>


On Apr 5, 7:44 pm, Maxim Demenko <mdeme..._at_gmail.com> wrote:
> Charles Hooper schrieb:
> > On Apr 5, 3:50 pm, "John" <acide.ascorbi..._at_gmail.com> wrote:
> >>> Is this the real problem or a simplification?
> >> It's a simplification but not that much. The real problem involves
> >> user_ids but this part can be skipped here.
>
> >>> Is this something that will be run once or repeatedly?
> >> Only once.
>
> >>> Is it possible for the same time to be in A and B?
> >> No, A and B are completely different data.
>
> >>> Is it possible to have a B before an A beginning the sequence?
> >>> Is it possible for there to be multiple Bs between As?
> >> Yes everything is possible, A events and B events happen
> >> independently.
>
> >> Thanks for being interested in my problem!
>
> >> John
>
> > I would be inclined to not handle this by not just using SQL. You
> > could potentially have a nearly full Cartesian join on the first table
> > to itself, for example:
>
> > Charles Hooper
> > PC Support Specialist
> > K&M Machine-Fabricating, Inc.
>
> I suggest, pure SQL solution could be like this:

(SNIP)
> WITH u as ( SELECT * FROM a UNION ALL SELECT * FROM b ORDER BY 2)
> ,
> t as
> (SELECT u.*,
> row_number() over(order by edate) id, -- to enumerate
> all events
>
> decode(lag(event) over (order by edate), event, 0, 1)
> start_of_group
> FROM u
> ORDER BY edate
> )
> ,
> t1 as ( SELECT id,event,edate,sum(start_of_group) over(order by
> edate) group_id FROM t)
> ,
> t2 as
> (SELECT id,
> event,
> edate,
> group_id,
> count(*) over(partition by group_id) cnt_in_group,
> row_number() over(partition by group_id order by edate)
> row_num
> FROM t1
> )
> ,
> t3 as
> (SELECT connect_by_root(id) first_event,
> connect_by_root(edate) first_date,
> id last_event,
> edate last_date
> FROM t2
> WHERE cnt_in_group >1
> AND event ='A' connect by prior row_num=row_num-1
> AND prior group_id=group_id
> )
> SELECT *
> FROM t3
> WHERE first_event != last_event
> ORDER BY first_event,
> last_event;
>
> However, considering the data volumes ( 1e8 for A Events and 3e7 for B
> Events), this could result in worst case ( all B events are
> chronologically after or before A events) in 1e8!/2!(1e8-2)!
> permutations, what nearly equals 1e16 rows, in best case, if all B
> events are evenly distributed among A events, this will result in
> approximately 1e7 rows, assuming the distribution is somewhere in the
> middle - still would expect a lot of data in result set - i would second
> the Charles suggestion to process programmatically, possibly dividing
> the source data in chunks.
>
> Best regards
>
> Maxim

Nice. I stopped short of providing a solution in my previous post. Here is an example of an all SQL solution, continuing from my previous post:

Let's try an experiment to see if we can generate a sequential counter that skips when either TIME_SLOT or NEXT_TIME_SLOT is not A: SELECT

  TIME_DATE,
  NEXT_TIME_DATE,
  (ROW_NUMBER() OVER (ORDER BY TIME_DATE))*DECODE(TIME_SLOT,'A',
1,NULL)*DECODE(NEXT_TIME_SLOT,'A',1,NULL) TT,   TIME_SLOT,
  NEXT_TIME_SLOT,
  T1_V1,
  T1_V2
FROM
(SELECT
  TIME_DATE TIME_DATE,
  LEAD(TIME_DATE) OVER (ORDER BY TIME_DATE) NEXT_TIME_DATE,   TIME_SLOT,
  LEAD(TIME_SLOT,1) OVER (ORDER BY TIME_DATE) NEXT_TIME_SLOT,   T1_V1,
  T1_V2
FROM
  (SELECT
    NVL(T1.V1,T2.V1) TIME_DATE,
    DECODE(T1.V1,NULL,'B',NVL2(T2.V1,'AB','A')) TIME_SLOT,     TO_CHAR(T1.V1,'HH24:MI') T1_V1,
    T1.V2 T1_V2,
    TO_CHAR(T2.V1,'HH24:MI') T2_V1,
    T2.V2 T2_V2
  FROM
    T1 FULL OUTER JOIN T2 ON T1.V1=T2.V1   ORDER BY
    NVL(T1.V1,T2.V1)));      TIME_DATE NEXT_TIME_DATE TT TIME_SLOT NEXT_TIME_SLOT T1_V1 T1_V2
==================== ==================== =========== =========
============== ===== ==========
05-APR-2007 00:02:00 05-APR-2007 00:05:00	    1 A 	A	       00:02 1
05-APR-2007 00:05:00 05-APR-2007 00:07:00	    2 A 	A	       00:05 2
05-APR-2007 00:07:00 05-APR-2007 00:09:00	      A 	B	       00:07 3
05-APR-2007 00:09:00 05-APR-2007 00:10:00	      B 	A
05-APR-2007 00:10:00 05-APR-2007 00:12:00	    5 A 	A	       00:10 4
05-APR-2007 00:12:00 05-APR-2007 00:15:00	    6 A 	A	       00:12 5
05-APR-2007 00:15:00 05-APR-2007 00:17:00	    7 A 	A	       00:15 6
05-APR-2007 00:17:00 05-APR-2007 00:18:00	      A 	B	       00:17 7
05-APR-2007 00:18:00 05-APR-2007 00:20:00	      B 	A
05-APR-2007 00:20:00 05-APR-2007 00:22:00	   10 A 	A	       00:20 8
05-APR-2007 00:22:00 05-APR-2007 00:25:00	   11 A 	A	       00:22 9
05-APR-2007 00:25:00 05-APR-2007 00:27:00	   12 A 	A	       00:25 10
05-APR-2007 00:27:00 05-APR-2007 00:28:00	      A 	B	       00:27 11
05-APR-2007 00:28:00 05-APR-2007 00:30:00	      B 	A
05-APR-2007 00:30:00 05-APR-2007 00:32:00	   15 A 	A	       00:30 12
05-APR-2007 00:32:00 05-APR-2007 00:35:00	   16 A 	A	       00:32 13
05-APR-2007 00:35:00 05-APR-2007 00:37:00	      A 	AB	       00:35 14
05-APR-2007 00:37:00 05-APR-2007 00:40:00	      AB	A	       00:37 15
05-APR-2007 00:40:00 05-APR-2007 00:42:00	   19 A 	A	       00:40 16
05-APR-2007 00:42:00 05-APR-2007 00:45:00	   20 A 	A	       00:42 17
05-APR-2007 00:45:00 05-APR-2007 00:47:00	      A 	AB	       00:45 18
05-APR-2007 00:47:00 05-APR-2007 00:50:00	      AB	A	       00:47 19
05-APR-2007 00:50:00 05-APR-2007 00:52:00	   23 A 	A	       00:50 20
05-APR-2007 00:52:00 05-APR-2007 00:55:00	   24 A 	A	       00:52 21
05-APR-2007 00:55:00 05-APR-2007 00:56:00	      A 	B	       00:55 22
05-APR-2007 00:56:00 05-APR-2007 00:57:00	      B 	A
05-APR-2007 00:57:00 05-APR-2007 01:00:00	   27 A 	A	       00:57 23
05-APR-2007 01:00:00 05-APR-2007 01:02:00	   28 A 	A	       01:00 24
05-APR-2007 01:02:00 05-APR-2007 01:05:00	      A 	AB	       01:02 25
05-APR-2007 01:05:00 05-APR-2007 01:07:00	      AB	A	       01:05 26
05-APR-2007 01:07:00 05-APR-2007 01:10:00	   31 A 	A	       01:07 27
...

Now, let's use SYS_CONNECT_BY_PATH to connect the start and end times together:
SELECT
  TIME_DATE,
  NEXT_TIME_DATE,
  TO_CHAR(TIME_DATE,'HH24:MI') ||'-' ||
SUBSTR(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(NEXT_TIME_DATE,'HH24:MI'),','),

2,50)||',',
1,INSTR(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(NEXT_TIME_DATE,'HH24:MI'),','),
2,50)||',',',')-1) TIME_RANGE

FROM
(SELECT
  TIME_DATE,
  NEXT_TIME_DATE,
  (ROW_NUMBER() OVER (ORDER BY TIME_DATE))*DECODE(TIME_SLOT,'A',
1,NULL)*DECODE(NEXT_TIME_SLOT,'A',1,NULL) TS,   TIME_SLOT,
  NEXT_TIME_SLOT
  T1_V1,
  T1_V2
FROM
(SELECT
  TIME_DATE TIME_DATE,
  LEAD(TIME_DATE,1) OVER (ORDER BY TIME_DATE) NEXT_TIME_DATE,   TIME_SLOT,
  LEAD(TIME_SLOT,1) OVER (ORDER BY TIME_DATE) NEXT_TIME_SLOT,   T1_V1,
  T1_V2
FROM
  (SELECT
    NVL(T1.V1,T2.V1) TIME_DATE,
    DECODE(T1.V1,NULL,'B',NVL2(T2.V1,'AB','A')) TIME_SLOT,     TO_CHAR(T1.V1,'HH24:MI') T1_V1,
    T1.V2 T1_V2,
    TO_CHAR(T2.V1,'HH24:MI') T2_V1,
    T2.V2 T2_V2
  FROM
    T1 FULL OUTER JOIN T2 ON T1.V1=T2.V1   ORDER BY
    NVL(T1.V1,T2.V1))))
CONNECT BY PRIOR
  TS=TS+1;

     TIME_DATE NEXT_TIME_DATE TIME_RANGE

==================== ====================
=========================================================
05-APR-2007 00:02:00 05-APR-2007 00:05:00 00:02-00:05
05-APR-2007 00:05:00 05-APR-2007 00:07:00 00:05-00:07
05-APR-2007 00:02:00 05-APR-2007 00:05:00 00:02-00:07
05-APR-2007 00:10:00 05-APR-2007 00:12:00 00:10-00:12
05-APR-2007 00:12:00 05-APR-2007 00:15:00 00:12-00:15
05-APR-2007 00:10:00 05-APR-2007 00:12:00 00:10-00:15
05-APR-2007 00:15:00 05-APR-2007 00:17:00 00:15-00:17
05-APR-2007 00:12:00 05-APR-2007 00:15:00 00:12-00:17
05-APR-2007 00:10:00 05-APR-2007 00:12:00 00:10-00:17
05-APR-2007 00:20:00 05-APR-2007 00:22:00 00:20-00:22
05-APR-2007 00:22:00 05-APR-2007 00:25:00 00:22-00:25
05-APR-2007 00:20:00 05-APR-2007 00:22:00 00:20-00:25
05-APR-2007 00:25:00 05-APR-2007 00:27:00 00:25-00:27
05-APR-2007 00:22:00 05-APR-2007 00:25:00 00:22-00:27
05-APR-2007 00:20:00 05-APR-2007 00:22:00 00:20-00:27
05-APR-2007 00:30:00 05-APR-2007 00:32:00 00:30-00:32
05-APR-2007 00:32:00 05-APR-2007 00:35:00 00:32-00:35
05-APR-2007 00:30:00 05-APR-2007 00:32:00 00:30-00:35
05-APR-2007 00:40:00 05-APR-2007 00:42:00 00:40-00:42
05-APR-2007 00:42:00 05-APR-2007 00:45:00 00:42-00:45
05-APR-2007 00:40:00 05-APR-2007 00:42:00 00:40-00:45
05-APR-2007 00:50:00 05-APR-2007 00:52:00 00:50-00:52
05-APR-2007 00:52:00 05-APR-2007 00:55:00 00:52-00:55
05-APR-2007 00:50:00 05-APR-2007 00:52:00 00:50-00:55
05-APR-2007 00:57:00 05-APR-2007 01:00:00 00:57-01:00
05-APR-2007 01:00:00 05-APR-2007 01:02:00 01:00-01:02
05-APR-2007 00:57:00 05-APR-2007 01:00:00 00:57-01:02
05-APR-2007 01:07:00 05-APR-2007 01:10:00 01:07-01:10
05-APR-2007 01:10:00 05-APR-2007 01:12:00 01:10-01:12
05-APR-2007 01:07:00 05-APR-2007 01:10:00 01:07-01:12
05-APR-2007 01:17:00 05-APR-2007 01:20:00 01:17-01:20
05-APR-2007 01:20:00 05-APR-2007 01:22:00 01:20-01:22
05-APR-2007 01:17:00 05-APR-2007 01:20:00 01:17-01:22
05-APR-2007 01:25:00 05-APR-2007 01:27:00 01:25-01:27
05-APR-2007 01:27:00 05-APR-2007 01:30:00 01:27-01:30 05-APR-2007 01:25:00 05-APR-2007 01:27:00 01:25-01:30 ...

    (Rows 1652) CONNECT BY WITHOUT FILTERING (cr=23 pr=0 pw=0 time=31559 us)

    (Rows 1194)    VIEW  (cr=23 pr=0 pw=0 time=29665 us)
    (Rows 1194)     WINDOW NOSORT (cr=23 pr=0 pw=0 time=24882 us)
    (Rows 1194)      VIEW  (cr=23 pr=0 pw=0 time=20154 us)
    (Rows 1194)       WINDOW SORT (cr=23 pr=0 pw=0 time=14131 us)
    (Rows 1194)        VIEW  (cr=23 pr=0 pw=0 time=15595 us)
    (Rows 1194)         SORT ORDER BY (cr=23 pr=0 pw=0 time=12007 us)
    (Rows 1194)          VIEW  (cr=23 pr=0 pw=0 time=28726 us)
    (Rows 1194)           UNION-ALL  (cr=23 pr=0 pw=0 time=23946 us)
    (Rows 1000)            HASH JOIN RIGHT OUTER (cr=14 pr=0 pw=0
time=11251 us)
     (Rows 300)             TABLE ACCESS FULL T2 (cr=7 pr=0 pw=0
time=662 us)
    (Rows 1000)             TABLE ACCESS FULL T1 (cr=7 pr=0 pw=0
time=2026 us)
     (Rows 194)            MERGE JOIN ANTI (cr=9 pr=0 pw=0 time=4647
us)
     (Rows 300)             TABLE ACCESS BY INDEX ROWID T2 (cr=2 pr=0
pw=0 time=2128 us)
     (Rows 300)              INDEX FULL SCAN T2_IND1 (cr=1 pr=0 pw=0
time=918 us)
     (Rows 106)             SORT UNIQUE (cr=7 pr=0 pw=0 time=2528 us)
    (Rows 1000)              INDEX FAST FULL SCAN T1_IND1 (cr=7 pr=0
pw=0 time=2026 us)

Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc. Received on Thu Apr 05 2007 - 21:51:28 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US