Re: SQL: Working with huge tables of chronological data

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: 5 Apr 2007 14:07:36 -0700
Message-ID: <1175807256.825415.286830@w1g2000hsg.googlegroups.com>

On Apr 5, 3:50 pm, "John" <acide.ascorbi..._at_gmail.com> wrote:
> > Is this the real problem or a simplification?
>
> It's a simplification but not that much. The real problem involves
> user_ids but this part can be skipped here.
>
> > Is this something that will be run once or repeatedly?
>
> Only once.
>
> > Is it possible for the same time to be in A and B?
>
> No, A and B are completely different data.
>
> > Is it possible to have a B before an A beginning the sequence?
> > Is it possible for there to be multiple Bs between As?
>
> Yes everything is possible, A events and B events happen
> independently.
>
> Thanks for being interested in my problem!
>
> John

I would be inclined to not handle this by not just using SQL. You could potentially have a nearly full Cartesian join on the first table to itself, for example:
SELECT
  A1.V1,
  A2.V1
FROM
  T1 A1,
  T1 A2
WHERE
  A1.V1<A2.V1;

A quick setup:
CREATE TABLE T1 (V1 DATE NOT NULL, V2 VARCHAR2(10)); CREATE TABLE T2 (V1 DATE NOT NULL, V2 VARCHAR2(10)); CREATE INDEX T1_IND1 ON T1(V1);
CREATE INDEX T2_IND1 ON T2(V1); INSERT INTO
  T1
SELECT
  TRUNC(TRUNC(SYSDATE) + (ROWNUM*2.5/24/60),'MI'),   TO_CHAR(ROWNUM)
FROM
  DUAL
CONNECT BY
  LEVEL<=1000;

COMMIT; INSERT INTO
  T2
SELECT
  TRUNC(TRUNC(SYSDATE) + (ROWNUM*9.415/24/60),'MI'),   TO_CHAR(ROWNUM)
FROM
  DUAL
CONNECT BY
  LEVEL<=300;

COMMIT; We now have two tables, T1 and T2, that correspond to your table A and B, respectively. If we perform a full outer join between these two table, we obtain all time values in the two tables with no duplicates (9i+ syntax) (note that TO_CHAR is used to limit the width of the columns for display purposes):
SELECT
  TO_CHAR(T1.V1,'HH24:MI') T1_V1,
  T1.V2 T1_V2,
  TO_CHAR(T2.V1,'HH24:MI') T2_V1,
  T2.V2 T2_V2
FROM
  T1 FULL OUTER JOIN T2 ON T1.V1=T2.V1
ORDER BY
  NVL(T1.V1,T2.V1); T1_V1 T1_V2 T2_V1 T2_V2
===== ========== ===== ==========

00:15 6
00:17 7

00:18 2

00:20 8
00:22 9
00:25 10
00:27 11
		 00:28 3
00:30 12
00:32 13
00:35 14
00:37 15	 00:37 4
00:40 16
00:42 17
00:45 18
00:47 19	 00:47 5
00:50 20

00:52 21
00:55 22

00:56 6

00:57 23
01:00 24
01:02 25
01:05 26	 01:05 7
01:07 27
01:10 28
01:12 29
01:15 30	 01:15 8
01:17 31
01:20 32
01:22 33
		 01:24 9
01:25 34
01:27 35

01:30 36
01:32 37

01:34 10

Extending the above to give more detail: SELECT
  TO_CHAR(NVL(T1.V1,T2.V1),'HH24:MI') TIME_DATE,   DECODE(T1.V1,NULL,'B',NVL2(T2.V1,'AB','A')) TIME_SLOT,   TO_CHAR(T1.V1,'HH24:MI') T1_V1,
  T1.V2 T1_V2,
  TO_CHAR(T2.V1,'HH24:MI') T2_V1,
  T2.V2 T2_V2
FROM
  T1 FULL OUTER JOIN T2 ON T1.V1=T2.V1
ORDER BY
  NVL(T1.V1,T2.V1);
TIME_DATE TIME_SLOT T1_V1 T1_V2 T2_V1 T2_V2 ========= ========= ===== ========== ===== ==========

00:02	  A	    00:02 1
00:05	  A	    00:05 2
00:07	  A	    00:07 3
00:09	  B			     00:09 1
00:10	  A	    00:10 4
00:12	  A	    00:12 5
00:15	  A	    00:15 6
00:17	  A	    00:17 7
00:18	  B			     00:18 2
00:20	  A	    00:20 8
00:22	  A	    00:22 9
00:25	  A	    00:25 10
00:27	  A	    00:27 11
00:28	  B			     00:28 3
00:30	  A	    00:30 12
00:32	  A	    00:32 13
00:35	  A	    00:35 14
00:37	  AB	    00:37 15	     00:37 4
00:40	  A	    00:40 16
00:42	  A	    00:42 17
00:45	  A	    00:45 18
00:47	  AB	    00:47 19	     00:47 5
00:50	  A	    00:50 20
00:52	  A	    00:52 21
00:55	  A	    00:55 22
00:56	  B			     00:56 6

What can we do with the above to avoid the Cartesian join as much as possible? We can use LEAD to peek at the next set of values: SELECT

  TO_CHAR(TIME_DATE,'HH24:MI') TIME_DATE,
  TO_CHAR(LEAD(TIME_DATE,1) OVER (ORDER BY TIME_DATE),'HH24:MI')
NEXT_TIME_DATE,

  TIME_SLOT,
  LEAD(TIME_SLOT,1) OVER (ORDER BY TIME_DATE) NEXT_TIME_SLOT,   T1_V1,
  T1_V2
FROM
  (SELECT
    NVL(T1.V1,T2.V1) TIME_DATE,
    DECODE(T1.V1,NULL,'B',NVL2(T2.V1,'AB','A')) TIME_SLOT,     TO_CHAR(T1.V1,'HH24:MI') T1_V1,
    T1.V2 T1_V2,
    TO_CHAR(T2.V1,'HH24:MI') T2_V1,
    T2.V2 T2_V2
  FROM
    T1 FULL OUTER JOIN T2 ON T1.V1=T2.V1   ORDER BY
    NVL(T1.V1,T2.V1)); TIME_DATE NEXT_TIME_DATE TIME_SLOT NEXT_TIME_SLOT T1_V1 T1_V2 ========= ============== ========= ============== ===== ==========

00:02	  00:05 	 A	   A		  00:02 1
00:05	  00:07 	 A	   A		  00:05 2
00:07	  00:09 	 A	   B		  00:07 3
00:09	  00:10 	 B	   A
00:10	  00:12 	 A	   A		  00:10 4
00:12	  00:15 	 A	   A		  00:12 5
00:15	  00:17 	 A	   A		  00:15 6
00:17	  00:18 	 A	   B		  00:17 7
00:18	  00:20 	 B	   A
00:20	  00:22 	 A	   A		  00:20 8
00:22	  00:25 	 A	   A		  00:22 9
00:25	  00:27 	 A	   A		  00:25 10
00:27	  00:28 	 A	   B		  00:27 11
00:28	  00:30 	 B	   A
00:30	  00:32 	 A	   A		  00:30 12
00:32	  00:35 	 A	   A		  00:32 13
00:35	  00:37 	 A	   AB		  00:35 14
00:37	  00:40 	 AB	   A		  00:37 15
00:40	  00:42 	 A	   A		  00:40 16
00:42	  00:45 	 A	   A		  00:42 17
00:45	  00:47 	 A	   AB		  00:45 18
00:47	  00:50 	 AB	   A		  00:47 19
00:50	  00:52 	 A	   A		  00:50 20
00:52	  00:55 	 A	   A		  00:52 21
00:55	  00:56 	 A	   B		  00:55 22
00:56	  00:57 	 B	   A

Now, if you can scan through the rows returned programmatically, creating a processing break when TIME_SLOT or NEXT_TIME_SLOT is not A, you should be able to handle the processing. In this case remember 00:02, since TIME_SLOT is A and NEXT_TIME_SLOT is A, and report 00:02 - 00:05. Process the next line, and remember 00:05 also, since TIME_SLOT is A and NEXT_TIME_SLOT is A, and output 00:02 - 00:07 and 00:05 - 00:07. Process the next line, either TIME_SLOT is not A or NEXT_TIME_SLOT is not A, so clear the remembered list and process the next line. It is quite simple to handle programmatically.

Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc. Received on Thu Apr 05 2007 - 16:07:36 CDT