SQL Query Help - Analytic Query [message #219740] Thu, 15 February 2007 15:52
rasa
Messages: 45
Registered: February 2006
Member
Consider 3 rows of data as follows:
BEGIN_DATE	END_DATE	NUM_VAL
1/2/2006	1/31/2006	1.6081
1/2/2006	1/31/2006	0.2047
1/10/2006	1/31/2006	0.1321


I need to find the average of NUM_VAL for "each date" within the begin and end dates. So, for 1/2/2006 the average will be (1.6081 + 0.2047)/2 = 0.9064. It stays that way through 1/9/2006. But on 1/10/2006, the average changes to (1.6081 + 0.2047 + 0.1321)/3 = 0.6483.

Now, I was able to solve this in PL/SQL: I generated a NUM_VAL row for every date within each BEGIN/END date range, gathered those rows in collections, bulk-inserted them into a global temporary table, and finally ran a GROUP BY against that table to get the averages calculated correctly as described above.
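
For what it's worth, here is a minimal sketch of that PL/SQL approach; the global temporary table X_TEST_DAILY, the collection types, and the FORALL structure are illustrative assumptions, not my actual code:

-- assumes a global temporary table such as:
-- CREATE GLOBAL TEMPORARY TABLE x_test_daily (dt DATE, num_val NUMBER) ON COMMIT PRESERVE ROWS;
DECLARE
    TYPE t_date_tab IS TABLE OF DATE;
    TYPE t_num_tab  IS TABLE OF NUMBER;
    l_dates t_date_tab := t_date_tab();
    l_vals  t_num_tab  := t_num_tab();
BEGIN
    -- expand each BEGIN/END range into one row per day
    FOR r IN (SELECT begin_date, end_date, num_val FROM x_test) LOOP
        FOR i IN 0 .. (r.end_date - r.begin_date) LOOP
            l_dates.EXTEND;
            l_vals.EXTEND;
            l_dates(l_dates.LAST) := r.begin_date + i;
            l_vals(l_vals.LAST)   := r.num_val;
        END LOOP;
    END LOOP;

    -- bulk insert the expanded rows into the global temporary table
    FORALL i IN 1 .. l_dates.COUNT
        INSERT INTO x_test_daily (dt, num_val) VALUES (l_dates(i), l_vals(i));
END;
/
-- the per-day averages then come from a plain GROUP BY
SELECT dt, AVG(num_val) FROM x_test_daily GROUP BY dt ORDER BY dt;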

However, I am not satisfied, because I think this can be solved in plain SQL with analytic queries, provided I can generate every date within the date range. I picked up the cue from the date-generator techniques, but I am failing to get the results I need.

Any ideas will be appreciated. Here is the test table.

CREATE TABLE X_TEST(
                     BEGIN_DATE           DATE,
                     END_DATE             DATE,
                     NUM_VAL NUMBER
                   );
INSERT INTO X_TEST(BEGIN_DATE, END_DATE, NUM_VAL) VALUES (DATE '2006-01-02', DATE '2006-01-31', 1.6081);
INSERT INTO X_TEST(BEGIN_DATE, END_DATE, NUM_VAL) VALUES (DATE '2006-01-02', DATE '2006-01-31', 0.2047);
INSERT INTO X_TEST(BEGIN_DATE, END_DATE, NUM_VAL) VALUES (DATE '2006-01-10', DATE '2006-01-31', 0.1321);
COMMIT;



Re: SQL Query Help - Analytic Query [message #219753 is a reply to message #219740] Thu, 15 February 2007 17:36
Art Metzer
Messages: 2478
Registered: December 2002
Senior Member
As you've identified, the biggest hurdle here seems to be generating rows for the days you don't have. Pipelined functions are good for that.

Does the following code work for you?
SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'DD-MON-YYYY'
  2  /
 
Session altered.
 
SQL> CREATE TABLE x_test (
  2      begin_date           DATE
  3  ,   end_date             DATE
  4  ,   num_val              NUMBER
  5  );
 
Table created.
 
SQL> INSERT INTO x_test(begin_date, end_date, num_val) VALUES (DATE '2006-01-02', DATE '2006-01-31', 1.6081);
SQL> INSERT INTO x_test(begin_date, end_date, num_val) VALUES (DATE '2006-01-02', DATE '2006-01-31', 0.2047);
SQL> INSERT INTO x_test(begin_date, end_date, num_val) VALUES (DATE '2006-01-10', DATE '2006-01-31', 0.1321);
SQL> CREATE OR REPLACE TYPE tt_date AS TABLE OF DATE
  2  /
 
Type created.
 
SQL> CREATE OR REPLACE FUNCTION gen_dates (
  2      p_start_date IN  DATE
  3  ,   p_end_date   IN  DATE
  4  )
  5  RETURN tt_date
  6  PIPELINED
  7  IS
  8  BEGIN
  9      FOR i IN 0 .. (p_end_date - p_start_date) LOOP
 10          pipe row(TRUNC(p_start_date) + i);
 11      END LOOP;
 12
 13      RETURN;
 14  END gen_dates;
 15  /
 
Function created.
 
SQL> SELECT   b.dt
  2  ,        AVG(xt.num_val) running_avg
  3  FROM     x_test   xt
  4  ,       (SELECT  column_value    dt
  5           FROM   (SELECT MIN(begin_date) d1
  6                   ,      MAX(end_date)   d2
  7                   FROM   x_test) a
  8           ,       TABLE(gen_dates(a.d1,a.d2))) b
  9  WHERE    b.dt BETWEEN xt.begin_date (+)
 10                    AND xt.end_date (+)
 11  GROUP BY b.dt
 12  ORDER BY b.dt
 13  /
 
DT          RUNNING_AVG
----------- -----------
02-JAN-2006       .9064
03-JAN-2006       .9064
04-JAN-2006       .9064
05-JAN-2006       .9064
06-JAN-2006       .9064
07-JAN-2006       .9064
08-JAN-2006       .9064
09-JAN-2006       .9064
10-JAN-2006       .6483
11-JAN-2006       .6483
12-JAN-2006       .6483
13-JAN-2006       .6483
14-JAN-2006       .6483
15-JAN-2006       .6483
16-JAN-2006       .6483
17-JAN-2006       .6483
18-JAN-2006       .6483
19-JAN-2006       .6483
20-JAN-2006       .6483
21-JAN-2006       .6483
22-JAN-2006       .6483
23-JAN-2006       .6483
24-JAN-2006       .6483
25-JAN-2006       .6483
26-JAN-2006       .6483
27-JAN-2006       .6483
28-JAN-2006       .6483
29-JAN-2006       .6483
30-JAN-2006       .6483
31-JAN-2006       .6483
 
30 rows selected.
 
SQL>
You could create a view on top of this SELECT statement, and get your point-in-time average that way.
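
For example, a view along these lines (the view name is just an assumption) would expose the query:

CREATE OR REPLACE VIEW x_test_daily_avg AS
SELECT   b.dt
,        AVG(xt.num_val) running_avg
FROM     x_test   xt
,       (SELECT  column_value    dt
         FROM   (SELECT MIN(begin_date) d1
                 ,      MAX(end_date)   d2
                 FROM   x_test) a
         ,       TABLE(gen_dates(a.d1,a.d2))) b
WHERE    b.dt BETWEEN xt.begin_date (+)
                  AND xt.end_date (+)
GROUP BY b.dt;

-- point-in-time lookup against the view
SELECT running_avg FROM x_test_daily_avg WHERE dt = DATE '2006-01-15';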
Re: SQL Query Help - Analytic Query [message #219925 is a reply to message #219740] Fri, 16 February 2007 14:08
gintsp
Messages: 118
Registered: February 2007
Senior Member
Actually you don't need analytics; it can be done with a simple GROUP BY. :)

CONNECT BY is used to generate the necessary rows in your date range.
The subquery factoring clause (WITH) is used to cache the results of max(end_date) and min(begin_date), since I refer to them twice, but the clause isn't obligatory.

The query plan looks bad with two full scans on table x_test, but I'm quite sure you can eliminate at least one of them with indexes on begin_date and end_date so the min and max values come from the indexes.
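
For illustration only (index names assumed), something like this should let the optimizer use an INDEX FULL SCAN (MIN/MAX) instead of one of the full table scans:

CREATE INDEX x_test_begin_date_idx ON x_test (begin_date);  -- serves MIN(begin_date)
CREATE INDEX x_test_end_date_idx   ON x_test (end_date);    -- serves MAX(end_date)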

Of course, the join on date_range.dt BETWEEN begin_date AND end_date will perform worse and worse as the number of rows in x_test increases, especially when there are many rows for a particular date.

So here is the query; try it out. :)

SQL> ed
Wrote file afiedt.buf

  1  select dt, avg(num_val) from (
  2    select dt, num_val
  3    from (
  4      with a as (
  5        select max(end_date) mx, min(begin_date) mn
  6        from x_test
  7      )
  8      select mn + rownum - 1 dt
  9      from a
 10      where rownum <= a.mx - a.mn + 1
 11      connect by level <= rownum
 12    ) date_range,
 13    x_test
 14    where date_range.dt between begin_date and end_date
 15  )
 16  group by dt
 17* order by dt
SQL> /

DT        AVG(NUM_VAL)
--------- ------------
02-JAN-06        .9064
03-JAN-06        .9064
04-JAN-06        .9064
05-JAN-06        .9064
06-JAN-06        .9064
07-JAN-06        .9064
08-JAN-06        .9064
09-JAN-06        .9064
10-JAN-06        .6483
11-JAN-06        .6483
12-JAN-06        .6483
13-JAN-06        .6483
14-JAN-06        .6483
15-JAN-06        .6483
16-JAN-06        .6483
17-JAN-06        .6483
18-JAN-06        .6483
19-JAN-06        .6483
20-JAN-06        .6483
21-JAN-06        .6483
22-JAN-06        .6483
23-JAN-06        .6483
24-JAN-06        .6483
25-JAN-06        .6483
26-JAN-06        .6483
27-JAN-06        .6483
28-JAN-06        .6483
29-JAN-06        .6483
30-JAN-06        .6483
31-JAN-06        .6483

30 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 527382811

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                           |     1 |    26 |    10  (20)| 00:00:01 |
|   1 |  SORT ORDER BY                      |                           |     1 |    26 |    10  (20)| 00:00:01 |
|   2 |   HASH GROUP BY                     |                           |     1 |    26 |    10  (20)| 00:00:01 |
|   3 |    NESTED LOOPS                     |                           |     1 |    26 |     8   (0)| 00:00:01 |
|   4 |     VIEW                            |                           |     1 |     6 |     5   (0)| 00:00:01 |
|   5 |      TEMP TABLE TRANSFORMATION      |                           |       |       |            |          |
|   6 |       LOAD AS SELECT                | X_TEST                    |       |       |            |          |
|   7 |        SORT AGGREGATE               |                           |     1 |    16 |            |          |
|   8 |         TABLE ACCESS FULL           | X_TEST                    |     3 |    48 |     3   (0)| 00:00:01 |
|   9 |       COUNT                         |                           |       |       |            |          |
|* 10 |        FILTER                       |                           |       |       |            |          |
|  11 |         CONNECT BY WITHOUT FILTERING|                           |       |       |            |          |
|  12 |          COUNT                      |                           |       |       |            |          |
|  13 |           VIEW                      |                           |     1 |    18 |     2   (0)| 00:00:01 |
|  14 |            TABLE ACCESS FULL        | SYS_TEMP_0FD9D661A_4DFADF |     1 |    16 |     2   (0)| 00:00:0
|* 15 |     TABLE ACCESS FULL               | X_TEST                    |     1 |    20 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  10 - filter(ROWNUM<="A"."MX"-"A"."MN"+1)
  15 - filter("BEGIN_DATE"<=INTERNAL_FUNCTION("DATE_RANGE"."DT") AND
              "END_DATE">=INTERNAL_FUNCTION("DATE_RANGE"."DT"))


Statistics
----------------------------------------------------------
        100  recursive calls
         10  db block gets
        233  consistent gets
          1  physical reads
       1420  redo size
       1015  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
         30  rows processed


Gints Plivna
http://www.gplivna.eu
Re: SQL Query Help - Analytic Query [message #219946 is a reply to message #219925] Fri, 16 February 2007 18:21
rasa
Messages: 45
Registered: February 2006
Member
Thanks a lot to both of you. I need to study the two solutions carefully, as both of them are fascinating, especially the simple SQL GROUP BY solution. Once again, a big thanks to both of you.
Re: SQL Query Help - Analytic Query [message #220094 is a reply to message #219925] Sun, 18 February 2007 19:49
ebrian
Messages: 2794
Registered: April 2006
Senior Member
In this case, analytics should perform slightly better than the non-analytic approach.

SQL> with t as (
  2     select (mn + rownum - 1) dt from (
  3             select min(begin_date) mn, max(end_date) mx
  4             from x_test)
  5     connect by rownum <= mx - mn + 1)
  6  select dt "Date", avgnv "Average"
  7     from t,
  8     (select begin_date, lead(begin_date, 1, end_date + 1) over (order by begin_date) - 1 lbd
  9             , avg(num_val) over (order by begin_date range unbounded preceding) avgnv
 10             from x_test)
 11  where dt between begin_date and lbd
 12  order by 1;

Date         Average
--------- ----------
02-JAN-06      .9064
03-JAN-06      .9064
04-JAN-06      .9064
05-JAN-06      .9064
06-JAN-06      .9064
07-JAN-06      .9064
08-JAN-06      .9064
09-JAN-06      .9064
10-JAN-06      .6483
11-JAN-06      .6483
12-JAN-06      .6483
13-JAN-06      .6483
14-JAN-06      .6483
15-JAN-06      .6483
16-JAN-06      .6483
17-JAN-06      .6483
18-JAN-06      .6483
19-JAN-06      .6483
20-JAN-06      .6483
21-JAN-06      .6483
22-JAN-06      .6483
23-JAN-06      .6483
24-JAN-06      .6483
25-JAN-06      .6483
26-JAN-06      .6483
27-JAN-06      .6483
28-JAN-06      .6483
29-JAN-06      .6483
30-JAN-06      .6483
31-JAN-06      .6483

30 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=1 Bytes=34)
   1    0   SORT (ORDER BY) (Cost=7 Card=1 Bytes=34)
   2    1     NESTED LOOPS (Cost=6 Card=1 Bytes=34)
   3    2       VIEW (Cost=2 Card=1 Bytes=6)
   4    3         COUNT
   5    4           CONNECT BY (WITHOUT FILTERING)
   6    5             COUNT
   7    6               VIEW (Cost=2 Card=1 Bytes=18)
   8    7                 SORT (AGGREGATE)
   9    8                   TABLE ACCESS (FULL) OF 'X_TEST' (Cost=2 Card=20 Bytes=320)

  10    2       VIEW
  11   10         WINDOW (SORT) (Cost=4 Card=20 Bytes=400)
  12   11           TABLE ACCESS (FULL) OF 'X_TEST' (Cost=2 Card=20 Bytes=400)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         14  consistent gets
          0  physical reads
          0  redo size
        995  bytes sent via SQL*Net to client
        510  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
         30  rows processed
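
To see why this works, it can help to run the analytic inline view on its own (shown here as a sketch with comments; no new objects are needed). Each begin_date carries the running average of every row up to and including that date, and the LEAD(...) - 1 expression turns consecutive begin_dates into non-overlapping intervals that the generated dates are joined against:

SELECT begin_date
,      lead(begin_date, 1, end_date + 1)
           over (order by begin_date) - 1            lbd   -- last day this average applies to
,      avg(num_val) over (order by begin_date
                          range unbounded preceding) avgnv -- running average up to this begin_date
FROM   x_test;
-- with duplicate begin_dates, the earlier row gets an empty interval (lbd < begin_date)
-- and simply matches no generated dates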
Re: SQL Query Help - Analytic Query [message #221061 is a reply to message #219925] Fri, 23 February 2007 09:48
rasa
Messages: 45
Registered: February 2006
Member
gintsp:

Whenever I execute your query against my 9i database, I frequently get an ORA-03113 "end-of-file on communication channel" error. It varies: sometimes the connection gets dropped, sometimes it doesn't. But this much I can tell you: the connection only gets dropped when I run the query you posted. Weird, but true. I am not sure if this is one of those quirks with CONNECT BY. Just thought I would share.

Thanks.
Re: SQL Query Help - Analytic Query [message #221065 is a reply to message #221061] Fri, 23 February 2007 10:14
ebrian
Messages: 2794
Registered: April 2006
Senior Member
Did you see any errors in your alert log when you received the ORA-03113 error?

Out of curiosity, did you get the same error if you tried to run the query I provided? I ran mine against 9i and didn't have any issues.

Re: SQL Query Help - Analytic Query [message #221067 is a reply to message #221065] Fri, 23 February 2007 10:27
rasa
Messages: 45
Registered: February 2006
Member
ebrian:

I confirm that your query never caused that problem. I will ask my DBA to provide the alert log details the next time it happens while I am executing gintsp's query.