Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.misc -> Re: Analytic function insight wanted

Re: Analytic function insight wanted

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: 3 Mar 2007 08:04:02 -0800
Message-ID: <1172937842.763161.45390@30g2000cwc.googlegroups.com>


On Mar 2, 10:07 pm, y..._at_vtn1.victoria.tc.ca (Malcolm Dew-Jones) wrote:
> Hello
>
> I am looking for some insight on solving this problem using analytic
> functions.
>
> I already have another, non-analytic, solution, but I am not happy with it
> because it seems overly complicated. As a procedure I am sure I could
> just scan once thru the sorted data, so it seems "obvious" that I should
> be able to find an easy elegant solution to this in SQL, and I think that
> analytic functions would do that, but I just can't think how to use them
> here.
>
> I have a table like this (simplified version shown here).
>
> year category value1
> ---- -------- ------
>
> 2000 A1 1
> 2000 A2 10
> 2000 B1 2
> 2000 B2 20
> 2000 C 50
>
> 2001 A1 3
> 2001 B1 4
>
> 2002 A2 60
> 2002 B1 8
> 2002 C 7
>
> I show a kind of rollup with three categories, each being the sum of the
> two most recently available values for each year. A key issue being that
> the value for a year may be missing and so you need to use the most recent
> value from the past. A complication is that the final value for a year is
> actually the sum of two values, each of which may be from different
> previous years. (One column is not a sum so I show that just in case it
> makes a difference.)
>
> (The "+" is just to illustrate the numbers involved, the actual value is
> the mathematic sum of the numbers.)
>
> year sum(a1+a2) sum(b1+b2) C
> ---- ---------- ---------- -
> 2000 1+10 2+20 50
> 2001 3+10 4+20 50
> 2002 3+60 8+20 7
>
> Right now I use a partial cartesian join and a not-exists to provide an
> intermediate inline view that "fills in" the missing values. The final
> result is then just a simple group-by+decode of that intermediate view.
>
> This just a simplified example of the real thing.
>
> select year
> , sum( CASE WHEN category IN ('A1','A2')
> THEN value1 ELSE NULL END ) AS sum_A
> , sum( CASE WHEN category IN ('B1','B2')
> THEN value1 ELSE NULL END ) AS sum_B
> , sum( CASE WHEN category IN ('C' )
> THEN value1 ELSE NULL END ) AS the_C
> from (select year,category,value1 from big ugly join) my_intermediate_view
> group by year
>
> I think I should be able to do the whole thing in the one select
> statement - if I figure out how to correctly use the correct set of
> analytic functions to provide the most recent values for the pairs of
> values inside each sum.
>
> Can anyone provide insight on how to do this with analytic functions?
>
> Thanks for any feedback.

Let me start by saying that I don't like this problem. The solution below will require multiple Cartesian joins, and still is not a complete solution.

The setup:
CREATE TABLE T1 (
  YEAR NUMBER(4),
  CATEGORY VARCHAR2(2),
  VALUE1 NUMBER(12));

INSERT INTO T1 VALUES (2000,'A1',1);
INSERT INTO T1 VALUES (2000,'A2',10);
INSERT INTO T1 VALUES (2000,'B1',2);
INSERT INTO T1 VALUES (2000,'B2',20);
INSERT INTO T1 VALUES (2000,'C',50);

INSERT INTO T1 VALUES (2001,'A1',3);
INSERT INTO T1 VALUES (2001,'B1',4);

INSERT INTO T1 VALUES (2002,'A2',60);
INSERT INTO T1 VALUES (2002,'B1',8);
INSERT INTO T1 VALUES (2002,'C',7);

COMMIT; First, let's generate a list of all years between the first year and the last year in the data set:
  SELECT
    FIRST_YEAR+(ROWNUM-1) MY_YEAR
  FROM
    (SELECT

      MAX(YEAR)-MIN(YEAR)+1 YEARS,
      MIN(YEAR) FIRST_YEAR
    FROM
      T1),

    DUAL
  CONNECT BY ROWNUM<=YEARS;

   MY_YEAR


      2000
      2001
      2002

We cannot use DISTINCT to list the years, because there could very well be years that are entirely missing from the data set.

Now, we need a Cartesian join between all of the possible years, and all of the distinct values of CATEGORY:
SELECT
  YRS.MY_YEAR,
  CAT.CATEGORY
FROM
  (SELECT
    FIRST_YEAR+(ROWNUM-1) MY_YEAR
  FROM
    (SELECT

      MAX(YEAR)-MIN(YEAR)+1 YEARS,
      MIN(YEAR) FIRST_YEAR
    FROM
      T1),

    DUAL
  CONNECT BY ROWNUM<=YEARS) YRS,
  (SELECT DISTINCT
    CATEGORY
  FROM
    T1) CAT
ORDER BY
  CAT.CATEGORY,
  YRS.MY_YEAR;    MY_YEAR CA
---------- --
      2000 A1
      2001 A1
      2002 A1
      2000 A2
      2001 A2
      2002 A2
      2000 B1
      2001 B1
      2002 B1
      2000 B2
      2001 B2
      2002 B2
      2000 C
      2001 C
      2002 C

Now that we have a list of all possible combinations, we can start joining in the original data set.
SELECT
  C.MY_YEAR,
  C.CATEGORY,
  T1.VALUE1
FROM
  (SELECT
    YRS.MY_YEAR,
    CAT.CATEGORY
  FROM
    (SELECT
      FIRST_YEAR+(ROWNUM-1) MY_YEAR
    FROM

      (SELECT
        MAX(YEAR)-MIN(YEAR)+1 YEARS,
        MIN(YEAR) FIRST_YEAR
      FROM
        T1),
      DUAL

    CONNECT BY ROWNUM<=YEARS) YRS,
    (SELECT DISTINCT
      CATEGORY
    FROM
      T1) CAT) C,
  T1
WHERE
  C.MY_YEAR=T1.YEAR(+)
  AND C.CATEGORY=T1.CATEGORY(+)
ORDER BY
  C.CATEGORY,
  C.MY_YEAR;    MY_YEAR CA VALUE1
---------- -- ----------
      2000 A1          1
      2001 A1          3
      2002 A1
      2000 A2         10
      2001 A2
      2002 A2         60
      2000 B1          2
      2001 B1          4
      2002 B1          8
      2000 B2         20
      2001 B2
      2002 B2
      2000 C          50
      2001 C
      2002 C           7

The above looks ugly, and we are not done yet: SELECT
  MY_YEAR,
  CATEGORY,
  NVL(VALUE1,
    NVL(LEAD(VALUE1,1,NULL) OVER (PARTITION BY CATEGORY ORDER BY MY_YEAR DESC),
      LEAD(VALUE1,2,NULL) OVER (PARTITION BY CATEGORY ORDER BY MY_YEAR DESC)
      )
    ) L_VALUE1,
  VALUE1,

  LEAD(VALUE1,1,NULL) OVER (PARTITION BY CATEGORY ORDER BY MY_YEAR
DESC) P1_VALUE1,
  LEAD(VALUE1,2,NULL) OVER (PARTITION BY CATEGORY ORDER BY MY_YEAR
DESC) P2_VALUE1
FROM
(SELECT
  C.MY_YEAR,
  C.CATEGORY,
  T1.VALUE1
FROM
  (SELECT
    YRS.MY_YEAR,
    CAT.CATEGORY
  FROM
    (SELECT
      FIRST_YEAR+(ROWNUM-1) MY_YEAR
    FROM
      (SELECT
        MAX(YEAR)-MIN(YEAR)+1 YEARS,
        MIN(YEAR) FIRST_YEAR
      FROM
        T1),
      DUAL

    CONNECT BY ROWNUM<=YEARS) YRS,
    (SELECT DISTINCT
      CATEGORY
    FROM
      T1) CAT) C,
  T1
WHERE
  C.MY_YEAR=T1.YEAR(+)
  AND C.CATEGORY=T1.CATEGORY(+));    MY_YEAR CA L_VALUE1 VALUE1 P1_VALUE1 P2_VALUE1 ---------- -- ---------- ---------- ---------- ----------
      2002 A1          3                     3          1
      2001 A1          3          3          1
      2000 A1          1          1
      2002 A2         60         60                    10
      2001 A2         10                    10
      2000 A2         10         10
      2002 B1          8          8          4          2
      2001 B1          4          4          2
      2000 B1          2          2
      2002 B2         20                               20
      2001 B2         20                    20
      2000 B2         20         20
      2002 C           7          7                    50
      2001 C          50                    50
      2000 C          50         50

In the above, VALUE1 shows the value at the current year and category. P1_VALUE1 shows the value for the category in the previous year. P2_VALUE1 shows the value for the category two years previous to the row on the current row. L_VALUE1 is the result of searching through the VALUE1, and P1_VALUE1 columns looking for the first nonnull  value - if both contain nulls, P2_VALUE1 is returned.

Now, let's return the answer:
SELECT
  MY_YEAR,

  SUM(DECODE(CATEGORY,'A1',L_VALUE1,'A2',L_VALUE1,0)) A_SUM,
  SUM(DECODE(CATEGORY,'B1',L_VALUE1,'B2',L_VALUE1,0)) B_SUM,
  SUM(DECODE(CATEGORY,'C',L_VALUE1,'C2',L_VALUE1,0)) C_SUM
FROM
(SELECT
  MY_YEAR,
  CATEGORY,
  NVL(VALUE1,
    NVL(LEAD(VALUE1,1,NULL) OVER (PARTITION BY CATEGORY ORDER BY MY_YEAR DESC),
      NVL(LEAD(VALUE1,2,NULL) OVER (PARTITION BY CATEGORY ORDER BY MY_YEAR DESC),0)
      )
    ) L_VALUE1
FROM
(SELECT
  C.MY_YEAR,
  C.CATEGORY,
  T1.VALUE1
FROM
  (SELECT
    YRS.MY_YEAR,
    CAT.CATEGORY
  FROM
    (SELECT
      FIRST_YEAR+(ROWNUM-1) MY_YEAR
    FROM
      (SELECT
        MAX(YEAR)-MIN(YEAR)+1 YEARS,
        MIN(YEAR) FIRST_YEAR
      FROM
        T1),
      DUAL

    CONNECT BY ROWNUM<=YEARS) YRS,
    (SELECT DISTINCT
      CATEGORY
    FROM
      T1) CAT) C,
  T1
WHERE
  C.MY_YEAR=T1.YEAR(+)
  AND C.CATEGORY=T1.CATEGORY(+)))
GROUP BY
  MY_YEAR
ORDER BY
  MY_YEAR;    MY_YEAR A_SUM B_SUM C_SUM ---------- ---------- ---------- ----------
      2000         11         22         50
      2001         13         24         50
      2002         63         28          7

Did I mention that I don't like this problem? There might be other ways to solve the problem.

Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc. Received on Sat Mar 03 2007 - 10:04:02 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US