| Oracle FAQ | Your Portal to the Oracle Knowledge Grid | |
Home -> Community -> Usenet -> c.d.o.misc -> Re: Analytic function insight wanted
On Mar 2, 10:07 pm, y..._at_vtn1.victoria.tc.ca (Malcolm Dew-Jones)
wrote:
> Hello
>
> I am looking for some insight on solving this problem using analytic
> functions.
>
> I already have another, non-analytic, solution, but I am not happy with it
> because it seems overly complicated. As a procedure I am sure I could
> just scan once thru the sorted data, so it seems "obvious" that I should
> be able to find an easy elegant solution to this in SQL, and I think that
> analytic functions would do that, but I just can't think how to use them
> here.
>
> I have a table like this (simplified version shown here).
>
> year category value1
> ---- -------- ------
>
> 2000 A1 1
> 2000 A2 10
> 2000 B1 2
> 2000 B2 20
> 2000 C 50
>
> 2001 A1 3
> 2001 B1 4
>
> 2002 A2 60
> 2002 B1 8
> 2002 C 7
>
> I show a kind of rollup with three categories, each being the sum of the
> two most recently available values for each year. A key issue being that
> the value for a year may be missing and so you need to use the most recent
> value from the past. A complication is that the final value for a year is
> actually the sum of two values, each of which may be from different
> previous years. (One column is not a sum so I show that just in case it
> makes a difference.)
>
> (The "+" is just to illustrate the numbers involved, the actual value is
> the mathematic sum of the numbers.)
>
> year sum(a1+a2) sum(b1+b2) C
> ---- ---------- ---------- -
> 2000 1+10 2+20 50
> 2001 3+10 4+20 50
> 2002 3+60 8+20 7
>
> Right now I use a partial cartesian join and a not-exists to provide an
> intermediate inline view that "fills in" the missing values. The final
> result is then just a simple group-by+decode of that intermediate view.
>
> This just a simplified example of the real thing.
>
> select year
> , sum( CASE WHEN category IN ('A1','A2')
> THEN value1 ELSE NULL END ) AS sum_A
> , sum( CASE WHEN category IN ('B1','B2')
> THEN value1 ELSE NULL END ) AS sum_B
> , sum( CASE WHEN category IN ('C' )
> THEN value1 ELSE NULL END ) AS the_C
> from (select year,category,value1 from big ugly join) my_intermediate_view
> group by year
>
> I think I should be able to do the whole thing in the one select
> statement - if I figure out how to correctly use the correct set of
> analytic functions to provide the most recent values for the pairs of
> values inside each sum.
>
> Can anyone provide insight on how to do this with analytic functions?
>
> Thanks for any feedback.
Let me start by saying that I don't like this problem. The solution below will require multiple Cartesian joins, and still is not a complete solution.
The setup:
CREATE TABLE T1 (
YEAR NUMBER(4),
CATEGORY VARCHAR2(2),
VALUE1 NUMBER(12));
INSERT INTO T1 VALUES (2000,'A1',1); INSERT INTO T1 VALUES (2000,'A2',10); INSERT INTO T1 VALUES (2000,'B1',2); INSERT INTO T1 VALUES (2000,'B2',20); INSERT INTO T1 VALUES (2000,'C',50);
INSERT INTO T1 VALUES (2001,'A1',3);
INSERT INTO T1 VALUES (2001,'B1',4);
INSERT INTO T1 VALUES (2002,'A2',60); INSERT INTO T1 VALUES (2002,'B1',8); INSERT INTO T1 VALUES (2002,'C',7);
COMMIT;
First, let's generate a list of all years between the first year and
the last year in the data set:
SELECT
FIRST_YEAR+(ROWNUM-1) MY_YEAR
FROM
(SELECT
MAX(YEAR)-MIN(YEAR)+1 YEARS,
MIN(YEAR) FIRST_YEAR
FROM
T1),
MY_YEAR
2000
2001
2002
We cannot use DISTINCT to list the years, because there could very well be years that are entirely missing from the data set.
Now, we need a Cartesian join between all of the possible years, and
all of the distinct values of CATEGORY:
SELECT
YRS.MY_YEAR,
CAT.CATEGORY
FROM
(SELECT
FIRST_YEAR+(ROWNUM-1) MY_YEAR
FROM
(SELECT
MAX(YEAR)-MIN(YEAR)+1 YEARS,
MIN(YEAR) FIRST_YEAR
FROM
T1),
2000 A1
2001 A1
2002 A1
2000 A2
2001 A2
2002 A2
2000 B1
2001 B1
2002 B1
2000 B2
2001 B2
2002 B2
2000 C
2001 C
2002 C
Now that we have a list of all possible combinations, we can start
joining in the original data set.
SELECT
C.MY_YEAR,
C.CATEGORY,
T1.VALUE1
FROM
(SELECT
YRS.MY_YEAR,
CAT.CATEGORY
FROM
(SELECT
FIRST_YEAR+(ROWNUM-1) MY_YEAR
FROM
(SELECT
MAX(YEAR)-MIN(YEAR)+1 YEARS,
MIN(YEAR) FIRST_YEAR
FROM
T1),
DUAL
2000 A1 1
2001 A1 3
2002 A1
2000 A2 10
2001 A2
2002 A2 60
2000 B1 2
2001 B1 4
2002 B1 8
2000 B2 20
2001 B2
2002 B2
2000 C 50
2001 C
2002 C 7
The above looks ugly, and we are not done yet:
SELECT
MY_YEAR,
CATEGORY,
NVL(VALUE1,
NVL(LEAD(VALUE1,1,NULL) OVER (PARTITION BY CATEGORY ORDER BY
MY_YEAR DESC),
LEAD(VALUE1,2,NULL) OVER (PARTITION BY CATEGORY ORDER BY MY_YEAR
DESC)
)
) L_VALUE1,
VALUE1,
LEAD(VALUE1,1,NULL) OVER (PARTITION BY CATEGORY ORDER BY MY_YEAR DESC) P1_VALUE1, LEAD(VALUE1,2,NULL) OVER (PARTITION BY CATEGORY ORDER BY MY_YEARDESC) P2_VALUE1
(SELECT
MAX(YEAR)-MIN(YEAR)+1 YEARS,
MIN(YEAR) FIRST_YEAR
FROM
T1),
DUAL
2002 A1 3 3 1
2001 A1 3 3 1
2000 A1 1 1
2002 A2 60 60 10
2001 A2 10 10
2000 A2 10 10
2002 B1 8 8 4 2
2001 B1 4 4 2
2000 B1 2 2
2002 B2 20 20
2001 B2 20 20
2000 B2 20 20
2002 C 7 7 50
2001 C 50 50
2000 C 50 50
In the above, VALUE1 shows the value at the current year and category. P1_VALUE1 shows the value for the category in the previous year. P2_VALUE1 shows the value for the category two years previous to the row on the current row. L_VALUE1 is the result of searching through the VALUE1, and P1_VALUE1 columns looking for the first nonnull value - if both contain nulls, P2_VALUE1 is returned.
Now, let's return the answer:
SELECT
MY_YEAR,
SUM(DECODE(CATEGORY,'A1',L_VALUE1,'A2',L_VALUE1,0)) A_SUM, SUM(DECODE(CATEGORY,'B1',L_VALUE1,'B2',L_VALUE1,0)) B_SUM, SUM(DECODE(CATEGORY,'C',L_VALUE1,'C2',L_VALUE1,0)) C_SUMFROM
(SELECT
MAX(YEAR)-MIN(YEAR)+1 YEARS,
MIN(YEAR) FIRST_YEAR
FROM
T1),
DUAL
2000 11 22 50
2001 13 24 50
2002 63 28 7
Did I mention that I don't like this problem? There might be other ways to solve the problem.
Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc.
Received on Sat Mar 03 2007 - 10:04:02 CST
![]() |
![]() |