Charles Hooper schrieb:
> On Mar 2, 10:07 pm, y..._at_vtn1.victoria.tc.ca (Malcolm Dew-Jones)
> wrote:
>> Hello
>>
>> I am looking for some insight on solving this problem using analytic
>> functions.
>>
>> I already have another, non-analytic, solution, but I am not happy with it
>> because it seems overly complicated. As a procedure I am sure I could
>> just scan once thru the sorted data, so it seems "obvious" that I should
>> be able to find an easy elegant solution to this in SQL, and I think that
>> analytic functions would do that, but I just can't think how to use them
>> here.
>>
>> I have a table like this (simplified version shown here).
>>
>> year category value1
>> ---- -------- ------
>>
>> 2000 A1 1
>> 2000 A2 10
>> 2000 B1 2
>> 2000 B2 20
>> 2000 C 50
>>
>> 2001 A1 3
>> 2001 B1 4
>>
>> 2002 A2 60
>> 2002 B1 8
>> 2002 C 7
>>
>> I show a kind of rollup with three categories, each being the sum of the
>> two most recently available values for each year. A key issue being that
>> the value for a year may be missing and so you need to use the most recent
>> value from the past. A complication is that the final value for a year is
>> actually the sum of two values, each of which may be from different
>> previous years. (One column is not a sum so I show that just in case it
>> makes a difference.)
>>
>> (The "+" is just to illustrate the numbers involved, the actual value is
>> the mathematic sum of the numbers.)
>>
>> year sum(a1+a2) sum(b1+b2) C
>> ---- ---------- ---------- -
>> 2000 1+10 2+20 50
>> 2001 3+10 4+20 50
>> 2002 3+60 8+20 7
>>
>> Right now I use a partial cartesian join and a not-exists to provide an
>> intermediate inline view that "fills in" the missing values. The final
>> result is then just a simple group-by+decode of that intermediate view.
>>
>> This just a simplified example of the real thing.
>>
>> select year
>> , sum( CASE WHEN category IN ('A1','A2')
>> THEN value1 ELSE NULL END ) AS sum_A
>> , sum( CASE WHEN category IN ('B1','B2')
>> THEN value1 ELSE NULL END ) AS sum_B
>> , sum( CASE WHEN category IN ('C' )
>> THEN value1 ELSE NULL END ) AS the_C
>> from (select year,category,value1 from big ugly join) my_intermediate_view
>> group by year
>>
>> I think I should be able to do the whole thing in the one select
>> statement - if I figure out how to correctly use the correct set of
>> analytic functions to provide the most recent values for the pairs of
>> values inside each sum.
>>
>> Can anyone provide insight on how to do this with analytic functions?
>>
>> Thanks for any feedback.
>
> Let me start by saying that I don't like this problem. The solution
> below will require multiple Cartesian joins, and still is not a
> complete solution.
>
> The setup:
> CREATE TABLE T1 (
> YEAR NUMBER(4),
> CATEGORY VARCHAR2(2),
> VALUE1 NUMBER(12));
>
> INSERT INTO T1 VALUES (2000,'A1',1);
> INSERT INTO T1 VALUES (2000,'A2',10);
> INSERT INTO T1 VALUES (2000,'B1',2);
> INSERT INTO T1 VALUES (2000,'B2',20);
> INSERT INTO T1 VALUES (2000,'C',50);
>
> INSERT INTO T1 VALUES (2001,'A1',3);
> INSERT INTO T1 VALUES (2001,'B1',4);
>
> INSERT INTO T1 VALUES (2002,'A2',60);
> INSERT INTO T1 VALUES (2002,'B1',8);
> INSERT INTO T1 VALUES (2002,'C',7);
>
> COMMIT;
>
> First, let's generate a list of all years between the first year and
> the last year in the data set:
> SELECT
> FIRST_YEAR+(ROWNUM-1) MY_YEAR
> FROM
> (SELECT
> MAX(YEAR)-MIN(YEAR)+1 YEARS,
> MIN(YEAR) FIRST_YEAR
> FROM
> T1),
> DUAL
> CONNECT BY ROWNUM<=YEARS;
>
> MY_YEAR
> ----------
> 2000
> 2001
> 2002
>
> We cannot use DISTINCT to list the years, because there could very
> well be years that are entirely missing from the data set.
>
> Now, we need a Cartesian join between all of the possible years, and
> all of the distinct values of CATEGORY:
> SELECT
> YRS.MY_YEAR,
> CAT.CATEGORY
> FROM
> (SELECT
> FIRST_YEAR+(ROWNUM-1) MY_YEAR
> FROM
> (SELECT
> MAX(YEAR)-MIN(YEAR)+1 YEARS,
> MIN(YEAR) FIRST_YEAR
> FROM
> T1),
> DUAL
> CONNECT BY ROWNUM<=YEARS) YRS,
> (SELECT DISTINCT
> CATEGORY
> FROM
> T1) CAT
> ORDER BY
> CAT.CATEGORY,
> YRS.MY_YEAR;
>
> MY_YEAR CA
> ---------- --
> 2000 A1
> 2001 A1
> 2002 A1
> 2000 A2
> 2001 A2
> 2002 A2
> 2000 B1
> 2001 B1
> 2002 B1
> 2000 B2
> 2001 B2
> 2002 B2
> 2000 C
> 2001 C
> 2002 C
>
> Now that we have a list of all possible combinations, we can start
> joining in the original data set.
> SELECT
> C.MY_YEAR,
> C.CATEGORY,
> T1.VALUE1
> FROM
> (SELECT
> YRS.MY_YEAR,
> CAT.CATEGORY
> FROM
> (SELECT
> FIRST_YEAR+(ROWNUM-1) MY_YEAR
> FROM
> (SELECT
> MAX(YEAR)-MIN(YEAR)+1 YEARS,
> MIN(YEAR) FIRST_YEAR
> FROM
> T1),
> DUAL
> CONNECT BY ROWNUM<=YEARS) YRS,
> (SELECT DISTINCT
> CATEGORY
> FROM
> T1) CAT) C,
> T1
> WHERE
> C.MY_YEAR=T1.YEAR(+)
> AND C.CATEGORY=T1.CATEGORY(+)
> ORDER BY
> C.CATEGORY,
> C.MY_YEAR;
>
> MY_YEAR CA VALUE1
> ---------- -- ----------
> 2000 A1 1
> 2001 A1 3
> 2002 A1
> 2000 A2 10
> 2001 A2
> 2002 A2 60
> 2000 B1 2
> 2001 B1 4
> 2002 B1 8
> 2000 B2 20
> 2001 B2
> 2002 B2
> 2000 C 50
> 2001 C
> 2002 C 7
>
> The above looks ugly, and we are not done yet:
> SELECT
> MY_YEAR,
> CATEGORY,
> NVL(VALUE1,
> NVL(LEAD(VALUE1,1,NULL) OVER (PARTITION BY CATEGORY ORDER BY
> MY_YEAR DESC),
> LEAD(VALUE1,2,NULL) OVER (PARTITION BY CATEGORY ORDER BY MY_YEAR
> DESC)
> )
> ) L_VALUE1,
> VALUE1,
> LEAD(VALUE1,1,NULL) OVER (PARTITION BY CATEGORY ORDER BY MY_YEAR
> DESC) P1_VALUE1,
> LEAD(VALUE1,2,NULL) OVER (PARTITION BY CATEGORY ORDER BY MY_YEAR
> DESC) P2_VALUE1
> FROM
> (SELECT
> C.MY_YEAR,
> C.CATEGORY,
> T1.VALUE1
> FROM
> (SELECT
> YRS.MY_YEAR,
> CAT.CATEGORY
> FROM
> (SELECT
> FIRST_YEAR+(ROWNUM-1) MY_YEAR
> FROM
> (SELECT
> MAX(YEAR)-MIN(YEAR)+1 YEARS,
> MIN(YEAR) FIRST_YEAR
> FROM
> T1),
> DUAL
> CONNECT BY ROWNUM<=YEARS) YRS,
> (SELECT DISTINCT
> CATEGORY
> FROM
> T1) CAT) C,
> T1
> WHERE
> C.MY_YEAR=T1.YEAR(+)
> AND C.CATEGORY=T1.CATEGORY(+));
>
> MY_YEAR CA L_VALUE1 VALUE1 P1_VALUE1 P2_VALUE1
> ---------- -- ---------- ---------- ---------- ----------
> 2002 A1 3 3 1
> 2001 A1 3 3 1
> 2000 A1 1 1
> 2002 A2 60 60 10
> 2001 A2 10 10
> 2000 A2 10 10
> 2002 B1 8 8 4 2
> 2001 B1 4 4 2
> 2000 B1 2 2
> 2002 B2 20 20
> 2001 B2 20 20
> 2000 B2 20 20
> 2002 C 7 7 50
> 2001 C 50 50
> 2000 C 50 50
>
> In the above, VALUE1 shows the value at the current year and
> category. P1_VALUE1 shows the value for the category in the previous
> year. P2_VALUE1 shows the value for the category two years previous
> to the row on the current row. L_VALUE1 is the result of searching
> through the VALUE1, and P1_VALUE1 columns looking for the first non-
> null value - if both contain nulls, P2_VALUE1 is returned.
>
> Now, let's return the answer:
> SELECT
> MY_YEAR,
> SUM(DECODE(CATEGORY,'A1',L_VALUE1,'A2',L_VALUE1,0)) A_SUM,
> SUM(DECODE(CATEGORY,'B1',L_VALUE1,'B2',L_VALUE1,0)) B_SUM,
> SUM(DECODE(CATEGORY,'C',L_VALUE1,'C2',L_VALUE1,0)) C_SUM
> FROM
> (SELECT
> MY_YEAR,
> CATEGORY,
> NVL(VALUE1,
> NVL(LEAD(VALUE1,1,NULL) OVER (PARTITION BY CATEGORY ORDER BY
> MY_YEAR DESC),
> NVL(LEAD(VALUE1,2,NULL) OVER (PARTITION BY CATEGORY ORDER BY
> MY_YEAR DESC),0)
> )
> ) L_VALUE1
> FROM
> (SELECT
> C.MY_YEAR,
> C.CATEGORY,
> T1.VALUE1
> FROM
> (SELECT
> YRS.MY_YEAR,
> CAT.CATEGORY
> FROM
> (SELECT
> FIRST_YEAR+(ROWNUM-1) MY_YEAR
> FROM
> (SELECT
> MAX(YEAR)-MIN(YEAR)+1 YEARS,
> MIN(YEAR) FIRST_YEAR
> FROM
> T1),
> DUAL
> CONNECT BY ROWNUM<=YEARS) YRS,
> (SELECT DISTINCT
> CATEGORY
> FROM
> T1) CAT) C,
> T1
> WHERE
> C.MY_YEAR=T1.YEAR(+)
> AND C.CATEGORY=T1.CATEGORY(+)))
> GROUP BY
> MY_YEAR
> ORDER BY
> MY_YEAR;
>
> MY_YEAR A_SUM B_SUM C_SUM
> ---------- ---------- ---------- ----------
> 2000 11 22 50
> 2001 13 24 50
> 2002 63 28 7
>
> Did I mention that I don't like this problem? There might be other
> ways to solve the problem.
>
> Charles Hooper
> PC Support Specialist
> K&M Machine-Fabricating, Inc.
>
I don't think, this is an "uncomfotable" problem. It seems to me rather
to be typical problem regarding data densification ( there are really
good examples in data warehousing guide -
http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14223/analysis.htm#sthref1840).
My query may be influenced by "magic of input data" - in cases, if all
categories are not available for certain year and still should be
calculated for the same year, one need a join with row generator to
create missing years, as you have shown ( in my query i silly ignore
such possibility, as i'm not sure, such requirement exists, on top - i
arrogantly stole the T1 table which you created ;-).
SQL> with t2 as(
2 select distinct
3 year,
4 last_value(decode(category,'A1',value1,null) ignore nulls)
over(order by year) a1,
5 last_value(decode(category,'A2',value1,null) ignore nulls)
over(order by year) a2,
6 last_value(decode(category,'B1',value1,null) ignore nulls)
over(order by year) b1,
7 last_value(decode(category,'B2',value1,null) ignore nulls)
over(order by year) b2,
8 last_value(decode(category,'C',value1,null) ignore nulls)
over(order by year) c
9 from t1)
10 select
11 year,
12 a1+a2,
13 b1+b2,
14 c
15 from t2
16 order by year;
YEAR A1+A2 B1+B2 C
---------- ---------- ---------- ----------
2000 11 22 50
2001 13 24 50
2002 63 28 7
Execution Plan
Plan hash value: 1616948475
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
|
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 780 | 5 (40)|
00:00:01 |
| 1 | SORT ORDER BY | | 10 | 780 | 5 (40)|
00:00:01 |
| 2 | VIEW | | 10 | 780 | 5 (40)|
00:00:01 |
| 3 | HASH UNIQUE | | 10 | 290 | 5 (40)|
00:00:01 |
| 4 | WINDOW SORT | | 10 | 290 | 5 (40)|
00:00:01 |
| 5 | TABLE ACCESS FULL| T1 | 10 | 290 | 3 (0)|
00:00:01 |
Best regards
Maxim
Received on Sat Mar 03 2007 - 14:05:59 CST