Re: Is ROLLUP really better than doing a nested query when rank is involved?

From: Chris Colclough <chris.colclough_at_nospam.jhuapl.edu>
Date: Tue, 4 Sep 2007 11:33:21 -0400
Message-ID: <fbjts3$7vj$1@aplnetnews.jhuapl.edu>

<csimam_at_gmail.com> wrote in message
news:1188875136.958875.206220_at_y42g2000hsy.googlegroups.com...
> Here is a thread I posted elsewhere that I wonder if someone here
> could shed some light on...
>
> The original question I had:
>
> --- BEGIN ---
> Which one is better?
> 1. To perform aggregate queries on data, and use that data to home in
> and perform more detailed analysis? This gives result sets that are
> potentially filled with columns that contain the aggregate data
> repeated across many rows.
> I meant here by homing in: within the same query - using
> nested SELECTS. For example, the inner query performs the aggregate
> analysis, and returns whatever data we are interested in, and then
> the
> outer query performs another SELECT on the same data, resulting in
> the
> aggregate data being distributed across 'detailed results' from the
> outer query.
>
> 2. To perform the aggregate query once, and then hit the database
> again for a detailed analysis, and return the detailed analysis
> results in an order that is easily cross-correlated with the original
> aggregate data? This means running the aggregate query essentially
> twice, just the second time it is hidden as a subquery.
> --- END ---
>
> I realize the question was vague, but a super DBA understood what I
> was asking and suggested I study up on ROLLUP and CUBE (I am not a DBA
> by the way). So I did... and here is what I found:
>
> --- BEGIN ---
> The data I have, I want to generate totals and subtotals, say of
> sales: total sales per store, and subtotal of sales per store per day
> for example.
>
> The problem is that I ultimately only want the top 25 stores in total
> sales BUT with their subtotals as well... So I used ROLLUP to
> generate
> the totals and subtotals... and in the same query I generated a
> ranking of the data... and there I found my problem: the ranking does
> not distinguish between totals and subtotals. So the results look
> like:
>
> store1, realTotal, 1
> store1, subTotal, 2
> store2, realTotal, 3
> store7, realTotal, 4
> store10, realTotal, 5
> store10, subTotal, 6
>
> Clearly, the rows with subtotals should NOT be ranked... but I am
> struggling with SQL to figure out how to get exactly what I want.
>
> Well... I have a solution. But I'm not sure about it. It involves
> something I learned about only recently: a user-defined aggregate
> function that concatenates values from across rows. Using this
> aggregate function, stragg, I can do another pass through the results
> mentioned above while grouping by store, and produce:
>
> store1, 'realTotal subTotal', 1
> store2, 'realTotal subTotal', 3
> store7, 'realTotal subTotal', 4
> store10, 'realTotal subTotal', 5
>
> And another trick: use min( rank ) in the same aggregation, and order
> by it as well.
>
> Together that gives one a very compact result set that has everything
> ranked and grouped perfectly.
>
> ...but... a worry: is this REALLY better than doing a nested query? I
> wonder if the use of the user-defined aggregate function is very
> expensive...
>
> For the benefit of anyone wondering, here's the structure that
> works... in one pass in theory... but again, I am now wondering if it
> really is faster than the nested query equivalent that hits the
> database twice on the table 'stores' (seems to me that the middle
> select below is effectively hitting the table again, just in
> temporary
> form):
>
> select rownum as rank, id, days, totals
> from
> (
> select id, stragg(day) as days, stragg(total) as totals,
> min( rank ) as innerrank
> from
> (
> select id, day, sum(sales) as total, rank() over (order by
> sum(sales) desc) rank
> from stores
> group by id, rollup(day)
> )
> group by id
> order by min(rank)
> )
> where rownum <= 25
>
> --- END ---
>
> Any comments? Is this query really better than the following:
>
> -- get totals by day
> select rank, id, day, total, sum(sales) as dailytotal
> from
> (
> select * from
> (
> -- get grand totals regardless of day
> select id as storeid, sum( sales ) as total,
> rank() over (order by sum(sales) desc) rank
> from stores
> group by id
> )
> where rank <= 25
> ), stores
> -- NOTE join with stores again
> where id = storeid
> -- but does the where condition above clip the query and make it
> efficient?
> group by rank, id, day, total
> order by rank, day
>

This *does not* directly answer your question, but with the following data:

SQL> select * from store;

STORE_ID TRANS_DATE TRANS_AMT
---------- ----------- ------------

         1 2007/09/01         10.00
         1 2007/09/02         20.00
         1 2007/09/03         30.00
         1 2007/09/04         40.00
         2 2007/09/01         50.00
         2 2007/09/02         60.00
         2 2007/09/03         70.00
         2 2007/09/04         80.00
         3 2007/09/01         90.00
         3 2007/09/02        100.00
         3 2007/09/03        110.00
         3 2007/09/04        120.00

12 rows selected

the grouping() operator allows various levels of totals to be distinguished from any other row in the result set.

For example:

SQL> select grouping(store_id), grouping(s.trans_date), store_id, s.trans_date, sum(trans_amt)
2 from store s
3 group by rollup (store_id, s.trans_date);

GROUPING(STORE_ID) GROUPING(S.TRANS_DATE) STORE_ID TRANS_DATE SUM(TRANS_AMT)

------------------ ---------------------- ---------- ----------- --------------
                 0                      0          1 2007/09/01 
10
                 0                      0          1 2007/09/02 
20
                 0                      0          1 2007/09/03 
30
                 0                      0          1 2007/09/04 
40
                 0                      1          1 
100
                 0                      0          2 2007/09/01 
50
                 0                      0          2 2007/09/02 
60
                 0                      0          2 2007/09/03 
70
                 0                      0          2 2007/09/04 
80
                 0                      1          2 
260
                 0                      0          3 2007/09/01 
90
                 0                      0          3 2007/09/02 
100
                 0                      0          3 2007/09/03 
110
                 0                      0          3 2007/09/04 
120
                 0                      1          3 
420
                 1                      1

780

16 rows selected

Will distinguish transation_date subtotals from store_id subtotals from grand totals.

This may provide the flexibility being sought. By using the above query in a subquery factoring clause, it is possible to query either the subtotal information or the detail information, or both in a single query.

hth Received on Tue Sep 04 2007 - 10:33:21 CDT