Re: COUNT(...) OVER(...)

From: Álvaro G. Vicario <alvaro.NOSPAMTHANX_at_demogracia.com>
Date: Fri, 26 Jun 2009 10:00:54 +0200
Message-ID: <h21vc1$ilh$1_at_news.eternal-september.org>



Peter Nilsson escribió:
> On Jun 25, 9:06 pm, "Álvaro G. Vicario"
> <alvaro.NOSPAMTH..._at_demogracia.com> wrote:

>> I'm trying to learn about analytic functions, more specifically
>> the OVER() clause applied to COUNT() and AVG() functions. I've
>> read the SQL Reference and I've found several examples in Google
>> but I still can't grasp the concept.
> 
> Sounds like a separate issue.
> 

>> What follows is a simplified example based upon a real life
>> scenario. I'll probably fix it by adding as many subqueries as
>> necessary so that's not an issue. But I'd like to know whether
>> it can be solved with analytic functions and eventually
>> understand how to use them.
>>
>> Data looks like this:
>>
>> SELECT
>> ro.route_id, ro.route_date,
>> di.invoice, di.customer_id,
>> cu.customer_type
>> FROM route ro
>> INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
>> INNER JOIN customers cu ON di.customer_id=cu.customer_id
>>
>> route_id route_date invoice customer_id customer_type
>> ======== ========== ======= =========== =============
>> 1 2008-12-31 A10 1000 Type A
>> 1 2008-12-31 A11 1000 Type A
>>
>> 1 2008-12-31 A12 2000 Type B
>> 1 2008-12-31 A13 2000 Type B
>>
>> 1 2008-12-31 A14 3000 Type B
>>
>> 2 2008-12-31 A15 1000 Type A
>> 2 2008-12-31 A16 1000 Type A
>>
>> 2 2008-12-31 A17 3000 Type B
>>
>> 3 2009-01-01 A18 1000 Type A
>> ...
>>
>> I need to fetch several sums and averages from this data set.
>> Most of it is pretty easy:
>>
>> SELECT
>> cu.customer_type,
>> COUNT(DISTINCT ro.route_id) AS nr_of_routes,
>> COUNT(DISTINCT ro.route_date) AS nr_of_days,
>> COUNT(di.invoice) AS nr_of_invoices,
>> COUNT(DISTINCT di.customer_id) AS nr_of_customers,
>
> You have a spurious comma.

It's a simplified example. It's not even actual code.

>> FROM route ro
>> INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
>> INNER JOIN customers cu ON di.customer_id=cu.customer_id
>> GROUP BY cu.customer_type

> <snip>

>> I also need to calculate the number of deliveries. In the
>> first table I separated the different deliveries with blank
>> lines; the figures should be:
>>
>> customer_type nr_of_deliveries
>> ============= ================
>> Type A 3
>> Type B 3
>>
>> The database design lacks a "deliveries" table: it links
>> routes with invoices and I'm not allowed to change that.
>>
>> I've determined that counting the number of different
>> customers per route is an acceptable approximation (a driver
>> never goes twice to the same address in the same route) and
>> I can calculate the number of deliveries for any given route
>> using a simple GROUP BY ro.route_id but I can't calculate it
>> by other criteria (for instance, when the main query needs
>> to be grouped by customer_type).
>>
>> I got the impression that I could do this:
>>
>> SELECT cu.customer_type,
>> COUNT(DISTINCT di.customer_id)
>> OVER(PARTITION BY ro.route_id)
>> AS nr_of_deliveries,
> 
> Again, you have a spare comma.
> 

>> FROM route ro
>> INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
>> INNER JOIN customers cu ON di.customer_id=cu.customer_id
>> GROUP BY cu.customer_type
>>
>> But I get an error unless I add ro.route_id to the main GROUP
>> BY clause which, of course, beats the purpose of the query.
>>
>> What's wrong in my assumption?
> 
> You can't group over windowing functions any more than you
> group over a group without nested queries.

Do you mean that it's basically an alternative syntax for GROUP BY?

> You're really tring to do a count distinct count over
> multiple(2) columns. The query to get your delivery results
> is...
> 
>   select cu.customer_type, count(*)
>     from (select distinct route_id, customer_id
>             from delivered_invoices) di
>     join customers cu on di.customer_id=cu.customer_id
>    group by cu.customer_type
> 
> You need to choose only 1 row per route_id, customer_id
> pair. You can use a conditional sum to get your count...
> 
>   select cu.customer_type,
>          count(distinct ro.route_id) as nr_of_routes,
>          count(distinct ro.route_date) as nr_of_days,
>          count(di.invoice) as nr_of_invoices,
>          count(distinct di.customer_id) as nr_of_customers,
>          sum(decode(di.rn, 1, 1, 0)) nr_of_deliveries
>     from route ro
>     join (select route_id, customer_id, invoice,
>                  row_number() over
>                    (partition by route_id, customer_id
>                         order by null) rn
>             from delivered_invoices) di on ro.route_id=di.route_id
>     join customers cu on di.customer_id=cu.customer_id
>    group by cu.customer_type

Actually, I a did a regular "SELECT COUNT() ... GROUP BY" subquery. But, as I said, I'm not trying to fix a problem but, rather than that, trying to understand how COUNT() and OVER() work.

> 
> You haven't shown any DLL. The above assumes all routes are
> completed on the same day. [Which isn't true in all real life
> scenarious.] If that's not the case, then you'll need to tweek
> the query.

It can be done by adding route_date to the SELECT and GROUP BY clauses of the subquery and the ON condition of the join.

-- 
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web: http://borrame.com
-- Mi web de humor satinado: http://www.demogracia.com
--
Received on Fri Jun 26 2009 - 03:00:54 CDT

Original text of this message