Re: COUNT(...) OVER(...)
From: Álvaro G. Vicario <alvaro.NOSPAMTHANX_at_demogracia.com>
Date: Fri, 26 Jun 2009 10:00:54 +0200
Message-ID: <h21vc1$ilh$1_at_news.eternal-september.org>
Peter Nilsson escribió:
>> I'm trying to learn about analytic functions, more specifically
>> the OVER() clause applied to COUNT() and AVG() functions. I've
>> read the SQL Reference and I've found several examples in Google
>> but I still can't grasp the concept.
>> What follows is a simplified example based upon a real life
>> scenario. I'll probably fix it by adding as many subqueries as
>> necessary so that's not an issue. But I'd like to know whether
>> it can be solved with analytic functions and eventually
>> understand how to use them.
>>
>> Data looks like this:
>>
>> SELECT
>> ro.route_id, ro.route_date,
>> di.invoice, di.customer_id,
>> cu.customer_type
>> FROM route ro
>> INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
>> INNER JOIN customers cu ON di.customer_id=cu.customer_id
>>
>> route_id route_date invoice customer_id customer_type
>> ======== ========== ======= =========== =============
>> 1 2008-12-31 A10 1000 Type A
>> 1 2008-12-31 A11 1000 Type A
>>
>> 1 2008-12-31 A12 2000 Type B
>> 1 2008-12-31 A13 2000 Type B
>>
>> 1 2008-12-31 A14 3000 Type B
>>
>> 2 2008-12-31 A15 1000 Type A
>> 2 2008-12-31 A16 1000 Type A
>>
>> 2 2008-12-31 A17 3000 Type B
>>
>> 3 2009-01-01 A18 1000 Type A
>> ...
>>
>> I need to fetch several sums and averages from this data set.
>> Most of it is pretty easy:
>>
>> SELECT
>> cu.customer_type,
>> COUNT(DISTINCT ro.route_id) AS nr_of_routes,
>> COUNT(DISTINCT ro.route_date) AS nr_of_days,
>> COUNT(di.invoice) AS nr_of_invoices,
>> COUNT(DISTINCT di.customer_id) AS nr_of_customers,
>
> You have a spurious comma.
>> I also need to calculate the number of deliveries. In the
>> first table I separated the different deliveries with blank
>> lines; the figures should be:
>>
>> customer_type nr_of_deliveries
>> ============= ================
>> Type A 3
>> Type B 3
>>
>> The database design lacks a "deliveries" table: it links
>> routes with invoices and I'm not allowed to change that.
>>
>> I've determined that counting the number of different
>> customers per route is an acceptable approximation (a driver
>> never goes twice to the same address in the same route) and
>> I can calculate the number of deliveries for any given route
>> using a simple GROUP BY ro.route_id but I can't calculate it
>> by other criteria (for instance, when the main query needs
>> to be grouped by customer_type).
>>
>> I got the impression that I could do this:
>>
>> SELECT cu.customer_type,
>> COUNT(DISTINCT di.customer_id)
>> OVER(PARTITION BY ro.route_id)
>> AS nr_of_deliveries,
>> FROM route ro
>> INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
>> INNER JOIN customers cu ON di.customer_id=cu.customer_id
>> GROUP BY cu.customer_type
>>
>> But I get an error unless I add ro.route_id to the main GROUP
>> BY clause which, of course, beats the purpose of the query.
>>
>> What's wrong in my assumption?
Date: Fri, 26 Jun 2009 10:00:54 +0200
Message-ID: <h21vc1$ilh$1_at_news.eternal-september.org>
Peter Nilsson escribió:
> On Jun 25, 9:06 pm, "Álvaro G. Vicario" > <alvaro.NOSPAMTH..._at_demogracia.com> wrote:
>> I'm trying to learn about analytic functions, more specifically
>> the OVER() clause applied to COUNT() and AVG() functions. I've
>> read the SQL Reference and I've found several examples in Google
>> but I still can't grasp the concept.
> > Sounds like a separate issue. >
>> What follows is a simplified example based upon a real life
>> scenario. I'll probably fix it by adding as many subqueries as
>> necessary so that's not an issue. But I'd like to know whether
>> it can be solved with analytic functions and eventually
>> understand how to use them.
>>
>> Data looks like this:
>>
>> SELECT
>> ro.route_id, ro.route_date,
>> di.invoice, di.customer_id,
>> cu.customer_type
>> FROM route ro
>> INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
>> INNER JOIN customers cu ON di.customer_id=cu.customer_id
>>
>> route_id route_date invoice customer_id customer_type
>> ======== ========== ======= =========== =============
>> 1 2008-12-31 A10 1000 Type A
>> 1 2008-12-31 A11 1000 Type A
>>
>> 1 2008-12-31 A12 2000 Type B
>> 1 2008-12-31 A13 2000 Type B
>>
>> 1 2008-12-31 A14 3000 Type B
>>
>> 2 2008-12-31 A15 1000 Type A
>> 2 2008-12-31 A16 1000 Type A
>>
>> 2 2008-12-31 A17 3000 Type B
>>
>> 3 2009-01-01 A18 1000 Type A
>> ...
>>
>> I need to fetch several sums and averages from this data set.
>> Most of it is pretty easy:
>>
>> SELECT
>> cu.customer_type,
>> COUNT(DISTINCT ro.route_id) AS nr_of_routes,
>> COUNT(DISTINCT ro.route_date) AS nr_of_days,
>> COUNT(di.invoice) AS nr_of_invoices,
>> COUNT(DISTINCT di.customer_id) AS nr_of_customers,
>
> You have a spurious comma.
It's a simplified example. It's not even actual code.
>> FROM route ro
>> INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
>> INNER JOIN customers cu ON di.customer_id=cu.customer_id
>> GROUP BY cu.customer_type
> <snip>
>> I also need to calculate the number of deliveries. In the
>> first table I separated the different deliveries with blank
>> lines; the figures should be:
>>
>> customer_type nr_of_deliveries
>> ============= ================
>> Type A 3
>> Type B 3
>>
>> The database design lacks a "deliveries" table: it links
>> routes with invoices and I'm not allowed to change that.
>>
>> I've determined that counting the number of different
>> customers per route is an acceptable approximation (a driver
>> never goes twice to the same address in the same route) and
>> I can calculate the number of deliveries for any given route
>> using a simple GROUP BY ro.route_id but I can't calculate it
>> by other criteria (for instance, when the main query needs
>> to be grouped by customer_type).
>>
>> I got the impression that I could do this:
>>
>> SELECT cu.customer_type,
>> COUNT(DISTINCT di.customer_id)
>> OVER(PARTITION BY ro.route_id)
>> AS nr_of_deliveries,
> > Again, you have a spare comma. >
>> FROM route ro
>> INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
>> INNER JOIN customers cu ON di.customer_id=cu.customer_id
>> GROUP BY cu.customer_type
>>
>> But I get an error unless I add ro.route_id to the main GROUP
>> BY clause which, of course, beats the purpose of the query.
>>
>> What's wrong in my assumption?
> > You can't group over windowing functions any more than you > group over a group without nested queries.
Do you mean that it's basically an alternative syntax for GROUP BY?
> You're really tring to do a count distinct count over > multiple(2) columns. The query to get your delivery results > is... > > select cu.customer_type, count(*) > from (select distinct route_id, customer_id > from delivered_invoices) di > join customers cu on di.customer_id=cu.customer_id > group by cu.customer_type > > You need to choose only 1 row per route_id, customer_id > pair. You can use a conditional sum to get your count... > > select cu.customer_type, > count(distinct ro.route_id) as nr_of_routes, > count(distinct ro.route_date) as nr_of_days, > count(di.invoice) as nr_of_invoices, > count(distinct di.customer_id) as nr_of_customers, > sum(decode(di.rn, 1, 1, 0)) nr_of_deliveries > from route ro > join (select route_id, customer_id, invoice, > row_number() over > (partition by route_id, customer_id > order by null) rn > from delivered_invoices) di on ro.route_id=di.route_id > join customers cu on di.customer_id=cu.customer_id > group by cu.customer_type
Actually, I a did a regular "SELECT COUNT() ... GROUP BY" subquery. But, as I said, I'm not trying to fix a problem but, rather than that, trying to understand how COUNT() and OVER() work.
> > You haven't shown any DLL. The above assumes all routes are > completed on the same day. [Which isn't true in all real life > scenarious.] If that's not the case, then you'll need to tweek > the query.
It can be done by adding route_date to the SELECT and GROUP BY clauses of the subquery and the ON condition of the join.
-- -- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain -- Mi sitio sobre programación web: http://borrame.com -- Mi web de humor satinado: http://www.demogracia.com --Received on Fri Jun 26 2009 - 03:00:54 CDT