COUNT(...) OVER(...)

From: Álvaro G. Vicario <alvaro.NOSPAMTHANX_at_demogracia.com>
Date: Thu, 25 Jun 2009 13:06:00 +0200
Message-ID: <h1vlqo$6r3$1_at_news.eternal-september.org>



I'm trying to learn about analytic functions, more specifically the OVER() clause applied to COUNT() and AVG() functions. I've read the SQL Reference and I've found several examples in Google but I still can't grasp the concept.

What follows is a simplified example based upon a real life scenario. I'll probably fix it by adding as many subqueries as necessary so that's not an issue. But I'd like to know whether it can be solved with analytic functions and eventually understand how to use them.

Data looks like this:

SELECT

	ro.route_id, ro.route_date,
	di.invoice, di.customer_id,
	cu.customer_type

FROM route ro
INNER JOIN delivered_invoices di ON ro.route_id=di.route_id INNER JOIN customers cu ON di.customer_id=cu.customer_id

route_id route_date invoice customer_id customer_type ======== ========== ======= =========== =============

        1 2008-12-31 A10            1000 Type A
        1 2008-12-31 A11            1000 Type A

        1 2008-12-31 A12            2000 Type B
        1 2008-12-31 A13            2000 Type B

        1 2008-12-31 A14            3000 Type B

        2 2008-12-31 A15            1000 Type A
        2 2008-12-31 A16            1000 Type A

        2 2008-12-31 A17            3000 Type B

        3 2009-01-01 A18            1000 Type A
...

I need to fetch several sums and averages from this data set. Most of it is pretty easy:

SELECT

	cu.customer_type,
	COUNT(DISTINCT ro.route_id) AS nr_of_routes,
	COUNT(DISTINCT ro.route_date) AS nr_of_days,
	COUNT(di.invoice) AS nr_of_invoices,
	COUNT(DISTINCT di.customer_id) AS nr_of_customers,
FROM route ro
INNER JOIN delivered_invoices di ON ro.route_id=di.route_id INNER JOIN customers cu ON di.customer_id=cu.customer_id GROUP BY cu.customer_type

customer_type nr_of_routes nr_of_days nr_of_invoices nr_of_customers ============= ============ ========== ============== ===============

Type A                   3          2              5               1
Type B                   2          1              4               2

I also need to calculate the number of deliveries. In the first table I separated the different deliveries with blank lines; the figures should be:

customer_type nr_of_deliveries

============= ================
Type A                       3
Type B                       3

The database design lacks a "deliveries" table: it links routes with invoices and I'm not allowed to change that.

I've determined that counting the number of different customers per route is an acceptable approximation (a driver never goes twice to the same address in the same route) and I can calculate the number of deliveries for any given route using a simple GROUP BY ro.route_id but I can't calculate it by other criteria (for instance, when the main query needs to be grouped by customer_type).

I got the impression that I could do this:

SELECT

	cu.customer_type,
	COUNT(DISTINCT di.customer_id) OVER(PARTITION BY ro.route_id) AS 
nr_of_deliveries,
FROM route ro
INNER JOIN delivered_invoices di ON ro.route_id=di.route_id INNER JOIN customers cu ON di.customer_id=cu.customer_id GROUP BY cu.customer_type

But I get an error unless I add ro.route_id to the main GROUP BY clause which, of course, beats the purpose of the query.

What's wrong in my assumption?

P.S. For what it matters, I'm using Oracle Database 10g Enterprise Edition Release 10.1.0.2.

--

Received on Thu Jun 25 2009 - 06:06:00 CDT

Original text of this message