Re: COUNT(...) OVER(...)

From: Peter Nilsson <airia_at_acay.com.au>
Date: Thu, 25 Jun 2009 16:01:45 -0700 (PDT)
Message-ID: <e9ede590-1471-470d-9b3f-236cadbd2fb5_at_y28g2000prd.googlegroups.com>



On Jun 25, 9:06 pm, "Álvaro G. Vicario"
<alvaro.NOSPAMTH..._at_demogracia.com> wrote:
> I'm trying to learn about analytic functions, more specifically
> the OVER() clause applied to COUNT() and AVG() functions. I've
> read the SQL Reference and I've found several examples in Google
> but I still can't grasp the concept.

Sounds like a separate issue.

> What follows is a simplified example based upon a real life
> scenario. I'll probably fix it by adding as many subqueries as
> necessary so that's not an issue. But I'd like to know whether
> it can be solved with analytic functions and eventually
> understand how to use them.
>
> Data looks like this:
>
> SELECT
> ro.route_id, ro.route_date,
> di.invoice, di.customer_id,
> cu.customer_type
> FROM route ro
> INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
> INNER JOIN customers cu ON di.customer_id=cu.customer_id
>
> route_id route_date invoice customer_id customer_type
> ======== ========== ======= =========== =============
> 1 2008-12-31 A10 1000 Type A
> 1 2008-12-31 A11 1000 Type A
>
> 1 2008-12-31 A12 2000 Type B
> 1 2008-12-31 A13 2000 Type B
>
> 1 2008-12-31 A14 3000 Type B
>
> 2 2008-12-31 A15 1000 Type A
> 2 2008-12-31 A16 1000 Type A
>
> 2 2008-12-31 A17 3000 Type B
>
> 3 2009-01-01 A18 1000 Type A
> ...
>
> I need to fetch several sums and averages from this data set.
> Most of it is pretty easy:
>
> SELECT
> cu.customer_type,
> COUNT(DISTINCT ro.route_id) AS nr_of_routes,
> COUNT(DISTINCT ro.route_date) AS nr_of_days,
> COUNT(di.invoice) AS nr_of_invoices,
> COUNT(DISTINCT di.customer_id) AS nr_of_customers,

You have a spurious comma.

> FROM route ro
> INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
> INNER JOIN customers cu ON di.customer_id=cu.customer_id
> GROUP BY cu.customer_type

<snip>
> I also need to calculate the number of deliveries. In the
> first table I separated the different deliveries with blank
> lines; the figures should be:
>
> customer_type nr_of_deliveries
> ============= ================
> Type A 3
> Type B 3
>
> The database design lacks a "deliveries" table: it links
> routes with invoices and I'm not allowed to change that.
>
> I've determined that counting the number of different
> customers per route is an acceptable approximation (a driver
> never goes twice to the same address in the same route) and
> I can calculate the number of deliveries for any given route
> using a simple GROUP BY ro.route_id but I can't calculate it
> by other criteria (for instance, when the main query needs
> to be grouped by customer_type).
>
> I got the impression that I could do this:
>
> SELECT cu.customer_type,
> COUNT(DISTINCT di.customer_id)
> OVER(PARTITION BY ro.route_id)
> AS nr_of_deliveries,

Again, you have a spare comma.

> FROM route ro
> INNER JOIN delivered_invoices di ON ro.route_id=di.route_id
> INNER JOIN customers cu ON di.customer_id=cu.customer_id
> GROUP BY cu.customer_type
>
> But I get an error unless I add ro.route_id to the main GROUP
> BY clause which, of course, beats the purpose of the query.
>
> What's wrong in my assumption?

You can't group over windowing functions any more than you group over a group without nested queries.

You're really tring to do a count distinct count over multiple(2) columns. The query to get your delivery results is...

  select cu.customer_type, count(*)
    from (select distinct route_id, customer_id

            from delivered_invoices) di
    join customers cu on di.customer_id=cu.customer_id    group by cu.customer_type

You need to choose only 1 row per route_id, customer_id pair. You can use a conditional sum to get your count...

  select cu.customer_type,

         count(distinct ro.route_id) as nr_of_routes,
         count(distinct ro.route_date) as nr_of_days,
         count(di.invoice) as nr_of_invoices,
         count(distinct di.customer_id) as nr_of_customers,
         sum(decode(di.rn, 1, 1, 0)) nr_of_deliveries
    from route ro
    join (select route_id, customer_id, invoice,
                 row_number() over
                   (partition by route_id, customer_id
                        order by null) rn
            from delivered_invoices) di on ro.route_id=di.route_id
    join customers cu on di.customer_id=cu.customer_id    group by cu.customer_type

You haven't shown any DLL. The above assumes all routes are completed on the same day. [Which isn't true in all real life scenarious.] If that's not the case, then you'll need to tweek the query.

--
Peter
Received on Thu Jun 25 2009 - 18:01:45 CDT

Original text of this message