Re: GROUP BY

From: David Cressey <cressey73_at_verizon.net>
Date: Fri, 18 May 2007 19:31:26 GMT
Message-ID: <ian3i.573$aj.19_at_trndny06>


"Marshall" <marshall.spight_at_gmail.com> wrote in message news:1179513166.858089.73940_at_k79g2000hse.googlegroups.com...
> GROUP BY works on a single relation. The input relation's attributes
> are partitioned into one of three categories:
>
> 1) those we ignore
> 2) those we group by
> 3) those we apply the function to
>
> The result relation will have attributes from 2) above, and in
> addition:
>
> 4) those created as a result of the applied function.
>
> The functions mentioned in 4) may only use as arguments the attributes
> listed in 3).
>
> So for example given R(a, b, c):
>
> select b, sum(c) as d from R group by b
>
> will ignore attribute(s) a, group by attribute(s) b, and apply
> a function to attribute(s) c, yielding the additional attribute d.
> The result will have attributes (b, d).
>
> One of my favorite lessons from TTM was always to consider the 0-arity
> case.
>
> It is not unusual to consider the case of 1) or 2) being empty.
> In the first case, we are either grouping by or aggregating over
> all attributes. In the second, we aren't grouping, but are just
> aggregating over the entire relation; the result will have
> a single row.
>
>
> What if 3) is empty?
>
> In that case, GROUP BY is equivalent to PROJECT!
>
> SQL obscures this fact, due to the templated nature of a SQL query.
> The 3)-empty case is:
>
> select b from R group by b
>
> which would just be simplified to:
>
> select b from R
>
> which is how you project in SQL already.

correction:

select distinct b from R

is how you project in SQL already.

I've had to make this correction for a lot of newbie SQL writers. A lot of newbie SQL writers learn to avoid "select distinct" because "it runs too slow".

>
>
> What if 4) uses not an aggregate function, but just a regular
> function?
>
> In that case, GROUP BY is equivalent to APPLY! (What TTM calls
> "COMPOSE".)
>
> Recall the example from TTM, from the section "Treating Operators as
> Relations" (ch. 4).
>
> TWO_AND_TWO <COMPOSE> PLUS
>
> You may immediately object that this introduces the necessity of
> duplicate elimination, but <COMPOSE> already had this requirement.
> (If you have a set of x values, it is not the case that every f(x)
> will be distinct, unless f is one-to-one.)
>
> I have been thinking about the "reverse" of aggregate functions,
> if you will: "functions" that take a single input but produce multiple
> outputs. They are called "generators" in some languages. (They are not
> strictly functions.) It has lately occurred to me that perhaps a
> single construct could contain all of project, apply, group by, and
> a mechanism for applying generators.
>
> To my shock, reviewing ch. 4 for this note, I found at the bottom
> of page 53, TTM says "One implication of this example is that a
> relational language such as D might reasonably include an extended
> form of EXTEND that--unlike the traditional EXTEND--is not
> necessarily limited to producing aexactly one output tuple from
> each input tuple."
>
> So it appears I am not the first to think something like this.
>
> Discussion welcome.
>
>
> Marshall
>
Received on Fri May 18 2007 - 21:31:26 CEST

Original text of this message