Re: GROUP BY

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Fri, 18 May 2007 15:48:34 -0300
Message-ID: <464df4bb$0$4026$9a566e8b_at_news.aliant.net>


Marshall wrote:

> GROUP BY works on a single relation. The input relation's attributes
> are partitioned into one of three categories:
>
> 1) those we ignore
> 2) those we group by
> 3) those we apply the function to
>
> The result relation will have attributes from 2) above, and in
> addition:
>
> 4) those created as a result of the applied function.
>
> The functions mentioned in 4) may only use as arguments the attributes
> listed in 3).
>
> So for example given R(a, b, c):
>
> select b, sum(c) as d from R group by b

What exactly is your objection to:

select b, b * sum(c) as d from R group by b

> will ignore attribute(s) a, group by attribute(s) b, and apply
> a function to attribute(s) c, yielding the additional attribute d.

Whether the function applies to b or c or some combination depends on the full expression.

select b1, b2, b1+b2, b1*sum(c) as d from R group by b1, b2

> The result will have attributes (b, d).
>
> One of my favorite lessons from TTM was always to consider the 0-arity
> case.
>
> It is not unusual to consider the case of 1) or 2) being empty.
> In the first case, we are either grouping by or aggregating over
> all attributes. In the second, we aren't grouping, but are just
> aggregating over the entire relation; the result will have
> a single row.
>
> What if 3) is empty?
>
> In that case, GROUP BY is equivalent to PROJECT!
>
> SQL obscures this fact, due to the templated nature of a SQL query.
> The 3)-empty case is:
>
> select b from R group by b
>
> which would just be simplified to:
>
> select b from R
>
> which is how you project in SQL already.

Except that SQL requires the distinct or group by clauses for project.

> What if 4) uses not an aggregate function, but just a regular
> function?
>
> In that case, GROUP BY is equivalent to APPLY! (What TTM calls
> "COMPOSE".)
>
> Recall the example from TTM, from the section "Treating Operators as
> Relations" (ch. 4).
>
> TWO_AND_TWO <COMPOSE> PLUS
>
> You may immediately object that this introduces the necessity of
> duplicate elimination, but <COMPOSE> already had this requirement.
> (If you have a set of x values, it is not the case that every f(x)
> will be distinct, unless f is one-to-one.)
>
> I have been thinking about the "reverse" of aggregate functions,
> if you will: "functions" that take a single input but produce multiple
> outputs. They are called "generators" in some languages. (They are not
> strictly functions.) It has lately occurred to me that perhaps a
> single construct could contain all of project, apply, group by, and
> a mechanism for applying generators.
>
> To my shock, reviewing ch. 4 for this note, I found at the bottom
> of page 53, TTM says "One implication of this example is that a
> relational language such as D might reasonably include an extended
> form of EXTEND that--unlike the traditional EXTEND--is not
> necessarily limited to producing aexactly one output tuple from
> each input tuple."
>
> So it appears I am not the first to think something like this.
>
> Discussion welcome.

Dealing with RVA's necessitates such an extend. Received on Fri May 18 2007 - 20:48:34 CEST

Original text of this message