Re: GROUP BY

From: Marshall <marshall.spight_at_gmail.com>
Date: 19 May 2007 11:05:32 -0700
Message-ID: <1179597932.006120.282290_at_q75g2000hsh.googlegroups.com>


On May 19, 7:38 am, "Brian Selzer" <b..._at_selzer-software.com> wrote:
> "Marshall" <marshall.spi..._at_gmail.com> wrote in message
>
> > So: what *about* that extended extend?
>
> > Extend is a particular form of join, yes? So perhaps we
> > ought to apply generators with join?
>
> > I note that there is an issue with regards to the uniqueness
> > of values produced by a generator. It's not clear to me if
> > that ought to be required or not. If it's required, it's not clear
> > to me if it's enforceable, since I don't see how to prove uniqueness.
>
> > If the outputs of the generator aren't unique, then we have
> > a function from a tuple to a bag, which also matches the
> > fact that aggregate functions are functions from a bag to
> > a tuple.
>
> I don't agree with this. An aggregate function iterates over a relation,
> taking a projection of each tuple to find the values required to calculate
> the result.

Well, that's not really incompatible with what I wrote. You've given an operational description; mine is more ... mathematical? abstract?

Yours is maybe a bit narrow, though, because by being so specific about the algorithm to implement aggregation, you've eliminated the possibility for parallelization.

> While it is convenient to visualize the collection of values
> for an attribute in a relation as a bag, it should not be ignored that any
> particular combination of values in that bag depends solely on the existence
> of a particular *set* of tuples.

Um, I don't see how. Anyway, we can certainly consider sum() in the abstract. It is a function taking a bag of numbers to a single number.
In some languages it is considered to take a list, but that too is
overspecifying, since order is not significant.

> > Ex: a function n_ones(n), which takes a natural number and
> > generates the number 1 that many times. This is a subset of
> > the inverse of sum().
>
> > n_ones(3) = {| 1, 1, 1 |}
>
> > sum( {| 1, 1, 1 |} = 3
>
> > So generators are the inverses of aggregates, and join is the inverse
> > of group by, and generators are applied with join and aggregates
> > are applied with group-by.
>
> Frankly, I don't see how there can even be an inverse for an aggregate.
> When you aggregate many distinct values into one, information is lost. How
> can you possibly reverse the process without that information? For example,
> if you have 30 students in a class where 'C' is the average grade, how can
> you tell who aced the course or who failed?

The situation is no different with proper functions. Some are invertible, some
are not. Any bijection is invertible. The function f(x) = 5 is not.

Some aggregates are invertible. n_ones() is invertible, don't you think?

Marshall Received on Sat May 19 2007 - 20:05:32 CEST

Original text of this message