Re: GROUP BY
Date: 19 May 2007 11:05:32 -0700
Message-ID: <1179597932.006120.282290_at_q75g2000hsh.googlegroups.com>
On May 19, 7:38 am, "Brian Selzer" <b..._at_selzer-software.com> wrote:
> "Marshall" <marshall.spi..._at_gmail.com> wrote in message
>
> > So: what *about* that extended extend?
>
> > Extend is a particular form of join, yes? So perhaps we
> > ought to apply generators with join?
>
> > I note that there is an issue with regards to the uniqueness
> > of values produced by a generator. It's not clear to me if
> > that ought to be required or not. If it's required, it's not clear
> > to me if it's enforceable, since I don't see how to prove uniqueness.
>
> > If the outputs of the generator aren't unique, then we have
> > a function from a tuple to a bag, which also matches the
> > fact that aggregate functions are functions from a bag to
> > a tuple.
>
> I don't agree with this. An aggregate function iterates over a relation,
> taking a projection of each tuple to find the values required to calculate
> the result.
Yours is maybe a bit narrow, though, because by being so specific about the algorithm to implement aggregation, you've eliminated the possibility for parallelization.
> While it is convenient to visualize the collection of values
> for an attribute in a relation as a bag, it should not be ignored that any
> particular combination of values in that bag depends solely on the existence
> of a particular *set* of tuples.
Um, I don't see how. Anyway, we can certainly consider sum() in
the abstract. It is a function taking a bag of numbers to a single
number.
In some languages it is considered to take a list, but that too is
overspecifying, since order is not significant.
> > Ex: a function n_ones(n), which takes a natural number and
> > generates the number 1 that many times. This is a subset of
> > the inverse of sum().
>
> > n_ones(3) = {| 1, 1, 1 |}
>
> > sum( {| 1, 1, 1 |} = 3
>
> > So generators are the inverses of aggregates, and join is the inverse
> > of group by, and generators are applied with join and aggregates
> > are applied with group-by.
>
> Frankly, I don't see how there can even be an inverse for an aggregate.
> When you aggregate many distinct values into one, information is lost. How
> can you possibly reverse the process without that information? For example,
> if you have 30 students in a class where 'C' is the average grade, how can
> you tell who aced the course or who failed?
The situation is no different with proper functions. Some are
invertible, some
are not. Any bijection is invertible. The function f(x) = 5 is not.
Some aggregates are invertible. n_ones() is invertible, don't you think?
Marshall Received on Sat May 19 2007 - 20:05:32 CEST