Re: Aggregation (with GBY) is Relational Division
Date: 5 Jun 2006 07:27:57 -0700
Message-ID: <1149517677.631930.76880_at_i40g2000cwc.googlegroups.com>
Bob Badour wrote:
> Marshall wrote:
>
> > vadimtro_at_gmail.com wrote:
>
> [snip]
>
> > Actually, I had kind of a jolt reading something Bob Badour
> > wrote recently about defining aggregates as either a fold
> > or an expression written in terms of folds. (He did not use
> > the term "fold".) Thus, avg(x) can be defined as simply
> > sum(x) / count(). And sum(x) can be defined as simply
> > fold(x,+,0).
> >
> > I like that much better. For one thing, it makes it much
> > easier to eliminate duplicate folds in a complex group-by.
> > (It's hard to explain why, so I won't bother. It's not clear
> > if this post will interest anyone anyway, the OP excepted. :-)
>
> I am surprised that my comment would jolt you. I wonder how you would
> define standard deviation without reference to variance, for instance,
> or variance without reference to sums of squares etc. I cannot imagine
> anyone forsaking the normal definition of average to replace it with
> something that has to multiply by (i-1) before every i-th term and then
> divide by i after adding the term.
Ewww, no. Death by a million rounding errors.
The issue is, in most programming languages, the code for average doesn't look much like it's definition. The language doesn't provide a sufficiently direct way for the programmer to express his intent.
In the procedural world, you'd be more likely to write the darn function yourself, in an annoyingly manual style.
double avg(x[]) {
double total = 0;
int count = 0; for (int i=0; i<size(x); i++) { total += x[i]; count++;
}
return total / count;
}
Not very elegant.
(I could poke fun at OO by gratuitously complexifying the above while making it into a method, but I won't. In an OOPL you'd just write the above function in much the same way.)
In a functional language, you'd be more likely to do something like
func avg(x[]) ) {
func avg'(i, (total, count)) {
(total+i,count+1);
}
(total, count) = fold(x,avg')
total/count;
}
In other words, we create an auxilliary function avg', which takes a number i and a pair of numbers, and adds i to the first of the pair and 1 to the second. Then fold avg' over the supplied list, then divide out the pair. (I forget what this technique is called; it has some specific name.) Better, but still not a particularly direct expression of the programmer's intent.
(I guess in prolog you'd do something like the above functional approach but useing pattern matching on the list instead of fold?)
The only kinds of languages I know of where you could just say the definition of average directly would be the APL family, including J and K. In fact, with those languages you can even say it in a point-free style:
avg = sum / count
Hmmmm. I guess that comes out of them being array languages somehow. Hmmm. I should go read about how they make that work again.
> (I am not even sure how I would express that mathematically without
> using recursion.)
>
>
> >>From a programming language standpoint, it raises the
> > question of handling the aggregate expressions. But
> > that's not any theoretical difficulty, and not necessarily
> > all that much design difficulty.
>
> I wonder what you would call arbitrary expressions defined in terms of
> aggregates if you did not call them aggregates.
Not sure I understand this comment. I wasn't talking about terminology.
Just to take a very simple example: sum.
Sum is the fold of +, with an identity of 0. How do you write that? What's the notation? If you code it up in a language with fold, then when you invoke fold, you pass it the collection of values, not each value in a expression that will be evaluated for each element in a set, the way you would in SQL or whatever relational language.
So you need some mechanism *besides* the usual function application: you need aggregate application.
As I said, this is not a very big deal, though.
Marshall Received on Mon Jun 05 2006 - 16:27:57 CEST