Re: Aggregation (with GBY) is Relational Division

From: Marshall <marshall.spight_at_gmail.com>
Date: 5 Jun 2006 11:31:12 -0700
Message-ID: <1149532272.555992.105140_at_i39g2000cwa.googlegroups.com>


David Cressey wrote:
> "Marshall" <marshall.spight_at_gmail.com> wrote in message
>
> > Actually, I had kind of a jolt reading something Bob Badour
> > wrote recently about defining aggregates as either a fold
> > or an expression written in terms of folds. (He did not use
> > the term "fold".) Thus, avg(x) can be defined as simply
> > sum(x) / count().
>
> I disagree. avg(x) can be defined as simply sum(x) / count(x).
>
> This may seem like nitpicking. I think it's reasonably important.
>
> If you do select avg(x), sum(x)/count(*), sum(x)/count(x) from some_view
>
> you are going to get a difference when there are rows in the view that
> contain NULL in the x column. Try it and see.

.... yeah, I see what you mean. If I define it as sum(x) / count(), then I'm not counting and summing the same things.

Compared with sum(x) / count(x), in which the sum and the count *are* of the same thing. So this *isn't* just an issue with SQL's mixing of empty-set semantics and unknown semantics.

> This relates to a comment about managing missing data in SQL that I didn't
> take very much to heart.
>
> PS: can you point me to a web site that will explain "fold" to me?

I did a bit of searching and came up with a short intro:

http://www.cse.unsw.edu.au/~en1000/haskell/hof.html

The functional languages don't have the RA, but they do have map, filter, and fold. Not as good, but closer than anything else.

Fold is simply inserting a binary operator "in between" the elements in a collection. Usually an ordered, collection, alas. Since it's an ordered collection, they have to do the extra work to take order into account, so you have both foldl and foldr, depending on whether you start at the "left" end of the list or the "right". (Beginning
vs. end.)

Interestingly, the obvious conclusion about the type of the function being folded is that it should be

'a, 'a -> 'a

a simple binary function on a single domain.

But in fact, the most general version of the function is

'a, 'b -> 'b

For example, we can reverse a list using fold.

Given a function append(element, list) that returns a list like the one it is passed, but with element tacked on the end, we can fold append over a list and reverse it.

foldl(append, [], [1, 2, 3])

returns [3, 2, 1]

Fold depends on being able to separate the first element of a list from the rest, and on the ability to recognize an empty list, but I don't think it needs anything else.

Marshall Received on Mon Jun 05 2006 - 20:31:12 CEST

Original text of this message