Re: Aggregation (with GBY) is Relational Division

Date: 5 Jun 2006 10:08:57 -0700

Marshall wrote:
> > Aggr
> > ^^^^
> >
> > Bag# Pos# Sal sum min max
> > ---- ---- --- --- --- ---
> > 1 1 100 250 100 150
> > 1 2 150 250 100 150
> > 2 1 150 250 100 150
> > 2 2 100 250 100 150
> > 3 1 200 200 200 200
> > ....
> >
> > where only the portion of the Aggr relation which is relevant to the
> > Dept data is shown.
>
> I have a hard time with this relation. For one thing, I don't see
> how the third and fourth rows are "relevant to the Dept data."
> I think I see what you're trying to do. But I don't see how it's
> an interesting approach; the Aggr relation is too heavily bound
> to the current *value* of the Dept relation. (Unless I'm
> misunderstanding completely.)

It's a "universal" aggregation relation. To remove any reference to Dept we have to rename the third column into "Value". Here is a finite snippet of this relation (with values not referring to Salaries in the Dept table):

Bag# Pos# Val sum min max
---- ---- --- --- --- ---

0    0      0   0   0    0
1    0      1   1   1    1
2    0      2   2   2    2
3    0      3   3   3    3
....
4    0      0   0   0    0
4    1      0   0   0    0
5    0      0   1   0    1
5    1      1   1   0    1
6    0      1   1   0    1
6    1      0   1   0    1
7    0      2   2   0    2
7    1      0   2   0    2
8    0      1   2   0    1
8    1      1   2   0    1
....
9    0      0   0   0    0
9    1      0   0   0    0
9    2      0   0   0    0

....

Here bag enumeration has obvious defect, but finite sets of numbers can be enumerated, therefore this defect can be amended.

>
> Consider a simpler example, where we have only a single
> column to aggregate over, and we want to find the min.

The Pos# column is introduced solely to turn bags into sets. For min aggregate (or to sum distinct values) we could use a simpler aggregate relation

Set# Val sum min max
---- --- --- --- ---

0       0   0   0    0
1       1   1   1    1
2       2   2   2    2
3       3   3   3    3
....
5       0   1   0    1
5       1   1   0    1
7       0   2   0    2
7       2   2   0    2
....
9       0   3   0    2
9       1   3   0    2
9       2   3   0    2

....

Now we project the Dept relation to Emp, Sal columns and set join the result with the universal aggregate relation (with the Val column renamed to Sal).

> An alternative way to do aggregates is to allow natural
> join with functions that are closures; that is, that have
> are not pure functions in the mathematical sense; their
> output is not purely a function of their input, since they
>
> This approach brings up a whole host of annoying
> questions, such as parametric polymorphism and renaming,
> but it's still interesting, if not altogether practical.
>
> Consider sum.
>
> We have some relation R(a, b) and we want to
>
> SELECT a, sum(b) as result from R group by a
>
> We define an intermediate relation Aggr(a,result)
>
> We define a function f:
>
> f(a,b): () = {
> localA := a;
> if not exists (select * from Aggr where a = localA) {
> insert into Aggr values (0); -- the identity
> }
> update Aggr set result = result + b where a = localA;
> }
>
> Now we simply natural join R with f, and the result
> of the SELECT/GROUP BY above is in Aggr.

I feel uneasy about natural join producing the Aggr relation as a sideffect. Received on Mon Jun 05 2006 - 19:08:57 CEST

Original text of this message