# Re: Aggregation (with GBY) is Relational Division

Date: 5 Jun 2006 10:08:57 -0700

Message-ID: <1149527337.197773.313200_at_c74g2000cwc.googlegroups.com>

Marshall wrote:

*> vadimtro_at_gmail.com wrote:
**> > Aggr
**> > ^^^^
**> >
**> > Bag# Pos# Sal sum min max
**> > ---- ---- --- --- --- ---
**> > 1 1 100 250 100 150
**> > 1 2 150 250 100 150
**> > 2 1 150 250 100 150
**> > 2 2 100 250 100 150
**> > 3 1 200 200 200 200
**> > ....
**> >
**> > where only the portion of the Aggr relation which is relevant to the
**> > Dept data is shown.
**>
*

> I have a hard time with this relation. For one thing, I don't see

*> how the third and fourth rows are "relevant to the Dept data."
**> I think I see what you're trying to do. But I don't see how it's
**> an interesting approach; the Aggr relation is too heavily bound
**> to the current *value* of the Dept relation. (Unless I'm
**> misunderstanding completely.)
*

It's a "universal" aggregation relation. To remove any reference to Dept we have to rename the third column into "Value". Here is a finite snippet of this relation (with values not referring to Salaries in the Dept table):

Bag# Pos# Val sum min max

---- ---- --- --- --- ---

0 0 0 0 0 0 1 0 1 1 1 1 2 0 2 2 2 2 3 0 3 3 3 3 .... 4 0 0 0 0 0 4 1 0 0 0 0 5 0 0 1 0 1 5 1 1 1 0 1 6 0 1 1 0 1 6 1 0 1 0 1 7 0 2 2 0 2 7 1 0 2 0 2 8 0 1 2 0 1 8 1 1 2 0 1 .... 9 0 0 0 0 0 9 1 0 0 0 0 9 2 0 0 0 0

....

Here bag enumeration has obvious defect, but finite sets of numbers can be enumerated, therefore this defect can be amended.

*>
*

> Consider a simpler example, where we have only a single

*> column to aggregate over, and we want to find the min.
*

The Pos# column is introduced solely to turn bags into sets. For min aggregate (or to sum distinct values) we could use a simpler aggregate relation

Set# Val sum min max

---- --- --- --- ---

0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 .... 5 0 1 0 1 5 1 1 0 1 7 0 2 0 2 7 2 2 0 2 .... 9 0 3 0 2 9 1 3 0 2 9 2 3 0 2

....

Now we project the Dept relation to Emp, Sal columns and set join the result with the universal aggregate relation (with the Val column renamed to Sal).

> An alternative way to do aggregates is to allow natural

*> join with functions that are closures; that is, that have
**> access to variables defined outside their scope. Closures
**> are not pure functions in the mathematical sense; their
**> output is not purely a function of their input, since they
**> have access to those additional variables I mentioned.
**>
**> This approach brings up a whole host of annoying
**> questions, such as parametric polymorphism and renaming,
**> but it's still interesting, if not altogether practical.
**>
**> Consider sum.
**>
**> We have some relation R(a, b) and we want to
**>
**> SELECT a, sum(b) as result from R group by a
**>
**> We define an intermediate relation Aggr(a,result)
**>
**> We define a function f:
**>
**> f(a,b): () = {
**> localA := a;
**> if not exists (select * from Aggr where a = localA) {
**> insert into Aggr values (0); -- the identity
**> }
**> update Aggr set result = result + b where a = localA;
**> }
**>
**> Now we simply natural join R with f, and the result
**> of the SELECT/GROUP BY above is in Aggr.
*

I feel uneasy about natural join producing the Aggr relation as a sideffect. Received on Mon Jun 05 2006 - 19:08:57 CEST