Re: Aggregation (with GBY) is Relational Division
Date: 5 Jun 2006 10:08:57 -0700
Message-ID: <1149527337.197773.313200_at_c74g2000cwc.googlegroups.com>
Marshall wrote:
> vadimtro_at_gmail.com wrote:
> > Aggr
> > ^^^^
> >
> > Bag# Pos# Sal sum min max
> > ---- ---- --- --- --- ---
> > 1 1 100 250 100 150
> > 1 2 150 250 100 150
> > 2 1 150 250 100 150
> > 2 2 100 250 100 150
> > 3 1 200 200 200 200
> > ....
> >
> > where only the portion of the Aggr relation which is relevant to the
> > Dept data is shown.
>
> I have a hard time with this relation. For one thing, I don't see
> how the third and fourth rows are "relevant to the Dept data."
> I think I see what you're trying to do. But I don't see how it's
> an interesting approach; the Aggr relation is too heavily bound
> to the current *value* of the Dept relation. (Unless I'm
> misunderstanding completely.)
It's a "universal" aggregation relation. To remove any reference to Dept we have to rename the third column into "Value". Here is a finite snippet of this relation (with values not referring to Salaries in the Dept table):
Bag# Pos# Val sum min max
---- ---- --- --- --- ---
0 0 0 0 0 0 1 0 1 1 1 1 2 0 2 2 2 2 3 0 3 3 3 3 .... 4 0 0 0 0 0 4 1 0 0 0 0 5 0 0 1 0 1 5 1 1 1 0 1 6 0 1 1 0 1 6 1 0 1 0 1 7 0 2 2 0 2 7 1 0 2 0 2 8 0 1 2 0 1 8 1 1 2 0 1 .... 9 0 0 0 0 0 9 1 0 0 0 0 9 2 0 0 0 0
....
Here bag enumeration has obvious defect, but finite sets of numbers can be enumerated, therefore this defect can be amended.
>
> Consider a simpler example, where we have only a single
> column to aggregate over, and we want to find the min.
The Pos# column is introduced solely to turn bags into sets. For min aggregate (or to sum distinct values) we could use a simpler aggregate relation
Set# Val sum min max
---- --- --- --- ---
0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 .... 5 0 1 0 1 5 1 1 0 1 7 0 2 0 2 7 2 2 0 2 .... 9 0 3 0 2 9 1 3 0 2 9 2 3 0 2
....
Now we project the Dept relation to Emp, Sal columns and set join the result with the universal aggregate relation (with the Val column renamed to Sal).
> An alternative way to do aggregates is to allow natural
> join with functions that are closures; that is, that have
> access to variables defined outside their scope. Closures
> are not pure functions in the mathematical sense; their
> output is not purely a function of their input, since they
> have access to those additional variables I mentioned.
>
> This approach brings up a whole host of annoying
> questions, such as parametric polymorphism and renaming,
> but it's still interesting, if not altogether practical.
>
> Consider sum.
>
> We have some relation R(a, b) and we want to
>
> SELECT a, sum(b) as result from R group by a
>
> We define an intermediate relation Aggr(a,result)
>
> We define a function f:
>
> f(a,b): () = {
> localA := a;
> if not exists (select * from Aggr where a = localA) {
> insert into Aggr values (0); -- the identity
> }
> update Aggr set result = result + b where a = localA;
> }
>
> Now we simply natural join R with f, and the result
> of the SELECT/GROUP BY above is in Aggr.
I feel uneasy about natural join producing the Aggr relation as a sideffect. Received on Mon Jun 05 2006 - 19:08:57 CEST