Aggregation (with GBY) is Relational Division
Date: 2 Jun 2006 19:34:33 -0700
Message-ID: <1149302073.665332.55470_at_i40g2000cwc.googlegroups.com>
Well, not quite, but surprisingly close.
Consider a relation
Dept
^^^^
Name Pos# Sal
---- ---- ---
Acct 1 100
Acct 2 150
Res 1 200
In order to express
select Name, sum(Sal), min(Sal) from Dept group by Name
in relational terms we need an infinite relation
Aggr
^^^^
Bag# Pos# Sal sum min max
---- ---- --- --- --- ---
1 1 100 250 100 150 1 2 150 250 100 150 2 1 150 250 100 150 2 2 100 250 100 150 3 1 200 200 200 200
....
where only the portion of the Aggr relation which is relevant to the Dept data is shown.
Then the aggregation query that we are after is the set join
Dept /\(=) Aggr
The set join /\(=) returns all the tuples (Name, Bug#, sum, min, max) such that the set of (Pos#,Sal) pairs for each Name equals to the set of (Pos#,Sal) pairs matching with a certain Bag#. In our example the table can be rewritten in non 1NF as
Non1NFDept
^^^^^^^^^^
Name {(Pos#,Sal)}
---- ------------------
Acct {(1,100),(2,150)}
Res {(1,200)}
Non1NFAggr
^^^^^^^^^^
Bag# {(Pos#,Sal)} sum min max ---- ------------ --- --- --- 1 {(1,100),(2,150)} 250 100 150 2 {(1,150),(2,200)} 250 100 150 3 {(1,200)} 200 200 200
Then, the set join Dept/\(=)Aggr is just a natural join Non1NFDept/\Non1NFAggr. This particular form of set join is called equality set join, because join condition is matching sets by equality. Since join column is meaningless when we translate the result back into 1NF, the header of set join is always a symmetric difference of the original headers.
In our example:
Dept /\(=) Aggr
^^^^^^^^^^^^^^^
Name sum min max
---- --- --- ---
Acct 250 100 150
Res 200 200 200
Likewise, classic natural join is a set join where join condition requres the set intersection to be nonempty.
Finally, a (generalized) relational division Dept/\(<)Aggr is a set join where divisor sets are required to be subsets of the dividend. It is also called a containment set join. If the divisor header is a subset if dividend one then containment set join is classic relational division.
The equality set join Dept/\(=)Aggr could be expressed via relational division: it is an intersection of Dept/\(<)Aggr and Aggr/\(<)Dept.
Relational division can be expressed via standard RA operations, which concludes this study. The only remaining issue is that the middle column in the relation
Dept
^^^^
Name Pos# Sal
---- ---- ---
Acct 1 100
Acct 2 150
Res 1 200
doesn't look right. It could be more natural to have employee name there instead of meaningless number. But then we would have to generate an Aggr relation with the employee name matching column. This would demote the Aggr relation from being rather generic to being aware of employee names. Received on Sat Jun 03 2006 - 04:34:33 CEST