Aggregation (with GBY) is Relational Division

From: <vadimtro_at_gmail.com>
Date: 2 Jun 2006 19:34:33 -0700
Message-ID: <1149302073.665332.55470_at_i40g2000cwc.googlegroups.com>



Well, not quite, but surprisingly close.

Consider a relation

Dept
^^^^

Name Pos# Sal
---- ---- ---
Acct 1 100
Acct 2 150
Res 1 200

In order to express

select Name, sum(Sal), min(Sal) from Dept group by Name

in relational terms we need an infinite relation

Aggr
^^^^

Bag# Pos# Sal sum min max
---- ---- --- --- --- ---

1       1 100 250 100 150
1       2 150 250 100 150
2       1 150 250 100 150
2       2 100 250 100 150
3       1 200 200 200 200

....

where only the portion of the Aggr relation which is relevant to the Dept data is shown.

Then the aggregation query that we are after is the set join

Dept /\(=) Aggr

The set join /\(=) returns all the tuples (Name, Bug#, sum, min, max) such that the set of (Pos#,Sal) pairs for each Name equals to the set of (Pos#,Sal) pairs matching with a certain Bag#. In our example the table can be rewritten in non 1NF as

Non1NFDept

^^^^^^^^^^

Name {(Pos#,Sal)}
----  ------------------

Acct {(1,100),(2,150)}
Res {(1,200)}
Non1NFAggr

^^^^^^^^^^
Bag# {(Pos#,Sal)} sum min max ---- ------------ --- --- --- 1 {(1,100),(2,150)} 250 100 150 2 {(1,150),(2,200)} 250 100 150 3 {(1,200)} 200 200 200

Then, the set join Dept/\(=)Aggr is just a natural join Non1NFDept/\Non1NFAggr. This particular form of set join is called equality set join, because join condition is matching sets by equality. Since join column is meaningless when we translate the result back into 1NF, the header of set join is always a symmetric difference of the original headers.

In our example:

Dept /\(=) Aggr

^^^^^^^^^^^^^^^

Name sum min max
---- --- --- ---
Acct 250 100 150
Res 200 200 200

Likewise, classic natural join is a set join where join condition requres the set intersection to be nonempty.

Finally, a (generalized) relational division Dept/\(<)Aggr is a set join where divisor sets are required to be subsets of the dividend. It is also called a containment set join. If the divisor header is a subset if dividend one then containment set join is classic relational division.

The equality set join Dept/\(=)Aggr could be expressed via relational division: it is an intersection of Dept/\(<)Aggr and Aggr/\(<)Dept.

Relational division can be expressed via standard RA operations, which concludes this study. The only remaining issue is that the middle column in the relation

Dept
^^^^

Name Pos# Sal
---- ---- ---
Acct 1 100
Acct 2 150
Res 1 200

doesn't look right. It could be more natural to have employee name there instead of meaningless number. But then we would have to generate an Aggr relation with the employee name matching column. This would demote the Aggr relation from being rather generic to being aware of employee names. Received on Sat Jun 03 2006 - 04:34:33 CEST

Original text of this message