Re: missing information and aggregates

From: --CELKO-- <joe.celko_at_northface.edu>
Date: 7 Sep 2003 12:45:32 -0700
Message-ID: <a264e7ea.0309071145.43d64bf8_at_posting.google.com>


>> It is obvious that aggregates should return a non-null value at
least in cases like these:

 SELECT SUM (sal) FROM Emp WHERE 1 = 0;  

 the correct result is 0, not the NULL (SQL standard?) <<

Why isw it obvious? Consider the set {1,-1} (or any set that sums to zero from actual values). Why would you want a set operation SUM() that cannot tell the difference between the empty set and these non-empty sets?  

>> I find it strange, arbitrary and inconsistent that COUNT(*) returns
0 for zero rows but SUM(1) returns NULL <<  

COUNT(*) is a bad notation; it is the cardinality operator on the set as a whole. All the other aggregate functions are done with values from rows (elements) inside the table (set). The syntax should have been something like

 CARD(SELECT * FROM .. WHERE...)
>> ... claimed the answer to these aggregates over zero rows is
undefined, and he quoted a passage from _Concrete Mathematics_ by Graham, Knuth and Patashnik a page or two before the passage that explicitly states the answer is defined as the identity element. <<

The Knuth notation (which he got from Iverson) replaces the use of a sequence for an index of summation on a series with a set characteristic function. In the Knuth notation, all the terms come into existence and are added "all at once"; the old notation due to Fourier was a FOR loop, which generated each sum in a series.

In short, Knuth's notation follows the SQL model. Knuth stated that making the empty set equal to 1 udner all operations was a convention that made some of the summations easier to manipulate. Instead of explictly excluding undefined terms, convert them to the identity element instead. You can have some real problems with sets that are hard to define, like all Primes or the (3n+1) sets, etc.

>> SELECT MAX(sal) FROM Emp WHERE 1 = 0;
the correct result is minus infinity, not the NULL. I would accept the minimum value of the data type, which is a close enough approximation to minus infinity. The other option is an underflow exception. <<

Never return an actual value for a missing/error/unknown marker when it could be meaningful -- someone will do math with it and not know it is a value marker. Centura (nee Gupta) returned minus infinity in some of its operations, but they are the only one I know that did. While it might complicate things a bit, the IEEE floating point specs have some NaN (Not a Number) configurations that would be standard. Received on Sun Sep 07 2003 - 21:45:32 CEST

Original text of this message