Re: Aggregate functions on empty sets in SQL

From: --CELKO-- <71062.1056_at_compuserve.com>
Date: 8 Nov 2001 09:25:14 -0800
Message-ID: <c0d87ec0.0111080925.1e74f407_at_posting.google.com>


>> SELECY MIN(1) FROM Tbl WHERE 1 = 0;
 

 MIN(1)


   

 returns one row with NULL. 2 questions  

  1. Is it SQL standard? <<

Yes. The rules for all aggregate function, AGG([DISTINCT | ALL] <expr>) are simple:

  1. compute the set of expressions
  2. remove all NULLs from the set
  3. remove redundant duplicates if DISTINCT is present
  4. leave redundant duplicates if ALL is present, or if neither ALL or DISTINCT is present
  5. Perform the aggregate function on the set and return a scalar result.
  6. If this set is empty, return a NULL as the scalar result. The NULL is a place holder for a missing, unknown or impossible value.

>> 2. Is it for any reason better than returning minimal number in the
numeric domain? (Same for dates, strings and other domains with total ordering). <<

That is not the rules.

>> Every aggregate function internally starts a loop with some value
(for example, SUM starts with 0), and that value is the intuitive result of quering the empty set. <<

NO!!! We never defined the aggregate functions that way!! We actually call them "set functions" in the SQL-92 Standards. There is no procedural code in the definitions of SQL until you get to cursors.

We had a long thread on this topic a few years ago in the SQL Server newsgroup. One guy could not understand that summation of a SERIES was not the same as the sum of a SET. He had not had a college course in set theory and was thinking in procedural terms -- i.e. the old "Big Sigma" notation he had in calculus. He had never seen a Sigma with a set expression for the index and could not get over the way he had been taught calculus.

if you need to have a proceudre model, then use this pseudo-code:

CREATE INTEGER PROCEDURE AggrSum (IN ARRAY A[1:max_i] INTEGER)  BEGIN
 DECLARE i INTEGER DEFAULT NULL,

         sum INTEGER DEFAULT NULL, 
         max_i INTEGER DEFAULT NULL;

 sum := A[1];
 REPEAT
  sum := a[i+1];
  i := i + 1;
 UNTIL i >= max_i;
 RETURN (sum);
 END; If the array A is my set and it is empty I guess that would mean (max_i < 1).

>> Besides, in order to find out the min/max numbers in the domain I
would have to make one less trip to the documnetation <<

You are not that lazy, are you <g>? Also this would destroy portablity across products on diffreent hardware. Received on Thu Nov 08 2001 - 18:25:14 CET

Original text of this message