Re: missing information and aggregates

From: Mikito Harakiri <mikharakiri_at_ywho.com>
Date: Mon, 8 Sep 2003 10:59:38 -0700
Message-ID: <EA37b.21$741.207_at_news.oracle.com>

"--CELKO--" <joe.celko_at_northface.edu> wrote in message news:a264e7ea.0309071145.43d64bf8_at_posting.google.com...
> >> It is obvious that aggregates should return a non-null value at
> least in cases like these:
>
> SELECT SUM (sal) FROM Emp WHERE 1 = 0;
>
> the correct result is 0, not the NULL (SQL standard?) <<
>
> Why isw it obvious? Consider the set {1,-1} (or any set that sums to
> zero from actual values). Why would you want a set operation SUM()
> that cannot tell the difference between the empty set and these
> non-empty sets?

Empty sets aren't special. We don't want to distinguish them, we just want to extend definition of SUM to empty sets which is mathematically consistent.

The defintion where the query

select sum_empty+sum_nonempty from (

select sum(sal) sum_empty from emp where 1=0 ), (

select sum(sal) sum_nonempty from emp where 1!=0 )

returns non null answer meets my math consistency criteria. The "standard" definition that returns null does not. (Note that the query above returns the same answer no matter what predicate is, since both sets are complementary. It's just silly if it breaks in case when one set is empty).

> >> I find it strange, arbitrary and inconsistent that COUNT(*) returns
> 0 for zero rows but SUM(1) returns NULL <<
>
> COUNT(*) is a bad notation; it is the cardinality operator on the set
> as a whole. All the other aggregate functions are done with values
> from rows (elements) inside the table (set). The syntax should have
> been something like
>
> CARD(SELECT * FROM .. WHERE...)
The standard notation is certainly ugly. It makes little sence fixing it, as count is redundant anyway.

> In short, Knuth's notation follows the SQL model. Knuth stated that
> making the empty set equal to 1 udner all operations was a convention
> that made some of the summations easier to manipulate. Instead of
> explictly excluding undefined terms, convert them to the identity
> element instead. You can have some real problems with sets that are
> hard to define, like all Primes or the (3n+1) sets, etc.

I would like to see some examples in SQL notation if you want to explore this topic furhter.

> >> SELECT MAX(sal) FROM Emp WHERE 1 = 0;
> the correct result is minus infinity, not the NULL. I would accept the
> minimum value of the data type, which is a close enough approximation
> to minus infinity. The other option is an underflow exception. <<
>
> Never return an actual value for a missing/error/unknown marker when
> it could be meaningful -- someone will do math with it and not know it
> is a value marker. Centura (nee Gupta) returned minus infinity in
> some of its operations, but they are the only one I know that did.
> While it might complicate things a bit, the IEEE floating point specs
> have some NaN (Not a Number) configurations that would be standard.

What max(NaN, 5) is? I certainly know that max("minus infinity", 5) = 5. Note that you must have 2 infinity symbols to be able to handle cases like this, not just a single one. Received on Mon Sep 08 2003 - 19:59:38 CEST

This message: [ Message body ]
Next message: Mikito Harakiri: "Re: Possible problems with Date & McGoveran View Updating"
Previous message: Anith Sen: "Re: how to set composite key as foreign key?"
Maybe in reply to: Bob Badour: "missing information and aggregates"
In reply to Mikito Harakiri: "Re: missing information and aggregates"
Next in thread: Mikito Harakiri: "Re: Possible problems with Date & McGoveran View Updating"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message