Re: does a table always need a PK?

From: Lee Fesperman <firstsql_at_ix.netcom.com>
Date: Tue, 02 Sep 2003 08:22:18 GMT
Message-ID: <3F5452A0.1AB_at_ix.netcom.com>


Bob Badour wrote:
>
> "Lee Fesperman" <firstsql_at_ix.netcom.com> wrote in message
> news:3F510C49.2A6E_at_ix.netcom.com..
> > Bob Badour wrote:
> > > Does the mathematical identity "SUM(A)+SUM(B)=SUM(A+B)" demonstrate
> > > a flaw or limitation of SQL's NULL?
> >
> > No, if you ignore the empty set issue.
>
> I am surprised by this answer, because I know we have discussed it before.
> Given the following table, T:
>
> A B
> = =
> 1 2
> 3 NULL
> NULL 4
>
> What are the results of the following queries?
>
> select SUM(A) from T;
> select SUM(B) from T;
> select SUM(A+B) from T;
>
> Am I missing something?

I didn't intend to mislead. I decided against a longer response to keep all the answers succinct.

I answered 'no' because I consider it an inherent of aspect of NULL processing rather than a flaw or limitation. IOW, I am saying that the sub-expression above is not an identity when either A or B can contain a NULL.

Here is a longer response:

If I do a query on T and add (A+B) as a calculated attribute:

  select A, B, (A+B) as C from T;

I get:

 A        B        C
 =        =        =
 1        2        3
 3        NULL     NULL
 NULL     4        NULL

Applying the SUM() function to each result attribute:

  select SUM(A), SUM(B), SUM(C) from (select A, B, (A+B) as C from T);

produces:

 SUM(A) SUM(B) SUM(C)

  • ====== ====== 4 6 3

(SUM(A) + SUM(B)) is not equal to SUM(C)

SUM(A) is the sum of the values -- A from tuple 1, A from tuple 2, A from tuple 3.

Using subscripts to indicate tuples, I can write it as:

 SUM(A) is sum of A[1], A[2], A[3]

or:

 SUM(A) is sum of 1, 3, NULL, giving 4.

and:

 SUM(B) is sum of B[1], B[2], B[3]

or:

 SUM(B) is sum of 2, NULL, 4, giving 6.

For SUM(A+B):

 SUM(A+B) is sum of (A+B)[1], (A+B)[2], (A+B)[3]

 SUM(A+B) is sum of 3, NULL, NULL, giving 3

It is not:

 SUM(A+B) is sum of A[1], B[1], A[2], B[2], A[3], B[3]

producing:

 SUM(A+B) is sum of 1, 2, 3, NULL, NULL, 4, giving 10

... because the SUM() function and the '+' operator, while similar, are different operations. SUM() throws away any NULL results and only sums non-NULL values. The + operator yields NULL if either operand is NULL.

Aggregate functions, like SUM(), operate on values from tuples. A is a value from a tuple. B is a value from a tuple. And, the expression (A+B) is a value from a tuple. The expression (A+B) yields 3 in tuple 1 and NULL in tuples 2 and 3, thus SUM(A+B) yields 3.

Codd recommended that aggregate functions have a modifier that specifies the disposition of NULLs in the aggregation. IIRC, it was intended as part of the aggregate function.

However, specifying it on the aggregate function like this pseudo-syntax:

 SUM(A+B) substitute 0 for NULL

will not resolve the identity mismatch.

SQL provides the coalesce() function for finer grained specification:

 SUM(coalesce(A,0)+coalesce(B,0))

Using coalesce():

 SUM(coalesce(A,0)) + SUM(coalesce(B,0)) = SUM(coalesce(A,0)+coalesce(B,0))

evaluates to True.

-- 
Lee Fesperman, FirstSQL, Inc. (http://www.firstsql.com)
==============================================================
* The Ultimate DBMS is here!
* FirstSQL/J Object/Relational DBMS  (http://www.firstsql.com)
Received on Tue Sep 02 2003 - 10:22:18 CEST

Original text of this message