Re: does a table always need a PK?
Date: Tue, 2 Sep 2003 09:03:38 -0400
Message-ID: <KI15b.368$GB1.39323687_at_mantis.golden.net>
"Lee Fesperman" <firstsql_at_ix.netcom.com> wrote in message
news:3F5452A0.1AB_at_ix.netcom.com...
> Bob Badour wrote:
> >
> > "Lee Fesperman" <firstsql_at_ix.netcom.com> wrote in message
> > news:3F510C49.2A6E_at_ix.netcom.com..
> > > Bob Badour wrote:
> > > > Does the mathematical identity "SUM(A)+SUM(B)=SUM(A+B)" demonstrate
> > > > a flaw or limitation of SQL's NULL?
> > >
> > > No, if you ignore the empty set issue.
> >
> > I am surprised by this answer, because I know we have discussed it
before.
> > Given the following table, T:
> >
> > A B
> > = =
> > 1 2
> > 3 NULL
> > NULL 4
> >
> > What are the results of the following queries?
> >
> > select SUM(A) from T;
> > select SUM(B) from T;
> > select SUM(A+B) from T;
> >
> > Am I missing something?
>
> I didn't intend to mislead.
I know. Even if you had tried to avoid one question, I would conclude you are only human and intellectually honest on balance. It's not like you tried to avoid most of the questions or made any specious claims for your product or tried to deny the possibility of there being other decent vendors out there.
> I decided against a longer response to keep all the answers
> succinct.
>
> I answered 'no' because I consider it an inherent of aspect of NULL
processing rather
> than a flaw or limitation. IOW, I am saying that the sub-expression above
is not an
> identity when either A or B can contain a NULL.
>
> Here is a longer response:
>
> If I do a query on T and add (A+B) as a calculated attribute:
>
> select A, B, (A+B) as C from T;
>
> I get:
>
> A B C
> = = =
> 1 2 3
> 3 NULL NULL
> NULL 4 NULL
>
> Applying the SUM() function to each result attribute:
>
> select SUM(A), SUM(B), SUM(C) from (select A, B, (A+B) as C from T);
>
> produces:
>
> SUM(A) SUM(B) SUM(C)
> ====== ====== ======
> 4 6 3
>
> (SUM(A) + SUM(B)) is not equal to SUM(C)
That's not what I would expect. I was under the impression SQL would simply omit any rows with a NULL in any SUM expression. In this case, I would expect the sums to result in 1+2=3 preserving the identity. It is only when evaluating the sums separately that I would expect the identity to fail.
> SUM(A) is the sum of the values -- A from tuple 1, A from tuple 2, A from
tuple 3.
>
> Using subscripts to indicate tuples, I can write it as:
>
> SUM(A) is sum of A[1], A[2], A[3]
>
> or:
>
> SUM(A) is sum of 1, 3, NULL, giving 4.
>
> and:
>
> SUM(B) is sum of B[1], B[2], B[3]
>
> or:
>
> SUM(B) is sum of 2, NULL, 4, giving 6.
>
> For SUM(A+B):
>
> SUM(A+B) is sum of (A+B)[1], (A+B)[2], (A+B)[3]
>
> SUM(A+B) is sum of 3, NULL, NULL, giving 3
>
> It is not:
>
> SUM(A+B) is sum of A[1], B[1], A[2], B[2], A[3], B[3]
>
> producing:
>
> SUM(A+B) is sum of 1, 2, 3, NULL, NULL, 4, giving 10
>
> ... because the SUM() function and the '+' operator, while similar, are
different
> operations. SUM() throws away any NULL results and only sums non-NULL
values. The +
> operator yields NULL if either operand is NULL.
>
> Aggregate functions, like SUM(), operate on values from tuples. A is a
value from a
> tuple. B is a value from a tuple. And, the expression (A+B) is a value
from a tuple. The
> expression (A+B) yields 3 in tuple 1 and NULL in tuples 2 and 3, thus
SUM(A+B) yields 3.
>
> Codd recommended that aggregate functions have a modifier that specifies
the disposition
> of NULLs in the aggregation. IIRC, it was intended as part of the
aggregate function.
>
> However, specifying it on the aggregate function like this pseudo-syntax:
>
> SUM(A+B) substitute 0 for NULL
>
> will not resolve the identity mismatch.
>
> SQL provides the coalesce() function for finer grained specification:
>
> SUM(coalesce(A,0)+coalesce(B,0))
>
> Using coalesce():
>
> SUM(coalesce(A,0)) + SUM(coalesce(B,0)) =
SUM(coalesce(A,0)+coalesce(B,0))
>
> evaluates to True.
Would you agree, then, that NULL and coalesce might place an additional cognitive burden on the user ? Received on Tue Sep 02 2003 - 15:03:38 CEST