Re: does a table always need a PK?

From: Bob Badour <bbadour_at_golden.net>
Date: Wed, 3 Sep 2003 09:28:10 -0400
Message-ID: <2an5b.419$yF3.43910266_at_mantis.golden.net>


"Christopher Browne" <cbbrowne_at_acm.org> wrote in message news:bj3q4i$eaohu$1_at_ID-125932.news.uni-berlin.de...
> Centuries ago, Nostradamus foresaw when "Bob Badour" <bbadour_at_golden.net>
would write:
> > "Lee Fesperman" <firstsql_at_ix.netcom.com> wrote in message
> > news:3F553AD7.7C3_at_ix.netcom.com...
> >> Bob Badour wrote:
> >> >
> >> > "Lee Fesperman" <firstsql_at_ix.netcom.com> wrote in message
> >> > news:3F5452A0.1AB_at_ix.netcom.com..
> >> > > If I do a query on T and add (A+B) as a calculated attribute:
> >> > >
> >> > > select A, B, (A+B) as C from T;
> >> > >
> >> > > I get:
> >> > >
> >> > > A B C
> >> > > = = =
> >> > > 1 2 3
> >> > > 3 NULL NULL
> >> > > NULL 4 NULL
> >> > >
> >> > > Applying the SUM() function to each result attribute:
> >> > >
> >> > > select SUM(A), SUM(B), SUM(C) from (select A, B, (A+B) as C from
T);
> >> > >
> >> > > produces:
> >> > >
> >> > > SUM(A) SUM(B) SUM(C)
> >> > > ====== ====== ======
> >> > > 4 6 3
> >> > >
> >> > > (SUM(A) + SUM(B)) is not equal to SUM(C)
> >> >
> >> > That's not what I would expect. I was under the impression SQL would
> > simply
> >> > omit any rows with a NULL in any SUM expression. In this case, I
would
> >> > expect the sums to result in 1+2=3 preserving the identity. It is
only
> > when
> >> > evaluating the sums separately that I would expect the identity to
fail.
> >>
> >> I'm sorry. I don't follow you here.
> >
> > Unless I am mistaken, the following queries return the following
results:
> >
> > Select SUM(A) from T;
> > SUM(A)
> > ======
> > 4
> >
> >
> > Select SUM(B) from T;
> > SUM(B)
> > ======
> > 6
> >
> > Select SUM(A), SUM(B) from T;
> >
> > SUM(A) SUM(B)
> > ====== ======
> > 1 2
> >
> > The last result comes about by SQL discarding every row with a NULL
> > that would affect the result. The first query includes 3 because the
> > query does not reference B. The second query includes 4 because the
> > query does not reference A. The third query includes neither because
> > the query references both A and B. Does FirstSQL behave differently?
> > Am I mistaken about SQL?

>

> You are omitting the other rational possibility, which is that NULL is
> a "contagion" that NULLs out anything it touches.
>

> It would be quite reasonable for the result of ALL of these queries to
> be the same:
>

> select sum(a) from t;
> NULL
>

> select sum(b) from t;
> NULL
>

> select sum(a+b) from t;
> NULL
>

> To avoid that, there has got to be an implicit "WHERE CLAUSE;" in
> effect:
>

> select sum(a) from t;
> is actually
> select sum(a) from t where a is not null;

Yes, I agree that would be better. Currently, the dbms is forced to visit every row to deliver a number that is very often not the number the user thinks it is, the user has no signal to suspect the number. If the dbms could stop processing at the first NULL, it would deliver the NULL result faster and the unexpected result would alert the user to the learning opportunity.

> What I would realistically expect to see from SQL is something
> resembling the latter two queries. But note that ALL of these options
> are rational.

>

> Including the one where, since there are NULL values in domains a and
> b, the result of the query is NULL, since NULL is a fatal contagion
> :-).
>
> This diversity of rational answers to the queries is quite disturbing.

Yes, the redundancy of SQL is problematic. It would less of a problem if optimizers were better.

> >> > > Using coalesce():
> >> > >
> >> > > SUM(coalesce(A,0)) + SUM(coalesce(B,0)) =
> >> > SUM(coalesce(A,0)+coalesce(B,0))
> >> > >
> >> > > evaluates to True.
> >> >
> >> > Would you agree, then, that NULL and coalesce might place an
> >> > additional cognitive burden on the user ?
> >>
> >> Certainly. Though, I would argue that the additional complexity was
> >> inherent when dealing with missing information. It then becomes a
> >> discussion of which technique for missing information places the
> >> least cognitive burden on the user.
> >
> > Fair enough. Or it could become a discussion of which technique for
> > missing information most reliably alerts the user to the cognitive
> > burden.

>

> Of course, for every query value touched with the NULL contagion to
> become NULL would even more reliably alert the user to the problem.

I agree. Received on Wed Sep 03 2003 - 15:28:10 CEST

Original text of this message