Re: NULLs: theoretical problems?

From: Keith H Duggar <duggar_at_alum.mit.edu>
Date: Sat, 01 Sep 2007 11:02:43 -0700
Message-ID: <1188669763.669651.189250_at_r34g2000hsd.googlegroups.com>


Bob Badour wrote:
> Keith H Duggar wrote:
> > Bob Badour wrote:
>
> >>Keith Duggar wrote:
>
> >>>I'm not trying to have anything. I'm trying to understand what
> >>>the "write DEF" prescription buys us over say the "Interactive
> >>>Mathematical Proof System" of Farmer, Guttman, and Thayer that
> >>>has exactly the property I described that any formula is false
> >>>if any variable is NULL. Again, I am failing to grasp what the
> >>>"write DEF" prescription buys us. I would like to understand.
>
> >>>KHD
>
> >>Error detection. It forces one to acknowledge that one understands the
> >>attribute can be undefined and that one is ignoring those propositions
> >>for which it is undefined.
>
> > So one kind of error detection it allows is checking that a
> > user understands an attribute can be undefined? How helpful
> > would that be in preventing common query errors? Or does it
> > allow error detection beyond NULL understanding checks?
>
> > Thanks for help.
>
> > KHD
>
> It would be very helpful. Take my favourite example:
>
> SUM(A) + SUM(B) = SUM(A+B)
>
> The above identity does not hold in SQL.
>
> (SELECT SUM(A) FROM T) + (SELECT SUM(B) FROM T)
> does not equal (SELECT SUM(A+B) FROM T)
> because the terms are really:
>
> SELECT SUM(A) FROM T WHERE DEF T.A : TRUE
>
> SELECT SUM(B) FROM T WHERE DEF T.B : TRUE
>
> SELECT SUM(A+B) FROM T WHERE DEF T.A AND DEF T.B : TRUE
>
> Being forced to acknowledge the potentially undefined highlights the
> difference. While that is just one example of a common pitfall, I use it
> because the accountants don't like it when the numbers don't balance.
>
> To balance, one really has to write the original identity as:
> SUM(ISNULL(A,0)) + SUM(ISNULL(B,0)) = SUM(ISNULL(A,0) + ISNULL(B,0))
>
> (Presumably under Jan's model a similar ISUNDEF or COALESCE operator
> will exist.)
>
> SQL doesn't give a warning or an error. It just returns incorrect
> answers, and naive users are reassured that SQL handles missing
> information because it has NULL.
>
> While it might be tedious to do it right the first time, it costs the
> organization a whole lot more when the external auditor complains than
> when the compiler complains.

Ok, I think I'm seeing the point now. Please follow me through a hypothetical scenario to see if I get the point.

Suppose in version 1 of a database schema A and B are defined as NOT-NULL. A muggle is tasked with writing a function that a some point computes the sum you described above:

   SELECT SUM(A+B) FROM T All is well, accounts balance. Then database schema version 1.01 comes along and removes the NOT-NULL constraints on A and B. Now users might input NULL for one or both attributes.

The SQL DBMS goes merrily about it's way until one day a NULL is entered. Bang! Accounts don't balance auditors from the ministry of magical database management apparate in. Not good.

Had they used Jan's DBMS the compiler would have issued a static diagnostic along the lines of "ERROR : A and B may be undefined" forcing the maintainers to examine the code and choose one of:

   1)

   SELECT SUM(ISNULL(A,0) + ISNULL(B,0)) FROM T    2)

   SELECT SUM(A+B) FROM T WHERE DEF T.A AND DEF T.B : TRUE where under 1) accounts balance and under 2) they do not balance but the maintainers chose this (or were drinking butter beers on the job) so the DBMS happily obliges.

Is that the point?

Now for LUTINS (the logic of the "Interactive Mathematical Proof System") instead of being a static type error the runtime result of the SUM(A+B) function would be NULL. Of course similar syntax to 1) and 2) would allow you to code the desired runtime result. The key difference is static vs dynamic error detection correct?

KHD Received on Sat Sep 01 2007 - 20:02:43 CEST

Original text of this message