Re: NULLs: theoretical problems?

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Sat, 01 Sep 2007 17:40:14 -0300
Message-ID: <46d9cde1$0$4035$9a566e8b_at_news.aliant.net>


Keith H Duggar wrote:

> Bob Badour wrote:
>

>>Keith H Duggar wrote:
>>
>>>Bob Badour wrote:
>>
>>>>Keith Duggar wrote:
>>
>>>>>I'm not trying to have anything. I'm trying to understand what
>>>>>the "write DEF" prescription buys us over say the "Interactive
>>>>>Mathematical Proof System" of Farmer, Guttman, and Thayer that
>>>>>has exactly the property I described that any formula is false
>>>>>if any variable is NULL. Again, I am failing to grasp what the
>>>>>"write DEF" prescription buys us. I would like to understand.
>>
>>>>>KHD
>>
>>>>Error detection. It forces one to acknowledge that one understands the
>>>>attribute can be undefined and that one is ignoring those propositions
>>>>for which it is undefined.
>>
>>>So one kind of error detection it allows is checking that a
>>>user understands an attribute can be undefined? How helpful
>>>would that be in preventing common query errors? Or does it
>>>allow error detection beyond NULL understanding checks?
>>
>>>Thanks for help.
>>
>>>KHD
>>
>>It would be very helpful. Take my favourite example:
>>
>>SUM(A) + SUM(B) = SUM(A+B)
>>
>>The above identity does not hold in SQL.
>>
>>(SELECT SUM(A) FROM T) + (SELECT SUM(B) FROM T)
>>does not equal (SELECT SUM(A+B) FROM T)
>>because the terms are really:
>>
>>SELECT SUM(A) FROM T WHERE DEF T.A : TRUE
>>
>>SELECT SUM(B) FROM T WHERE DEF T.B : TRUE
>>
>>SELECT SUM(A+B) FROM T WHERE DEF T.A AND DEF T.B : TRUE
>>
>>Being forced to acknowledge the potentially undefined highlights the
>>difference. While that is just one example of a common pitfall, I use it
>>because the accountants don't like it when the numbers don't balance.
>>
>>To balance, one really has to write the original identity as:
>>SUM(ISNULL(A,0)) + SUM(ISNULL(B,0)) = SUM(ISNULL(A,0) + ISNULL(B,0))
>>
>>(Presumably under Jan's model a similar ISUNDEF or COALESCE operator
>>will exist.)
>>
>>SQL doesn't give a warning or an error. It just returns incorrect
>>answers, and naive users are reassured that SQL handles missing
>>information because it has NULL.
>>
>>While it might be tedious to do it right the first time, it costs the
>>organization a whole lot more when the external auditor complains than
>>when the compiler complains.

>
>
> Ok, I think I'm seeing the point now. Please follow me through a
> hypothetical scenario to see if I get the point.
>
> Suppose in version 1 of a database schema A and B are defined as
> NOT-NULL. A muggle is tasked with writing a function that a some
> point computes the sum you described above:
>
> SELECT SUM(A+B) FROM T
>
> All is well, accounts balance. Then database schema version 1.01
> comes along and removes the NOT-NULL constraints on A and B. Now
> users might input NULL for one or both attributes.
>
> The SQL DBMS goes merrily about it's way until one day a NULL is
> entered. Bang! Accounts don't balance auditors from the ministry
> of magical database management apparate in. Not good.
>
> Had they used Jan's DBMS the compiler would have issued a static
> diagnostic along the lines of "ERROR : A and B may be undefined"
> forcing the maintainers to examine the code and choose one of:
>
> 1)
>
> SELECT SUM(ISNULL(A,0) + ISNULL(B,0)) FROM T
>
> 2)
>
> SELECT SUM(A+B) FROM T WHERE DEF T.A AND DEF T.B : TRUE
>
>
> where under 1) accounts balance and under 2) they do not balance
> but the maintainers chose this (or were drinking butter beers on
> the job) so the DBMS happily obliges.
>
> Is that the point?

Basically. However, depending on the requirements, option 2) could be correct provided the maintainers also update all the other reports that separately use SUM(A) and SUM(B) to use the same condition. e.g.:

SELECT SUM(A) FROM T WHERE DEF T.A AND DEF T.B: TRUE SELECT SUM(B) FROM T WHERE DEF T.A AND DEF T.B: TRUE
> Now for LUTINS (the logic of the "Interactive Mathematical Proof
> System") instead of being a static type error the runtime result
> of the SUM(A+B) function would be NULL. Of course similar syntax
> to 1) and 2) would allow you to code the desired runtime result.
> The key difference is static vs dynamic error detection correct?
>
> KHD
IOW, you are suggesting a consistent SUM--unlike SQL--that correctly sums to UNKNOWN when any term is missing. Yes, in that case the key difference is static versus dynamic checking. Compared to SQL, both of those options are automatic versus manual/no checking.

One could also argue that Jan's proposal limits the locus of effect. A null could propagate throughout a whole system before anyone notices it, in which case it could prove time-consuming to locate the origin.

I assume a proof system would also automatically detect or require constraints like "x != 1" in:

lim [(x^2 - 1) / (x - 1)], x != 1
x->1

or more simply

f(x) = [(x^2 - 1) / (x - 1)], x != 1 Received on Sat Sep 01 2007 - 22:40:14 CEST

Original text of this message