Re: NULLs: theoretical problems?

From: Christopher Browne <cbbrowne_at_acm.org>
Date: Sat, 11 Aug 2007 23:54:21 -0400
Message-ID: <87643lmmqa.fsf_at_wolfe.cbbrowne.com>


The world rejoiced as Hugo Kornelis <hugo_at_perFact.REMOVETHIS.info.INVALID> wrote:
> On Wed, 8 Aug 2007 06:43:53 -0400, sinister wrote:
>
>>Many discussions point out one deficiency of NULLs: that they collapse
>>multiple, distinct concepts into one ("no value possible," "value missing,"
>>"value not available at this time", etc).
>
> Hi sinister,
>
> The deficiency in these discussions is that they fail to normalize the
> design before starting to look at NULLs.
>
> The ANSI standard defines NULL as follows:
>
> "null value: A special value that is used to indicate the
> absence of any data value."
>
> Thus, the only concept collapsed into NULL is that of "no value here".
>
> The concepts you mention are, indeed, various reasons why a value might
> be missing. The specific reason might or might not be of interest to the
> users of a database. If it's not, there's no need to let it bother us;
> if it is, we need to properly model it.
>
> <Sidestep>
>
> How would you react if someone showed you a database model where the
> contracts table had a single column to store both termination date and
> the reason why the contract was terminated - I'm pretty sure that you'd
> send the junior DBA who made that model back to class to study
> normalisation rules, right?
>
> How then would you react if you saw a model that tries to combine the
> termination date and the reason why the contract is NOT terminated in a
> single column?
>
> </Sidestep>
>
> And yet, that is exactly what some people try to do in the discussion of
> NULLs. They see that a value might be missing (NULL) for various
> reasons, which is of course correct. They assume that the reason why the
> value is missing should be somehow stored or encoded in the database,
> which is not necessarily correct but might be requireed for some
> applications. And then they go on to state that this reason should be
> stored or encoded in the same column where the missing value should have
> been - i.e. they want to use a single column to store both the
> termination date of a terminated contract and the reason why there is no
> termination date of a (probably not terminated, though that's not even
> sure) contract. And that is of course a violation of 1NF.

Nicely put, and your example here characterizes why the "NULL problem" is reasonable to be considered a problem.

I expect that this interpretation works well enough with various of the possible interpretations of NULL to demonstrate that changing the definition of NULL (e.g. - picking a formal system to change the semantics) does not answer the underlying problem.

> Unfortunately, many database theorists, including Codd and Date, either
> failed to see this basic flaw in the discussion or deliberately
> sidestepped it for convenience. This has led to many pointless
> discussions on using A-marks and I-marks, and how that would lead to
> fourvalued logic and still not suffice, and so on....

It sounds like we're up to 21-valued logic, based on the count of how many reasons have been discovered for data to be missing :-).

It is nice to have an example that can be easily applied to them all; it suggests that perhaps the perceived problem truly is a problem :-).

> I made this same point, but in different words, about a month ago on my
> blog:
> http://sqlblog.com/blogs/hugo_kornelis/archive/2007/07/06/null-ndash-the-database-rsquo-s-black-hole.aspx

-- 
wm(X,Y):-write(X),write('_at_'),write(Y). wm('cbbrowne','gmail.com').
http://linuxdatabases.info/info/slony.html
"Although  Unix is  more reliable,  NT may  become more  reliable with
time"  --   Ron  Redman,  deputy  technical  director   of  the  Fleet
Introduction Division of the Aegis Program Executive Office, US Navy.
Received on Sun Aug 12 2007 - 05:54:21 CEST

Original text of this message