Re: NULLs: theoretical problems?

From: Brian Selzer <brian_at_selzer-software.com>
Date: Sun, 12 Aug 2007 19:09:51 GMT
Message-ID: <3WIvi.57247$5j1.11101_at_newssvr21.news.prodigy.net>


"Hugo Kornelis" <hugo_at_perFact.REMOVETHIS.info.INVALID> wrote in message news:v25kb3ts633f7db30edlctt8q4eo3cc9du_at_4ax.com...
> On Wed, 8 Aug 2007 06:43:53 -0400, sinister wrote:
>
>>Many discussions point out one deficiency of NULLs: that they collapse
>>multiple, distinct concepts into one ("no value possible," "value
>>missing,"
>>"value not available at this time", etc).
>
> Hi sinister,
>
> The deficiency in these discussions is that they fail to normalize the
> design before starting to look at NULLs.
>
> The ANSI standard defines NULL as follows:
>
> "null value: A special value that is used to indicate the
> absence of any data value."
>
> Thus, the only concept collapsed into NULL is that of "no value here".
>
> The concepts you mention are, indeed, various reasons why a value might
> be missing. The specific reason might or might not be of interest to the
> users of a database. If it's not, there's no need to let it bother us;
> if it is, we need to properly model it.
>
> <Sidestep>
>
> How would you react if someone showed you a database model where the
> contracts table had a single column to store both termination date and
> the reason why the contract was terminated - I'm pretty sure that you'd
> send the junior DBA who made that model back to class to study
> normalisation rules, right?
>
> How then would you react if you saw a model that tries to combine the
> termination date and the reason why the contract is NOT terminated in a
> single column?
>
> </Sidestep>
>
> And yet, that is exactly what some people try to do in the discussion of
> NULLs. They see that a value might be missing (NULL) for various
> reasons, which is of course correct. They assume that the reason why the
> value is missing should be somehow stored or encoded in the database,
> which is not necessarily correct but might be requireed for some
> applications. And then they go on to state that this reason should be
> stored or encoded in the same column where the missing value should have
> been - i.e. they want to use a single column to store both the
> termination date of a terminated contract and the reason why there is no
> termination date of a (probably not terminated, though that's not even
> sure) contract. And that is of course a violation of 1NF.
>
> Unfortunately, many database theorists, including Codd and Date, either
> failed to see this basic flaw in the discussion or deliberately
> sidestepped it for convenience. This has led to many pointless
> discussions on using A-marks and I-marks, and how that would lead to
> fourvalued logic and still not suffice, and so on....
>

I don't think that a discussion about whether there can be a value is pointless. If it is possible that there cannot be a value for an attribute, then there is a serious structural flaw with the relation that includes that attribute. It would mean that even if every missing value were supplied, there could still be nulls! Therefore, I agree with you that a database schema should start out fully normalized under the assumption that there will be no missing values before considering whether or not to permit them. This fixes the semantics of null to be just that an applicable value has not been assigned.

If null can only mean that an applicable value has not been assigned, then 4VL becomes unnecessary, and 3VL becomes just a convenient shorthand because for finite domains, null simply represents a disjunction encompassing all of the values in the domain.

> I made this same point, but in different words, about a month ago on my
> blog:
> http://sqlblog.com/blogs/hugo_kornelis/archive/2007/07/06/null-ndash-the-database-rsquo-s-black-hole.aspx
>
> Best, Hugo
Received on Sun Aug 12 2007 - 21:09:51 CEST

Original text of this message