Re: NULLs: theoretical problems?

From: paul c <>
Date: Thu, 09 Aug 2007 17:43:55 GMT
Message-ID: <vnIui.42734$_d2.2251_at_pd7urf3no>

Hugo Kornelis wrote:
> On Wed, 08 Aug 2007 20:42:52 GMT, paul c wrote:

>> Hugo Kornelis wrote:
>>> ...
>>> The ANSI standard defines NULL as follows:
>>>   "null value: A special value that is used to indicate the
>>>    absence of any data value."
>>> Thus, the only concept collapsed into NULL is that of "no value here".
>>> ...

> <nitpicking>
> The ANSI text doesn't define "the null value" (as they call it) as a
> value that conveys that the value is not _a_ value, but as a _special_
> value that conveys that there is no _data_ value.
> </nitpicking>
> Personally, I never call NULL a value, and I prefer to describe NULL as
> a "marker that indicates the absence of any value". But I couldn't
> really write that and still claim to be citing ANSI, eh?
> ...

I noticed in your blog you said that the table with a null-able birthday column was in 2NF. If saying that is right, I suppose we must be careful to regard functional dependencies as determining values sometimes and non-values other times. Also, the notion of fd's depends on the notion of equality, or, if you like, identity. I vaguely remember that there are times when using SQL that when the comparison of null to some known value returns an answer of null, ie. "unknown". If that memory is accurate, then I suppose also that we must carry with us two kinds of answers to the question of equality.

I got a little side-tracked by this thread when Bob B mentioned scriptural interpretation. It didn't occur to me that he meant the interpretation of so-called holy scriptures that some people say they practice. Until later posts cleared it up, I thought it was some branch of linguistic study that I hadn't heard of!

Recent talk of "nature" and "spatio whatchamacallit" seems beside the point as far as the OP's original question is concerned. There are theories and there are theories. Some theories are just castles in the air either because we don't have practical applications for them or because we have yet discovered any. Codd's original theory had immediate practical application because among other things the manipulation of sets that have some regular form is well-suited to programming with a digital computer, ie., economical programming because of the repetitive ability of those machines, or if you like, extreme practicality. His theory included various notions such as closure on the operators he came up with that were general enough that the same operators could be applied within different applications, in the same sense that say, the quicksort algorithm, can be applied to different files in the same profitable way. Secondly, he described how the symbolic results from a computer could be interpreted by people.

The "theory" of nulls is quite different because it dictates that various exceptions must be noted in practice, which flies in the face of the repetitive advantage of the ways we program today. So I would say that on the face of it, the biggest problem with null theory is that it isn't as practical as a theory that avoids nulls. Even if an implementation "correctly" implements the various exceptions, it is still necessary for the audience or user to keep them all in mind when viewing the symbolic results.

I've worked with people who advocated the use of nulls in db's, which I thought was always a sign of slipshod design, ie., failure to nail down application requirements, whereas they would claim that it was more efficient for the application to avoid additional tables for the rows that didn't have known values for some columns and then go to great lengths to hide those null "values" from users. I found it ironic that many of the same people would insist on "normalizing" tables that contained columns such as "city", "state", "zipcode", claiming that the presence of "zipcode" made the other columns redundant in the same table. So sometimes they wanted a single table for efficiency's sake and other times they advocated that in the interest of unknown future efficiency, two tables were better. In effect, they were arguing the same notion as seen here recently that applications are "incidental".

Many people like to say that the advantage of normalization is in avoiding "anomalies", but at root, what they are really talking about is the *potential* inefficiency of redundancy. Some people claim a single source for vetting the consistency of data has to do with "correctness" but at heart, the single source is just another way of achieving efficiency.

Whereas I continue to think that it is the purpose or intent of the application that must dominate database design. So if the users of an app choose to operate in a way that denies any FD from zipcode to the other columns, the db design shouldn't reflect that. Of course, the db designers may petition two different app user groups and try to show them that there will be overall efficiencies achieved but if there is no agreement between the app users, I don't see why the db theory should be rammed down their throats. DB people in general aren't very good at explaining the efficiency of this design over that design, rather they use technocratic lingo which generally befuddles the people who are entitled to make decisions about efficiency but who rarely are able to see that it is really efficiency that is what is being talked about.

Another of the effects of Codd's theory was that certain logic was removed from individual app programs and replaced by common logic that could be shared by those apps, in particular the logic that would enforce business requirements that ran across individual applications.   That is another form of efficiency, but ultimately it is the business, ie., the collective users, who must decide how much they wish to change their habits for the sake of collective efficiency.

Sorry this was so long!

p Received on Thu Aug 09 2007 - 19:43:55 CEST

Original text of this message