Re: NULLs: theoretical problems?

From: David Cressey <cressey73_at_verizon.net>
Date: Sun, 19 Aug 2007 11:41:06 GMT
Message-ID: <m%Vxi.891$wr3.129_at_trndny04>


"Hugo Kornelis" <hugo_at_perFact.REMOVETHIS.info.INVALID> wrote in message news:3vf9c351l07qa5jf0mui4lmccp8oetkonr_at_4ax.com...
> I agree with the first statement, and disagree with the latter. Let's
> look at an example. Here is a database without NULLs:
>
> (Represntation #1)
>
> SSN | Name
> -------+---------------
> 12345 | John Brown
> 90163 | Kate Wilson
> 55202 | Geoff T. Hurt
>
> SSN | DOB
> -------+------------
> 12345 | 1960-08-31
> 55202 | 1973-01-12
>
> This is a short representation of these true propositions:
>
> (Representation #2)
>
> "The person identified by SSN 12345 goes by the name John Brown."
> "The person identified by SSN 90163 goes by the name Kate Wilson."
> "The person identified by SSN 55202 goes by the name Geoff T. Hurt."
>
> "The person identified by SSN 12345 was born on 1960-08-31."
> "The person identified by SSN 55202 was born on 1973-01-12."
>

First off, Hugo, thanks for a very clear development of the topic. I hope I can use your example to illustrate a point I've made elsewhere without muddying the waters.

Suppose, in the database illustrated above, one were to ask "who is the youngest person in the database?"

It's clear from the five propositions that Geoff T. Hurt is younger than John Brown. But it's unknown whether Kate Wilson is or is not younger than Geoff T. Hurt. so, what seems like a simple query turns out to be subtler than we thought at first glance.

Of the representations you offered, I've deliberately chosen the one that has no NULLs in it. What I hope to show is that the difficulty, if there is one, flows out of the five propositions, and not out of whether or not the tabular representation uses NULLs.

Elsewhere in this discussion, people have questioned my assertion that trouble due to NULLs is simple to avoid. I am going to suggest that, if the trouble I've outlined here is simple to avoid, then it's equally simple to avoid when the two tables are collapsed into one table, with a NULL for the birth date of Kate Wilson.

If the five propositions themselves represent an erroneous state of the database, then inclusion constraints, as pointed out by Jan, can prevent these five propositions from existing without a sixth proposition that will pin down Kate Wilson's birth date. This achieves the same thing that a "NOT NULL" constraint on all the columns in the single table representation would have achieved.

If asking the question "who is the youngest person in the database?" was an unfortunate question to ask of this database in the first place, then we need to question how flexible the model really is. One of the shibboleths among SQL practitioners is that "any idiot can write a query". Maybe, maybe not. Received on Sun Aug 19 2007 - 13:41:06 CEST

Original text of this message