Re: NULLs: theoretical problems?

From: JOG <jog_at_cs.nott.ac.uk>
Date: Sun, 19 Aug 2007 05:57:35 -0700
Message-ID: <1187528255.669645.66630_at_g4g2000hsf.googlegroups.com>


On Aug 19, 12:41 pm, "David Cressey" <cresse..._at_verizon.net> wrote:
> "Hugo Kornelis" <h..._at_perFact.REMOVETHIS.info.INVALID> wrote in message
>
> news:3vf9c351l07qa5jf0mui4lmccp8oetkonr_at_4ax.com...
>
>
>
> > I agree with the first statement, and disagree with the latter. Let's
> > look at an example. Here is a database without NULLs:
>
> > (Represntation #1)
>
> > SSN | Name
> > -------+---------------
> > 12345 | John Brown
> > 90163 | Kate Wilson
> > 55202 | Geoff T. Hurt
>
> > SSN | DOB
> > -------+------------
> > 12345 | 1960-08-31
> > 55202 | 1973-01-12
>
> > This is a short representation of these true propositions:
>
> > (Representation #2)
>
> > "The person identified by SSN 12345 goes by the name John Brown."
> > "The person identified by SSN 90163 goes by the name Kate Wilson."
> > "The person identified by SSN 55202 goes by the name Geoff T. Hurt."
>
> > "The person identified by SSN 12345 was born on 1960-08-31."
> > "The person identified by SSN 55202 was born on 1973-01-12."
>
> First off, Hugo, thanks for a very clear development of the topic. I hope
> I can use your example to illustrate a point I've made elsewhere without
> muddying the waters.
>
> Suppose, in the database illustrated above, one were to ask "who is the
> youngest person in the database?"
>
> It's clear from the five propositions that Geoff T. Hurt is younger than
> John Brown. But it's unknown whether Kate Wilson is or is not younger than
> Geoff T. Hurt. so, what seems like a simple query turns out to be subtler
> than we thought at first glance.
>
> Of the representations you offered, I've deliberately chosen the one that
> has no NULLs in it. What I hope to show is that the difficulty, if there is
> one, flows out of the five propositions, and not out of whether or not the
> tabular representation uses NULLs.
>
> Elsewhere in this discussion, people have questioned my assertion that
> trouble due to NULLs is simple to avoid. I am going to suggest that, if
> the trouble I've outlined here is simple to avoid, then it's equally simple
> to avoid when the two tables are collapsed into one table, with a NULL for
> the birth date of Kate Wilson.
>
> If the five propositions themselves represent an erroneous state of the
> database, then inclusion constraints, as pointed out by Jan, can prevent
> these five propositions from existing without a sixth proposition that will
> pin down Kate Wilson's birth date. This achieves the same thing that a "NOT
> NULL" constraint on all the columns in the single table representation would
> have achieved.
>
> If asking the question "who is the youngest person in the database?" was an
> unfortunate question to ask of this database in the first place, then we
> need to question how flexible the model really is. One of the shibboleths
> among SQL practitioners is that "any idiot can write a query". Maybe, maybe
> not.

Trying to ask a database something like ""who is the youngest person in the database?" is a mistake in the first place. All i can really ever ask of a database is "who is the youngest person you have a birthdate proposition about". (If this was your point then I agree).

After all the db described above is /not/ a model of people - it is a model of propositions about people (a data model). And while this model adheres to CWA it is once removed from our nomal expectations of the assumption. This sounds may sound a bit esoteric initially, but I think it is another common source of error in querying. It means that the database and its CWA can quite happily state "There exists no proposition discussing a birthdate for Kate", but never "There exists no birthdate for Kate". Received on Sun Aug 19 2007 - 14:57:35 CEST

Original text of this message