Re: 3vl 2vl and NULL

From: Hugo Kornelis <hugo_at_perFact.REMOVETHIS.info>
Date: Thu, 12 Jan 2006 23:50:23 +0100
Message-ID: <34mds15ukofp0jgpd6nbsbqm1u2l18vbuf_at_4ax.com>


On Mon, 26 Dec 2005 11:28:08 +0100, Jon Heggland wrote:

>In article <cjbuq15agpo8r73kqu641crkoq72nms8ht_at_4ax.com>,
>hugo_at_pe_NO_rFact.in_SPAM_fo says...
>> Hi Jon,
>>
>> I guess that our disagreements are caused by discussing two similar but
>> not equal subjects. I was talking about the behaviour of NULL in SQL
>> DBMSs; you were talking about the RM.
>
>No, not at all. I am talking about NULLs as used by SQL/Codd;
>specifically whether they mean 'unknown' or nothing at all.

Hi Jon,

Sorry for the much delayed reply.

Codd's interpretation of NULL appears to have been changed over time. If you read the "Much ado about nothing" article carefully, you'll see that Codd's original commentary only uses the term "missing data" when discussing NULL; the term "Unknown" is only used as the third truth value.

But if you then skip to "Rebutting the rebuttals", you'll see that Codd has adopted Date's interpretation of Null as either "Unknown" or "Inapplicable" - and everything starts to go downhill from there.

In this thread, I have attepmted to defend Codd's original POV that a NULL means nothing at all - it just indicates that the data that you might expect to find in a column is missing.

>
>> IMO, any model that forces the result of a join to have just one
>> predicate can't have a NULL symbol. At least not the NULL as it is
>> defined and used in the SQL model. Since NULL signals missing data, it
>> also signals that part of the predicate does not apply to a specific
>> row.
>
>That's where we disagree. You can sort of salvage the predicate by
>treating the NULLs as unknown (but existing).

Indeed, you can. And that would force lots of redesigns. You might argue that these redesigns should be done anyway. And in theory, you'd be right - but in practice, you're not.

Let's look at address data, for instance. In many Dutch databases, I have seen seperate columns for the elements of a Dutch address: "Address" (street, housenumber and optional suffix to the housenumber), "PostalCode" and "City". And there are also columns for contacts living abroad: "AddressLine1", "AddressLine2", and "AddressLine3". This satisfies the many different formats used to print address lines on envelopes (e.g. postal code before or after city).

Of course, you should have two tables. One for Dutch addresses and one for foreign addresses. That's the theory. But what is the added value of the two-table design in a real database that is used for a real application. Remember that I'm not discussing the database that the postal service uses to pre-sort the mail; this database will probably use the address information ONLY to print on outgoing mail. And, maybe, some reports will be aggregated on City (using a COALESCE to get all foreign customers in the "city" "Foreign".

I won't bother my customers with the extra complexity and the extra performance overhead of this two-table design, even though I am aware that I'm violating relational theory.

>
>> >Age
>> >----
>> >NULL
>> >
>> >What does that mean?
>>
>> That row should not be in the projection on Age. NULL doesn't signal a
>> "NULL age" (whatever that might be), but "no data in the Age column for
>> this row". The project operation projects data that is there, not data
>> that isn't there.
>
>Yet SQL keeps that NULL in the projection, and we are discussing how SQL
>NULLs should be interpreted. If you interpret it as 'unknown', it makes
>some sort of sense: There is a family member with an unknown age.

There is a family member for whom the age attribute is missing.

>
>Of course, it is possible to define a different kind of NULL that
>doesn't mean anything, and doesn't show up in projections, but that is
>not the SQL NULL.
>
>> (FYI, in NIAM, there would be one predicate like I used in my reply for
>> the model, and all ages used anywhere in that model would reference this
>> predicate.)
>
>I find this a bit strange. Wouldn't that predicate be pointless, since
>all natural numbers are ages?

All natural numbers are order number too. But comparing two ages might make sense, comparing an age to an order number doesn't. That's what the NIAM model shows very clearly.

Best, Hugo Received on Thu Jan 12 2006 - 23:50:23 CET

Original text of this message