Re: 3vl 2vl and NULL

From: Jon Heggland <heggland_at_idi.ntnu.no>
Date: Fri, 13 Jan 2006 10:02:57 +0100
Message-ID: <MPG.1e318b1f10d80996989764_at_news.ntnu.no>

In article <34mds15ukofp0jgpd6nbsbqm1u2l18vbuf_at_4ax.com>, hugo_at_perFact.REMOVETHIS.info says...
>
> Codd's interpretation of NULL appears to have been changed over time. If
> you read the "Much ado about nothing" article carefully, you'll see that
> Codd's original commentary only uses the term "missing data" when
> discussing NULL; the term "Unknown" is only used as the third truth
> value.
>
> But if you then skip to "Rebutting the rebuttals", you'll see that Codd
> has adopted Date's interpretation of Null as either "Unknown" or
> "Inapplicable" - and everything starts to go downhill from there.

To suggest that Codd got his I-marks and A-marks from Date is, to put it politely, very strange.

> In this thread, I have attepmted to defend Codd's original POV that a
> NULL means nothing at all - it just indicates that the data that you
> might expect to find in a column is missing.

Now I think you are moving the goalposts. You said "I guess that our disagreements are caused by discussing two similar but not equal subjects. I was talking about the behaviour of NULL in SQL DBMSs [...]". SQL's NULL behaviour is practically identical to Codd's approach in "Extending the Database Relational Model to Capture More Meaning" (1979), where he explicitly says he is talking about null as "value at present unknown". Can you provide a source for "Codd's original POV", and show that that is the foundation for SQL?

> >
> >That's where we disagree. You can sort of salvage the predicate by
> >treating the NULLs as unknown (but existing).
>
> Indeed, you can. And that would force lots of redesigns.

...because people have been using NULL for "inapplicable" and other things, you mean? It's really not different from "forcing redesign" because someone has used -1 or 9999 for unknown, inapplicable or infinity. It may be worth it; it may not.

> Let's look at address data, for instance. In many Dutch databases, I
> have seen seperate columns for the elements of a Dutch address:
> "Address" (street, housenumber and optional suffix to the housenumber),
> "PostalCode" and "City". And there are also columns for contacts living
> abroad: "AddressLine1", "AddressLine2", and "AddressLine3". This
> satisfies the many different formats used to print address lines on
> envelopes (e.g. postal code before or after city).
>
> Of course, you should have two tables. One for Dutch addresses and one
> for foreign addresses. That's the theory. But what is the added value of
> the two-table design in a real database that is used for a real
> application. Remember that I'm not discussing the database that the
> postal service uses to pre-sort the mail; this database will probably
> use the address information ONLY to print on outgoing mail. And, maybe,
> some reports will be aggregated on City (using a COALESCE to get all
> foreign customers in the "city" "Foreign".

I don't think this is a good example, because you are mixing several different issues here; none of which are very relevant.

You don't mention what this has to do with NULLs. I can guess what you mean, but you don't need to use NULLs here to indicate that (e.g.) 'there is no AddressLine3' (whatever that means). You can use the empty string.
How you should model something depends greatly on how you are going to use it. If you are just going to print it on an envelope, a multi-line text field suffices. If you are going to compare addresses, or make detailed reports on them, or check integrity/consistency, you will benefit from more structure. This is completely orthogonal to NULLs. The model you present is not necessarily the most correct for all purposes, nor is it necessarily "the theory" that "you should have two tables".
Usability (for end user, not DBAs/programmers) have little to do with the data model---at least not in the way you seem to suggest. The address entry form the customer sees doesn't have to change even if you have two tables instead of one. In fact, I'd suggest an even better idea might be to model all kinds of national address schemes properly, and present a different form to each customers depending on the selected country. I for one is annoyed by having to enter "N/A" in required fields that are inapplicable for Norwegian addresses.

> I won't bother my customers with the extra complexity and the extra
> performance overhead of this two-table design, even though I am aware
> that I'm violating relational theory.

Hardly. Nulls are considered by some to be part of relational database theory. You *might* be violating some more or less formal database design guidelines, but I wouldn't call that "relational theory". Or do you mean normalisation?

Besides, as mentioned above, the customers don't need to notice at all. And performance may be better or worse, depending on how the database is used.

> >> >Age
> >> >----
> >> >NULL
> >> >
> >> >What does that mean?
> >>
> >> That row should not be in the projection on Age. NULL doesn't signal a
> >> "NULL age" (whatever that might be), but "no data in the Age column for
> >> this row". The project operation projects data that is there, not data
> >> that isn't there.
> >
> >Yet SQL keeps that NULL in the projection, and we are discussing how SQL
> >NULLs should be interpreted. If you interpret it as 'unknown', it makes
> >some sort of sense: There is a family member with an unknown age.
>
> There is a family member for whom the age attribute is missing.

So you are changing your mind? Does the row have an interpretation after all?

(By the way, I'd complete your statement above thus: "There is a family member for whom the age attribute is missing---in other words, we don't know what it is.")

Anyway, you don't address my point: SQL keeps the row in the result, and we are (as you agree) discussing SQL NULLs. To rephrase, the question I'm discussing is "Given how NULLs work in SQL, how should they be interpreted?". Above, you seem to start from the other end: "Given that NULLs don't mean anything, how should SQL work?".

> >> (FYI, in NIAM, there would be one predicate like I used in my reply for
> >> the model, and all ages used anywhere in that model would reference this
> >> predicate.)
> >
> >I find this a bit strange. Wouldn't that predicate be pointless, since
> >all natural numbers are ages?
>
> All natural numbers are order number too. But comparing two ages might
> make sense, comparing an age to an order number doesn't. That's what the
> NIAM model shows very clearly.

How so? Surely NIAM doesn't prevent you from comparing values from different predicates? I think such considerations are better handled by data types / domains. Doesn't NIAM separate between domains and predicates?

-- 
Jon

Received on Fri Jan 13 2006 - 10:02:57 CET

This message: [ Message body ]
Next message: Scamjunk: "newbie (relatively) - XML Compressor and XQuery doubt"
Previous message: Hugo Kornelis: "Re: 3vl 2vl and NULL"
Maybe in reply to: Frank Hamersley: "Re: 3vl 2vl and NULL"
In reply to Hugo Kornelis: "Re: 3vl 2vl and NULL"
Next in thread: B Faux: "Re: What is Pick anyway?"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message