Re: 3vl 2vl and NULL

From: <michael_at_preece.net>
Date: 4 Dec 2005 17:57:01 -0800
Message-ID: <1133747821.754959.79980_at_g43g2000cwa.googlegroups.com>


David Cressey wrote:

> I've stayed away from the 3vl versus 2vl discussion over in the monster NULL
> thread. Mainly, I'm not sure exactly what Codd and Date have to say on the
> subject. I think that both of them are smarter than I am, and that they
> don't agree on the subject, and that they agree that they disagree. That
> pretty much leaves it up to us, doesn't it?
>
> Here's what I think: There are two Boolean constants: FALSE and TRUE.
> A boolean variable with a value must have one of these two values. I think
> that's the 2vl position, although I'm not sure.

You can be sure - it's OK to be sure about things. Known things.

>
> Any variable, including a Boolean variable, might have no value assigned,
> in which case it's missing data. Systematic treatment of missing data
> requires some kind of treatment for this case.

The crux of the matter! I think it was Codd himself who expressed a heart-felt wish that we didn't have to deal with missing values. He died with his wish unfulfilled. Pity.

Here's the thing: If you have a table with an unlimited number of rows and a limited number of columns then it is understood that the number of tuples in the table is rows*columns. True? And every tuple must have a value. And if we don't have a actual value for one or more tuples we have to use NULL to represent the fact that there is no value.

OK. There is one fundamental problem with this. Some of those NULLs in those tuples might represent that we know something is empty - so it has to be treated as a known value. All NULLs are not created equal. The NULLs that are known to be empty can be compared along with all of the other "known facts" - or, to use another word, "data". Comparing a known thing with a known thing will always result in TRUE or FALSE. 2VL.

This leaves us with the NULLs that represent missing, unknown, and unknowable "data". Let's reexamine their reason for being... They represent something which doesn't exist. They, themselves, have to exist though because we need *something* in each and every tuple - even if it represents nothing. What is in the table? A set of relations. Date actually says that a null is not a relation. Codd wished they didn't have to be there. Every row in the table must have the same number of columns - even though some of the columns in one row might be allowed to contain known-to-be-empty NULLs (because, perhaps, some data is not applicable to some rows). Taken individually, each row contains a set of related data. Data in one row is not related to data in another row in the same table, except that they share the same columns, or attributes. An actual row can be seen as its own set. The problem is that every row in the table is seen as being entirely similar - in that every row must have the same number of tuples. If we could separate out every row into its own unique and separate identity, it might just be permissable to blow away those pesky NULLs once and for all. Then, whenever we asked anything of our data, the first thing to be done would be to identify all of the rows that have known values for the required columns. Any questions asked of that subset would always return a TRUE or FALSE result.

It's only when you bring unknown values into it that mischievous MVL (multi-valued logic - *not* to be confused with multi-valued data) comes out to play.

Cheers
Mike.

>
> Now, since the term NULL has produced so much controversy, I'm going to use
> an alternate notation in this post. I'm going to use the marker "(no
> comment)" to indicate a missing value. Now let's go through some cases,
> in Q and A format.
>
> Q: is 4 greater than 3?
> A: TRUE
>
> Q: is 4 equal to 4?
> A: TRUE
>
> Q: is 4 equal to 3?
> A: FALSE
>
> Q: is 4 greater than (no comment)?
> A: (no comment)
>
> Q: is (no comment) equal to (no comment)?
> A: (no comment)
>
> Q: How much is 4 plus (no comment)?
> A: (no comment)
>
> Now, (no comment) doesn't mean "either TRUE of FALSE, but I don't know
> which". It doesn't mean "neither TRUE nor FALSE but some other boolean
> constant you never defined". It doesn't mean "I know the answer, but I'm
> not telling you".
> All it means is: (no comment). Draw whatever inferences you like.
>
> Now let's try some data:
>
> Q: select spouse_first_name from employees where employee_id = 12345;
> A: (no comment)
>
> It doesn't mean "This employee has no spouse". It doesn't mean "this
> employee has a spouse, but one with no first name". It doesn't mean "This
> guy's spouse's first name is unknown to us".
>
> All it means is that, when the author of the row filled in the values, all
> the author said about spouse_first_name was: (no comment).
>
> From here you can draw whaterver inferences you like. There may be
> inferences you can draw by agreement between the writer and the reader. In
> this sense, NULL can be defined in whatever way is most useful, as
> Marshall Spight might say. But in terms of what has been SAID , regardless
> of what it might mean, all that's been said is NULL or, in my phrasing (no
> comment).
>
> I think this boils down to Codd's position, but I'm not sure.
Received on Mon Dec 05 2005 - 02:57:01 CET

Original text of this message