Re: 3vl 2vl and NULL

From: dawn <dawnwolthuis_at_gmail.com>
Date: 4 Dec 2005 11:17:00 -0800
Message-ID: <1133723820.633095.187640_at_g14g2000cwa.googlegroups.com>


David Cressey wrote:
> I've stayed away from the 3vl versus 2vl discussion over in the monster NULL
> thread. Mainly, I'm not sure exactly what Codd and Date have to say on the
> subject. I think that both of them are smarter than I am, and that they
> don't agree on the subject, and that they agree that they disagree. That
> pretty much leaves it up to us, doesn't it?
>
> Here's what I think: There are two Boolean constants: FALSE and TRUE.
> A boolean variable with a value must have one of these two values. I think
> that's the 2vl position, although I'm not sure.

I suspect thre is a good definition out there, but instead I'll render an impression that when talking about employed a 2VL, I also mean that when you do various comparisons, you end up with a T or an F. If you put an expression in an if statement, for example, you will end up with the expression evaluating to T or F and not some 3rd (or 4th ...) value.

> Any variable, including a Boolean variable, might have no value assigned,
> in which case it's missing data. Systematic treatment of missing data
> requires some kind of treatment for this case.

If you work with values as sets of values, these issues become quite easy, it seems. Your set of values for a boolean variable might be {T} or {F} or {}, the null or empty set.

> Now, since the term NULL has produced so much controversy, I'm going to use
> an alternate notation in this post. I'm going to use the marker "(no
> comment)" to indicate a missing value. Now let's go through some cases,
> in Q and A format.
>
> Q: is 4 greater than 3?
> A: TRUE
>
> Q: is 4 equal to 4?
> A: TRUE
>
> Q: is 4 equal to 3?
> A: FALSE
>
> Q: is 4 greater than (no comment)?
> A: (no comment)

Here is where you are injecting a third value where I just think it is so much easier to work with a 2VL. You do need to define some orderings, such as deciding that all sets of numbers are greater than empty sets, for example. So, of this comparison, I don't want some silly third value to come in without some great benefit to my application (I can imagine an application where there would be an advantage to using fuzzy set logic, but that isn't the average data processing 101 app).

So, my answer is
A: TRUE
>
> Q: is (no comment) equal to (no comment)?
> A: (no comment)

  1. TRUE The null set is equal to the null set

>
> Q: How much is 4 plus (no comment)?
> A: (no comment)

I'm OK with defining this this as either 4 or as not being a valid operation on this data. Returning an error is different than returning a value of "no comment".

>
> Now, (no comment) doesn't mean "either TRUE of FALSE, but I don't know
> which". It doesn't mean "neither TRUE nor FALSE but some other boolean
> constant you never defined". It doesn't mean "I know the answer, but I'm
> not telling you".
> All it means is: (no comment). Draw whatever inferences you like.
>
> Now let's try some data:
>
> Q: select spouse_first_name from employees where employee_id = 12345;
> A: (no comment)

This isn't a boolean expression, it might return an empty set

> It doesn't mean "This employee has no spouse". It doesn't mean "this
> employee has a spouse, but one with no first name". It doesn't mean "This
> guy's spouse's first name is unknown to us".
>
> All it means is that, when the author of the row filled in the values, all
> the author said about spouse_first_name was: (no comment).
>
> From here you can draw whaterver inferences you like. There may be
> inferences you can draw by agreement between the writer and the reader.

This is a good point. I work with a 2VL and you do need to have a common understanding that if you get a null set on a query like what you had above, it might be that there is no employee with that ID or it might be that the value of the spouse_first_name was empty (or the null set). While this means that in this case we don't know which is the case, users of 2VL's are not as often confused as those of a 3VL, it seems.

> In
> this sense, NULL can be defined in whatever way is most useful, as
> Marshall Spight might say. But in terms of what has been SAID , regardless
> of what it might mean, all that's been said is NULL or, in my phrasing (no
> comment).

Or a value of the empty set.

> I think this boils down to Codd's position, but I'm not sure.

Closer to Codd's than Date's IIRC. I think Date comes down to an appreciation for the simplicity of a 2VL. I don't think he uses the empty set as the mathematical metaphor (aka model) for the lack of a value, however. If I recall, he suggests various ways of handling situations where you know that there is no value for the data compared to where you don't know if there is a value, for example. I'll have to read that stuff again sometime, but I'm so satisfied that 2VL provides a better bang for the buck that I'm trying to steer clear of any 3VLs right now.

--dawn Received on Sun Dec 04 2005 - 20:17:00 CET

Original text of this message