Re: 3vl 2vl and NULL

From: dawn <dawnwolthuis_at_gmail.com>
Date: 10 Dec 2005 05:22:38 -0800
Message-ID: <1134220958.888015.324200_at_g47g2000cwa.googlegroups.com>

Hugo Kornelis wrote:
> On 9 Dec 2005 07:43:37 -0800, dawn wrote:
> (snip)
> >> "The age of Aunt Marge is 47 years" is a fact.
> >> "The age of Uncle Vernon is unknown" is also a fact - but it's not the
> >> same fact type as the former.
<snip>
> >I don't have a problem with including a NULL marker, but with its
> >interpretation in the languages of the database. It is feasible to
> >include a null and yet use a 2VL. Sure there are issues, but not so
> >many as with a 3VL and it is far simpler to understand and employ.
>
> Are you sure?
>
> "Show me all family members who are older than Aunt Marge" - should
> Uncle Vernon be included or not? Why?

No. We don't know Uncle Vernon's age and we can only work with the facts we know.

> "Show me all family members who are NOT older than Aunt Marge" - should
> Uncle Vernon be included or not? Why?

Two choices of how to ask the question -- 1) Show me all members who have an age where that age is less than or equal to Aunt Marge's age

In this case Uncle Vernon does not show up because he does not have an age

2) Show me all family members who do not have an age greater than Aunt Marge's

In this case Uncle Vernon does show up

> "Show me all family members who are older than Uncle Vernon" - should
> this return noone, everyone, or only some? Why?

Here we are asking a question of comparison to a fact we do not have, right? So it is a bad question that the user should recognize when they get the results and can debug it from there. It is a garbage-in, garbage-out issue. The response from the query language could be all family members whose ages are not null. That is how the implementation I use would handle it. The null sets all equal to Uncle Vernon's null set for that data.

>
> In all these cases, you run into the problem that Uncle Vernon's age is
> not in the database.

Yes, you are doing comparisons against a lack of a fact, so to use a two-valued logic, the question is changed to:

How does something compare to the fact (since we can only work with facts and T/F with 2VL) that we have no age for Uncle Vernon.

> With or without databases, fact remains is that you
> don't know if Aunt Marge is older, younger or the same age as Uncle
> Vernon.

That is correct. We don't know and a 2VL needs to answer based on what it does know. So, give it your best shot based on the data you have, not the data you don't.

> This proves that, as soon as you allow values to be missing in
> your database, True and False are no longer the only possible results of
> a predicate.

No it doesn't. We can still define a 2VL on this data as indicated.

> Reality forces 3VL on you.

Why not 4VL or greater?

We need not have the database responses model all human responses. People understand a T/F response and intuitively understand that the computer cannot know everything and has to do what it can with the questions asked.

This analogy is a little far off the path, but it is what popped into mind and does help explain my point. The Apple Newton developers worked very hard on hand-writing recognition, but failed to a large extent in having satisfied users related to the handwriting recognition feature. The palm pilot told humans how to make their letters and had much greater success with this data entry from a pen approach.

Humans understand the limitations of a computer and can roll with it. Trimming to a 2VL is really quite easy for end-users and developers to work with. It isn't without issues, but seems to have fewer than a 3VL has.

>
> Using 2VL might be simpler to understand and employ, but it can't
> properly depict all facets of reality.

And a 3VL can?

Another angle -- think of how the RM folks reply to my interest in having list constructs in the dbms. Surely there are ordered and unordered lists "in reality." But their point is that if you can contrive a list through the RM by adding more tables, ordering attributes, and writing your own inserts into those lists to renumber items after that point, you can get the job done without adding more to the language. In the case of lists, I agree this is possible, but think it is a shame for the language not to do this for me.

We then reverse our positions on the 3VL vs 2VL where my take is that you can get the job done with a 2VL and it is simpler than a 3VL, so let's go with it.

What is consistent in my position on these two matters? In both cases, I choose the approach that is easiest for the humans involved, whether ot not it is the simplest mathematical model. These are human-computer interaction questions to me. I don't care what goes on under the covers (everything for the human, including performance, being equal) but want the person working with the database tools/languages/api to be as productive as possible.

> I prefer a model that is faithful
> to the reality I try to model over a model that's easy but wrong.

I know that the database itself at any point in time is not modeling all of reality. Not only can we trim back on facts (not recording every known fact) but on the way we process the facts by using a 2VL, for example. The dbms is a model of only some aspects of reality. If you were to list all possible human responses to the questions you list above, there are many we do not want to model in the computer. For example

Why are you asking that?
But Uncle Vernon died yesterday (or even last year).
I don't know (even if all facts are known)
I could use a glass of water etc.

I think I can safely claim that we do not want to model all aspects of reality. We want to simplify things and a 2VL does that quite nicely.

> >When DeMorgan's law goes out the window, the data and interpretation
> >become unnecessarily complex (for most applications and most people).
>
> I must be missing something. How exactly does DeMorgan's law cease to
> apply with 3VL?

I'm not an expert here, but I believe you have to enhance ~(p ^ q) --> ~p v ~q

to add nulls into the mix for a 3VL. Otherwise,

Uncle Vernon's age is unknown, Aunt Marge is 47 p=Uncle Vernon is over 40
q=Aunt Marge is over 40

p is neither T nor F so we don't have both p and q being T so we do have
~(p ^ q) but we don't have the implication being true

Even if we can have an enhanced De Morgan's law for 3VL, it is not as intuitive to a human (OK, me!) who would think that if the first part is the case, then the implication would follow.

Did that make sense? If not, I'll admit that I'm not thinking this all the way through right now, so if someone else wants to shed light on this or explain why I might "feel" like De Morgan's law as I expect it to work goes out the window in a 3VL please jump in.

> (snip)
> >> > If I look at my empty mug I can truthfully say
> >> >that there is an absence of any liquid in it - because it's empty.
> >
> >Sortof like an empty set, eh?
> >
> >> And that fact can truthfully be represented in a column MugEmpty with
> >> domain {'Yes', 'No'}, or in a column MugContents with a numeric domain
> >> that stores the amount of liquid (in cc) in your mug.
> >>
> >>
> >> > That
> >> >does *not* mean I don't know what's in it. I don't have to look for any
> >> >other facts. It is, very plainly and obviously, empty. If I have a rule
> >> >that I only store data (known facts) in a database then I'll know that
> >> >where I have a NULL then the NULL represents an absence of any value.
> >> >It most certainly will not signify that I have an unknown value. It
> >> >will signify that there is no value where one would normally be
> >> >expected.
> >>
> >> Exactly! That's what I've been saying all the time: NULL represents the
> >> absence of a value. Since the question "what is in your mug" can't be
> >> answer currently, there is no value to store in the cell that is
> >> reserved to hold this answer. It'll be NULL - no value here. This cell
> >> in the database is empty.
> >
> >Or if you want to have values for every cell so that you can answer
> >comparison questions, and if the dbms permits sets in cells, then
> >thinking of the cell has holding an empty set would make sense, right?
>
> *IF* the database allowed to store a set in a cell, then it might make
> sense to store an empty set for an empty mug.

I'll take that, thanks. Now move away from what the database allows us to store to the mathematical model for what is stored. You can model the scalar as a set with one value and the NULL as a null set. I realize this isn't the RM model for it.

> And it would also make
> sense to store the contents of my mug as {Coffee}, but the contents of
> my wife's mug as {Coffee, Sugar}.

Yes.

> In SQL databases, sets can't be stored in a cell.

I noticed that.

> If the complete set of
> contents of the mug has to be stored, the first rule of normalisation
> requires us to use a seperate table.

Unless you are using Date's new, revised version of 1NF as discussed in threads earlier this year. But I use your def of 1NF and continue to be opposed to 1NF.

> My wife's mug would result in two
> rows.

Such a shame. Sometimes the data modeler will either simplify reality and not capture all of the contents or perhaps add in attributes for hasSugar, hasCream, hasIrishCream, ... to the first table rather than making a second one with multiple rows for cup contents. Wouldn't it be great if we could model this with a list in a cell? ;-)

> My mug in one row. And Mike's empty mug would get no rows at all
> in this table, since that's how SQL databases store an empty set.
>
> But the original discussion was about NULL being used to represent a
> missing *scalar*.

The 2VL I use does use null to represent a missing scalar, but, again, a mathematical model that could be used to explain how it works could model the scalars as sets of a single value and the null as an empty set.

> >> > It will signify that the "thing" is empty.
> >>
> >> If "thing" refers to the cell in the database: indeed.
> >>
> >> If "thing" refers to your mug: NO. There can be many reasons why the
> >> answer to the question "what is in your mug" is not in the database.
> >> Maybe you refused to answer. Maybe I forgot to ask. Maybe you didn't
> >> know. Maybe the mug was empty.
> >
> >If I KNOW it is empty, it seems like a value that represents this
> >knowledge might be a better approach. This is easy with a char
> >attribute which could be coded with EMPTY or NA or NONE, but not as
> >easy in a strongly typed environment with numeric types.
>
> All these values (EMPTY, NA, NONE) are not in the domain of valid
> liquids. They should therefore be disallowed in this column (either by a
> CHECK constraint or by a FOREIGN KEY constraint).

If it is valid to know that a cup has zero liquid in it, then surely 0 could be in the domain of a liquidAmount attribute. Similarly, if EMPTY is a possible value for whatever attribute you identify, then that needs to be in the domain of the attribute, right? That is what a domain is -- the list of possible values.

> (snip)
> >> Using
> >> NULL to say that there definitely are none might be valid in Pick (I'll
> >> just have to trust you on that, as I don't know PicK),
> >
> >no, no, I'll differ on that
>
> Since I don't know Pick, I'll have to believe everything Mike says about
> it. I'll also have to believe everything you say about it. With the
> interesting result that I am now totally believeing two mutually
> exclusive statements about Pick. <g>

You might have noticed that various RM folks disagree at times too, right?

In this case, Mike is right that it is possible to let NULL for a particular attribute mean that we know the value and it is "none."
>From a data modeling standpoint, that would be frowned upon (thus my
"no" response). I'm sure that is done whenever the modeler deems it unnecessary to distinquish between a "don't know" and "do know that there is none" but I would discourage it.

Cheers! --dawn Received on Sat Dec 10 2005 - 14:22:38 CET

This message: [ Message body ]
Next message: JOG: "Re: So what's null then if it's not nothing?"
Previous message: vc: "Re: So what's null then if it's not nothing?"
In reply to Hugo Kornelis: "Re: 3vl 2vl and NULL"
Next in thread: JOG: "Re: So what's null then if it's not nothing?"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message