Re: 3vl 2vl and NULL

From: dawn <dawnwolthuis_at_gmail.com>
Date: 9 Dec 2005 07:43:37 -0800
Message-ID: <1134143017.777948.163670_at_g44g2000cwa.googlegroups.com>


Hugo Kornelis wrote:
> On 6 Dec 2005 17:59:07 -0800, michael_at_preece.net wrote:
>
> >Look Hugo - I think we're each getting sick of this now. Please look at
> >the discussion between Codd & Date, and in particular the discussion
> >about "not known" and "known not".
>
> Hi Mike,
>
> Yes, I've read the article. In my opinion, both Codd and Date got it
> wrong. But Date got it more wrong than Codd.
>
>
> Codd's first arguments are good. In fact, he already mentions an
> argument that is very close to what will be my main argument when I come
> to my opinion:
>
> '1. The meaning of the fact that a value is missing from some part or
> column of a relational database is quite different from the meaning of a
> value that is legitimate within that part or column.'
>
>
> In Date's reaction ("Date's Rebuttal I") he writes:
>
> 'A (Codd-style) 3VL system supports just one type of null, "value
> unknown"'.
> He then presents an example (which is flawed, becuase the example
> presented calls for a subtype) and concludes with:
> 'Joe's commission is quite likely to be misrepresented as "value
> unknown" (it should of course be "value does not apply").'
>
> Interesting here is that Date changes Codd's definition "value is
> missing" to something that suits his argument better ("value unknown").
> He doesn't explicitly mention this change, nor does he defend this
> change with as much as one word.

Agreed.

> After misrepresenting Codd's original argument and "proving" the need
> for a second type of NULL on top of that misrepresented first type of
> NULL, Date then goes on to prove that it doesn't stop here.
>
>
> In Codd's reaction, he appears to have missed that Date built his
> argument on a misrepresentation of Codd's words. He accepts Date's point
> that "value unknown" and "value does not apply" are different and should
> be represented by seperate marks. 3VL has now become 4VL.

Yes, I think that was definitely a poor way for Codd to argue his point.

> Codd seems to
> think that it can stop there, but I'm afraid that Date is right: once
> you start to make the distinction between different kinds of NULL,
> there's no end to it.
>
>
> In my opinion, Codd should have sticked to his original model. NULL is
> not "value unknown", not "value does not apply". NULL is "no value".
>
> "The age of Aunt Marge is 47 years" is a fact.
> "The age of Uncle Vernon is unknown" is also a fact - but it's not the
> same fact type as the former.

I'll refer to that statement below.

> And since they are different fact types,
> they should be stored in differrent columns. Attempting to store them in
> the same column would be a violation of 1NF.
>
> Or, to put it differently, and almost in the words of Codd:
>
> The meaning of the fact that (and the reason why) a value is missing
> from some part or column of a relational database is quite different
> from the meaning of a value that is legitimate within that part or
> column. And since the meanings are different, they can never be combined
> in the same column.

I don't think this necessarily follows. You are suggesting that the meaning of the proposition is different. That would mean that the propositions should be modeled by different Relations. The "types" of your "facts" are different implies separate Relations, right?

>
>
> > Don't try to tell me
> (snip)
>
> I'll tell you, or anybody, how I think about this subject. If you rather
> not read my opinions, skip my messages or killfile me.

I'm enjoying them.

>
> > Much of the discussion between
> >Codd & Date (see link to article in dbdebunk in the other thread)
> >concerned the *fact* that NULL can be interpreted to mean things other
> >than simply "unknown".
>
> I won't deny that NULL *can* be interpreted to mean other things than
> simply "no value here". Your "unknown" is one of them.
>
> For a lot of NULLs in my tables, I know why there's no value in that
> cell. I know it, because I understand the real-life entities and their
> properties that are represented by the values in my database. But that
> understanding is not part of the NULL marker in the DB itself.

I don't have a problem with including a NULL marker, but with its interpretation in the languages of the database. It is feasible to include a null and yet use a 2VL. Sure there are issues, but not so many as with a 3VL and it is far simpler to understand and employ. When DeMorgan's law goes out the window, the data and interpretation become unnecessarily complex (for most applications and most people).

>
> The value "2" in a column with datatype INTEGER can also be interpreted
> in many ways. And in most cases, someone with domain knowledge will know
> the correct interpretation. And yet, nobody is arguing that "2 years"
> should be encoded in a different way from "2 children".
>
> We agree that NULL *can* be interpreted to mean many things. I add to
> this that the user of the database will usually (though probably not
> always) know which interpretation is correct, based on domain knowledge
> and/or conclusions that can be drawn from other values in the database.
>
> But the database itself can not make any of these interpretations. For
> the database, NULL should never mean anything more than "no value here".
>
>
> > If I look at my empty mug I can truthfully say
> >that there is an absence of any liquid in it - because it's empty.

Sortof like an empty set, eh?

> And that fact can truthfully be represented in a column MugEmpty with
> domain {'Yes', 'No'}, or in a column MugContents with a numeric domain
> that stores the amount of liquid (in cc) in your mug.
>
>
> > That
> >does *not* mean I don't know what's in it. I don't have to look for any
> >other facts. It is, very plainly and obviously, empty. If I have a rule
> >that I only store data (known facts) in a database then I'll know that
> >where I have a NULL then the NULL represents an absence of any value.
> >It most certainly will not signify that I have an unknown value. It
> >will signify that there is no value where one would normally be
> >expected.
>
> Exactly! That's what I've been saying all the time: NULL represents the
> absence of a value. Since the question "what is in your mug" can't be
> answer currently, there is no value to store in the cell that is
> reserved to hold this answer. It'll be NULL - no value here. This cell
> in the database is empty.

Or if you want to have values for every cell so that you can answer comparison questions, and if the dbms permits sets in cells, then thinking of the cell has holding an empty set would make sense, right?

>
> > It will signify that the "thing" is empty.
>
> If "thing" refers to the cell in the database: indeed.
>
> If "thing" refers to your mug: NO. There can be many reasons why the
> answer to the question "what is in your mug" is not in the database.
> Maybe you refused to answer. Maybe I forgot to ask. Maybe you didn't
> know. Maybe the mug was empty.

If I KNOW it is empty, it seems like a value that represents this knowledge might be a better approach. This is easy with a char attribute which could be coded with EMPTY or NA or NONE, but not as easy in a strongly typed environment with numeric types. That is sometimes when people get creative with all 9's or -1 or some such. Ugh.

> Maybe you told me, but I forgot. Maybe
> you told me, but I didn't understand. Or it was top secret, and I was
> not allowed to store it in the database. Need I go on?

Good list.

> If the MugLiquid column is NULL, it signifies only that there is no
> value in the DB that represents the liquid currently in the mug. If the
> designer of the table thought I'd need to know *why* this value is
> missing, he/she should have added one or more extra columns for this
> purpose.
>
> Unless, of course, there is already a column that can be used to find
> out why this cell is NULL. If MugLiquid is NULL *and* the value of
> MugContents is 0 cc, then any human user can conclude that the MugLiquid
> column is NULL because the mug is empty and the question what's in it
> therefore doesn't apply. And any developer can code that logic in an
> application or a query to make the end user believe that the database
> can conclude this as well. But the database doesn't - the developer
> does.
>
>
> > I will be keeping
> >my database "pure", in that I will only be recording data (known
> >facts).

To respond to Mike -- That makes sense and is helpful for many purposes -- the cell is missing if it has no data. If you were to reconstruct a proposition from the data, you would not have a bunch of unnecessary clauses, such as "and this date is unknown." But you then have the problem of comparing a data value in a cell to the absence of the cell at times or looking for the absense of a cell. For those purposes, it is easier to think of the mathematical model for these missing cells as having the null set as a value.

> And so will I. The structure of a SQL database, with it's fixed
> rectangular tables, will sometimes force me to use a NULL marker to
> depict that I have no facts to populate that particular cell with. But
> that's representation. In the end, both your Pick DB and my SQL DB will
> have recorded the very same facts.

Yes, and there is no need to introduce that 3rd value into the logic of the database language in order to work with it.

>
> > Any questions I ask of my database will provide absolute yes/no
> >known truth - or it will simply and honestly tell me that it has
> >insufficient data.
>
> And so will mine (of course - since both DB's store the exact same
> collection of facts).
>
>
> > If, for instance, I ask for information on countries
> >with or without Weapons of Mass Destruction, there will be some
> >countries - like the USA - with known-non-empty values for WMD
> >attributes,

I'm worried about your derived data based on this ;-)

> > some - like New Zealand - with known-empty-NULL values for
> >WMD attributes,

Zero might be a good value for that without calling in the NULL word.

> > and some with no data at all - because there simply is
> >no reliable information.

In this case when we don't know, we make up answers. Perhaps fuzzy set logic would be best for this example?

> I have no idea what "knwon-empty-NULL values" is supposed to mean. But
> if you ever have to work with a SQL database, then PLEASE remember to
> store New Zealand's amount of nuclear warheads as 0, not as NULL.

I see we agree on that.

> Using
> NULL to say that there definitely are none might be valid in Pick (I'll
> just have to trust you on that, as I don't know PicK),

no, no, I'll differ on that

> but it's
> absolutely not valid in an SQL table. In SQL, storing NULL means
> removing the fact "New Zealand has 0 nuclear warheads" from the table.
> That would place New Zealand -incorrectly!- in the group of countries
> that have no data at all (either because there is no reliable
> information, or for whatever other reason - I don't kinow, though the
> users of this DB will).
>
> And if either you, with your Pick DB, or me, with my SQL DB, have to
> build a report, then we should make absolutely sure that we get the
> requirements straigth - does Mr. President want a list of all countries
> that we *know* to have WMD, or does he wish to also include all
> countries that *might* have WMD. Your query will look different from
> mine.

Does NULL also mean "it could be something"? ;-)

> But if we don't mess up, we'll present the same list to Mr.
> President.
>
>
> > I make the point again - there is no place in
> >a database for unknown values - and pretending otherwise will
> >inevitably lead to problems, sometimes major problems.
>
> Agreed.
Agreed.
>
> > MVL is
> >dangerous.
>
> MVL is like a bread knife - dangerous in unskilled hands, but invaluable
> in skilled hands.

Not at all invaluable -- quite unnecessary, in fact. Cheers! --dawn Received on Fri Dec 09 2005 - 16:43:37 CET

Original text of this message