Re: 3vl 2vl and NULL
Date: Fri, 09 Dec 2005 00:40:06 +0100
Message-ID: <36dhp1dv7d9r2r2ujp4qeor3cnunbsmp8m_at_4ax.com>
On 6 Dec 2005 17:59:07 -0800, michael_at_preece.net wrote:
>Look Hugo - I think we're each getting sick of this now. Please look at
Hi Mike,
Yes, I've read the article. In my opinion, both Codd and Date got it
wrong. But Date got it more wrong than Codd.
Codd's first arguments are good. In fact, he already mentions an
argument that is very close to what will be my main argument when I come
to my opinion:
'1. The meaning of the fact that a value is missing from some part or
column of a relational database is quite different from the meaning of a
value that is legitimate within that part or column.'
In Date's reaction ("Date's Rebuttal I") he writes:
'A (Codd-style) 3VL system supports just one type of null, "value
unknown"'.
Interesting here is that Date changes Codd's definition "value is
missing" to something that suits his argument better ("value unknown").
He doesn't explicitly mention this change, nor does he defend this
change with as much as one word.
>the discussion between Codd & Date, and in particular the discussion
>about "not known" and "known not".
He then presents an example (which is flawed, becuase the example
presented calls for a subtype) and concludes with:
'Joe's commission is quite likely to be misrepresented as "value
unknown" (it should of course be "value does not apply").'
After misrepresenting Codd's original argument and "proving" the need
for a second type of NULL on top of that misrepresented first type of
NULL, Date then goes on to prove that it doesn't stop here.
In Codd's reaction, he appears to have missed that Date built his
argument on a misrepresentation of Codd's words. He accepts Date's point
that "value unknown" and "value does not apply" are different and should
be represented by seperate marks. 3VL has now become 4VL. Codd seems to
think that it can stop there, but I'm afraid that Date is right: once
you start to make the distinction between different kinds of NULL,
there's no end to it.
In my opinion, Codd should have sticked to his original model. NULL is
not "value unknown", not "value does not apply". NULL is "no value".
"The age of Aunt Marge is 47 years" is a fact.
"The age of Uncle Vernon is unknown" is also a fact - but it's not the
same fact type as the former. And since they are different fact types,
Or, to put it differently, and almost in the words of Codd:
The meaning of the fact that (and the reason why) a value is missing from some part or column of a relational database is quite different from the meaning of a value that is legitimate within that part or column. And since the meanings are different, they can never be combined in the same column.
> Don't try to tell me
(snip)
I'll tell you, or anybody, how I think about this subject. If you rather not read my opinions, skip my messages or killfile me.
> Much of the discussion between
>Codd & Date (see link to article in dbdebunk in the other thread)
>concerned the *fact* that NULL can be interpreted to mean things other
>than simply "unknown".
I won't deny that NULL *can* be interpreted to mean other things than simply "no value here". Your "unknown" is one of them.
For a lot of NULLs in my tables, I know why there's no value in that cell. I know it, because I understand the real-life entities and their properties that are represented by the values in my database. But that understanding is not part of the NULL marker in the DB itself.
The value "2" in a column with datatype INTEGER can also be interpreted
in many ways. And in most cases, someone with domain knowledge will know
the correct interpretation. And yet, nobody is arguing that "2 years"
should be encoded in a different way from "2 children".
We agree that NULL *can* be interpreted to mean many things. I add to
this that the user of the database will usually (though probably not
always) know which interpretation is correct, based on domain knowledge
and/or conclusions that can be drawn from other values in the database.
But the database itself can not make any of these interpretations. For
the database, NULL should never mean anything more than "no value here".
And that fact can truthfully be represented in a column MugEmpty with
domain {'Yes', 'No'}, or in a column MugContents with a numeric domain
that stores the amount of liquid (in cc) in your mug.
> That
Exactly! That's what I've been saying all the time: NULL represents the
absence of a value. Since the question "what is in your mug" can't be
answer currently, there is no value to store in the cell that is
reserved to hold this answer. It'll be NULL - no value here. This cell
in the database is empty.
>does *not* mean I don't know what's in it. I don't have to look for any
>other facts. It is, very plainly and obviously, empty. If I have a rule
>that I only store data (known facts) in a database then I'll know that
>where I have a NULL then the NULL represents an absence of any value.
>It most certainly will not signify that I have an unknown value. It
>will signify that there is no value where one would normally be
>expected.
> It will signify that the "thing" is empty.
If "thing" refers to the cell in the database: indeed.
If "thing" refers to your mug: NO. There can be many reasons why the answer to the question "what is in your mug" is not in the database. Maybe you refused to answer. Maybe I forgot to ask. Maybe you didn't know. Maybe the mug was empty. Maybe you told me, but I forgot. Maybe you told me, but I didn't understand. Or it was top secret, and I was not allowed to store it in the database. Need I go on?
If the MugLiquid column is NULL, it signifies only that there is no
value in the DB that represents the liquid currently in the mug. If the
designer of the table thought I'd need to know *why* this value is
missing, he/she should have added one or more extra columns for this
purpose.
Unless, of course, there is already a column that can be used to find
out why this cell is NULL. If MugLiquid is NULL *and* the value of
MugContents is 0 cc, then any human user can conclude that the MugLiquid
column is NULL because the mug is empty and the question what's in it
therefore doesn't apply. And any developer can code that logic in an
application or a query to make the end user believe that the database
can conclude this as well. But the database doesn't - the developer
does.
> I will be keeping
>my database "pure", in that I will only be recording data (known
>facts).
> Any questions I ask of my database will provide absolute yes/no
>known truth - or it will simply and honestly tell me that it has
>insufficient data.
And so will mine (of course - since both DB's store the exact same collection of facts).
> If, for instance, I ask for information on countries
>with or without Weapons of Mass Destruction, there will be some
>countries - like the USA - with known-non-empty values for WMD
>attributes, some - like New Zealand - with known-empty-NULL values for
>WMD attributes, and some with no data at all - because there simply is
>no reliable information.
I have no idea what "knwon-empty-NULL values" is supposed to mean. But if you ever have to work with a SQL database, then PLEASE remember to store New Zealand's amount of nuclear warheads as 0, not as NULL. Using NULL to say that there definitely are none might be valid in Pick (I'll just have to trust you on that, as I don't know PicK), but it's absolutely not valid in an SQL table. In SQL, storing NULL means removing the fact "New Zealand has 0 nuclear warheads" from the table. That would place New Zealand -incorrectly!- in the group of countries that have no data at all (either because there is no reliable information, or for whatever other reason - I don't kinow, though the users of this DB will).
And if either you, with your Pick DB, or me, with my SQL DB, have to build a report, then we should make absolutely sure that we get the requirements straigth - does Mr. President want a list of all countries that we *know* to have WMD, or does he wish to also include all countries that *might* have WMD. Your query will look different from mine. But if we don't mess up, we'll present the same list to Mr. President.
> I make the point again - there is no place in
>a database for unknown values - and pretending otherwise will
>inevitably lead to problems, sometimes major problems.
Agreed.
> MVL is
>dangerous.
MVL is like a bread knife - dangerous in unskilled hands, but invaluable
in skilled hands.
Best, Hugo
-- (Remove _NO_ and _SPAM_ to get my e-mail address)Received on Fri Dec 09 2005 - 00:40:06 CET