Re: 3vl 2vl and NULL

From: Hugo Kornelis <hugo_at_pe_NO_rFact.in_SPAM_fo>
Date: Thu, 22 Dec 2005 01:35:22 +0100
Message-ID: <k7rjq116qei19l3hk5704lldu2id2mia9n_at_4ax.com>


On 15 Dec 2005 14:21:18 -0800, dawn wrote:

(snip)
>> This definitely improves the quality of how your 2VL logic system
>> handles missing data. But I still don't like it. :-)
>
>If you worked with it for a bit or if you had not worked with a 3VL at
>all, you would likely be happier with it.

Hi Dawn,

Yes, of course. If I had never known about 3VL, I wouldn't miss the extra control it gives. You can't miss what you're not aware of.

>
>> And you have already given the reason.
>>
>> > Now, perhaps Uncle

>> >Vernon is 110, clearly older than Aunt Marge, but that fact has not yet
>> >been recorded since people were pouring over medical and legal records
>> >to verify this first. He will then incorrectly be identified as
>> >someone whose age (in this database) is less than or equal to Marge's
>> >age. When the data are not correct in the system, the results of the
>> >queries are not correct either.
>>
>> But in this case, the data was not incorrect.
>
>Yes, it was incorrect. In order to simply, we could say that we
>modeled the proposition:
>
>Uncle Henry's age is unknown
>
>with the database data of
>
>Uncle Henry has no age

No, Dawn, we didn't model either of those propositions. We only modeled a proposition that looks like <person> is <age> years old, and omitted to include a fact for Uncle Henry in the data collection.

Think of the DB components as persons performing a task. The task is (for example) to calculate family member's ages in decades. For Aunt Marge, the person asks the data keeper for her age. He gets a reply, divides by ten, then informs his manager about the age of Aunt Marge.
For Uncle Henry, the person again asks the data keeper for his age. He doesn't get any reply (signaled, in the DB, but NULL. Of course, that only means that he can't give a reply to his manager either (again, signaled by NULL in the DB).
Real people would still communicate the fact that they can't give an answer, and start blaming others. But this fictional DB component is constrained to only return an integer to his manager, and this integer should be an age in decades. He can't return this - ergo silence in the people analogy, or NULL in the DB.

Now, when the question changes from "age in decades" to "is (s)he older than 57?", the handling of missing data changes as well. Instead of not replying to the question, the person or DB component now replies with UNKNOWN. This does not signal that Uncle Henry's age is unknown; it only signals that it is unknown if the question should be answered with "true" or "false".

One might argue that it would be more consistant to return NULL on a logical test that involves missing data as well. And though there's certainly a lot to be said for that POV (for a theorist), the use of UNKNOWN is much better for practionists like me. After all: NULLS should always propagate; UNKNOWN don't propagate. FALSE AND UNKNOWN can be FALSE, but FALSE AND NULL should be NULL.

Returning NULL instead of UNKNOWN if a logical test involves missing data would probably yield a very unworkable system. The inclusion if UNKNOWN makes for a great system that usually returns what everybody expects, sometimes surprises newbies, and give the experienced coder a great amount of control.

>
>One could say this flies in the face of a database being a set of
>facts. However, simplifications are part of modeling and this little
>white lie is a way to simplify the data processing functions and human
>understanding thereof. We are still modeling an accurate statement.
>
>> Since there's no way for a
>> database to check if data is correct or not, I can live with the
>> "garbage in garbage out" principle.
>>
>> But this case is different. The data was missing. And instead of
>> treating this data as being missing, the DB simply assumed some value
>> (apparently a zero in this case) and reported on it as if that was the
>> value I entered on the form yesterday. This would be the "nothing in
>> garbage out" principle.
>
>I understand your argument. Were I not a practioner, I would likely
>agree with your reasoning and dismiss a 2VL as over-simplified for
>reality.

This surprises me. Surely, as a practioner you would be interested in the difference between Aunt Marge being 27 years older than Billy, or Aunt Marge being 27 years old and Uncle Henry's age not beinig supplied?

(snip)
>> If the answer to both these questions is yes, you can now say "Hi" to
>> someone who has worked extensively with both and who would definitely
>> say otherwise. ;-)
>
>OK, how about if I insert words like "database access language" in
>there?

According to what definition of "database"? I've worked with IMS/DB, VSAM and flat files. I consider the first two to be databases.

> I'll also ask you if you are sure you prefer the 3VL of SQL to
>the 2VL of your programming language(s). Which language(s) are you
>using? Do you think they would be better if "enhanced" with a 3VL?

To start with the second question: I am now using Transact-SQL. Before I moved to relational databases, I used mainly PL/1, but I've also had my share of Cobol.

The other questions are harder to answer. Using a third generation language means that you have to iterate over a collection of data and process it one record at a time. If code is structured that way, it's very easy to use IF THEN ELSE constrctions to add special-casing for handling missing data. SQL changes the way code is written - instead of procedural record-oriented code, I am now writing declarative set-oriented code. The structure of that code precludes the use of complicated IF THEN ELSE constructions for special-casing the missing values.

I think that many of my queries would become longer and more complicated if SQL had no 3VL. But that was not what you're asking. And I honestly don't know if my PL/1 programs would have changed if PL/1 had offered 3VL support. At that time, I didn't know zilch about 3VL, so I never missed it. And now, it's quite hard to remember exactly what programs I wrote 10 years ago and how I would have changed them if I could have used 3VL.

Best, Hugo

-- 

(Remove _NO_ and _SPAM_ to get my e-mail address)
Received on Thu Dec 22 2005 - 01:35:22 CET

Original text of this message