Re: 3vl 2vl and NULL

From: Hugo Kornelis <hugo_at_pe_NO_rFact.in_SPAM_fo>
Date: Mon, 12 Dec 2005 00:41:50 +0100
Message-ID: <uqapp1d6k64nmg2daqjuc9e1bk76qrq2tp_at_4ax.com>


On 10 Dec 2005 05:22:38 -0800, dawn wrote:

(snip)
>> "Show me all family members who are NOT older than Aunt Marge" - should
>> Uncle Vernon be included or not? Why?
>
>Two choices of how to ask the question --
>1) Show me all members who have an age where that age is less than or
>equal to Aunt Marge's age
>
>In this case Uncle Vernon does not show up because he does not have an
>age
>
>2) Show me all family members who do not have an age greater than Aunt
>Marge's
>
>In this case Uncle Vernon does show up

Hi Dawn,

So the results of the query would depend on exactly how you formulate it?

Hmmm. AFAICT, this means that in your 2VL algebra, missing data results in ~(A) not being the same as (~A).

I don't think that I agree with your assessment of this 2VL being simpler than SQL's 3VL.

>
>> "Show me all family members who are older than Uncle Vernon" - should
>> this return noone, everyone, or only some? Why?
>
>Here we are asking a question of comparison to a fact we do not have,
>right? So it is a bad question that the user should recognize when
>they get the results and can debug it from there.

Agreed. But you can't blame it on user input in all cases. What if I want a count of "older" family members for each member of the family in my report? I can't say I'm completely happy with the answer SQL would give me (the answer would be 0 - the number of family members known for sure to be older than Uncle Vernon, whereas I woould prefer NULL {ie. no answer at all} in this case).

(snip)
>> With or without databases, fact remains is that you
>> don't know if Aunt Marge is older, younger or the same age as Uncle
>> Vernon.
>
>That is correct. We don't know and a 2VL needs to answer based on what
>it does know. So, give it your best shot based on the data you have,
>not the data you don't.

So it makes an assumption, right? And it gives you no way to distinguish (known-to-be) False and (knwon-to-be) True from (assumed-to-be) False and (assumed-to-be) True?

>
>> This proves that, as soon as you allow values to be missing in
>> your database, True and False are no longer the only possible results of
>> a predicate.
>
>No it doesn't. We can still define a 2VL on this data as indicated.
>
>> Reality forces 3VL on you.
>
>Why not 4VL or greater?

What would the fourth value be?

Note that I'm not talking about probabilities (you could redefine this as infinite-valued logic, I guess <g>) or aboout fuzzy logic. I'm talking about basic yes-no predicates, that would evaluate to either True or False if none of the values referenced in the predicates are missing.

Allowing values to go missing forces you to either make assumptions (which is IMO quite dangerous, since you can't distinguish between known and assumed results) or accept a thrid truth value of Unknown. I fail to see how a fourth value would be necessary.

(snip)
>Trimming to a 2VL is really quite easy for end-users and developers to
>work with. It isn't without issues, but seems to have fewer than a 3VL
>has.

It's not about the number of issues, but about their severity. I have a feeling that at the end of this discussion, we're going to agree to disagree on this. :-)

>
>>
>> Using 2VL might be simpler to understand and employ, but it can't
>> properly depict all facets of reality.
>
>And a 3VL can?

WRT evaluating predicates with missing values: yes. At least that's what I believe.

>
>Another angle -- think of how the RM folks reply to my interest in
>having list constructs in the dbms. Surely there are ordered and
>unordered lists "in reality." But their point is that if you can
>contrive a list through the RM by adding more tables, ordering
>attributes, and writing your own inserts into those lists to renumber
>items after that point, you can get the job done without adding more to
>the language. In the case of lists, I agree this is possible, but
>think it is a shame for the language not to do this for me.

I don't read all discussions in this group. I'm not one of the people who claim that either SQL or "true" relational is the answer to all questions. Ordered lists are definitely a weak spot in SQL; if you have to use them a lot, a SQL database should not be your choice.

But I don't think that ordered list support shoudl be added to RM either. It doesn't combine well with how RM is structured. My point is to leave them out, and accept that mimicking an ordered list in RM required some hard work

</offtopic>

(snip)
>> >When DeMorgan's law goes out the window, the data and interpretation
>> >become unnecessarily complex (for most applications and most people).
>>
>> I must be missing something. How exactly does DeMorgan's law cease to
>> apply with 3VL?
>
>I'm not an expert here, but I believe you have to enhance
>~(p ^ q) --> ~p v ~q
>
>to add nulls into the mix for a 3VL.

There are no nulls in 3VL. Only True, False, and Unknown. Unknown is not the same as NULL - Unknown is a valid value in the 3VL domain and NULL is a special marker to denote the absence of any valid value.

Unknown signifies that we tried to evaluate the predicate, but found some of the values needed missing. As a result, we couldn't determine wether the predicate is true or false.

NULL signifies that, for whatever reason, we never even tried to evaluate the predicate.

I populated both of DeMorgan's laws with all nine possible combinations of True, False, and Unknown for p and q. It held for all possibilities. So either I made a mistake, or DeMorgan's laws need not be enhanced for 3VL.

> Otherwise,
>
>Uncle Vernon's age is unknown, Aunt Marge is 47
>p=Uncle Vernon is over 40
>q=Aunt Marge is over 40

Okay, I'll use this example.

Since Uncle Vernon's age is not in the database, p evaluates to Unknown. Since Aunt Marge is 47, q evaluates to True.

~(p ^ q) = ~(Unknown ^ True) = ~(Unknown) = Unknown

~p v ~q = ~Unknown v ~True = Unknown v False = Unknown

In this example, ~(p ^ q) is equal to ~p v ~q. DeMorgan's law holds.

>
>p is neither T nor F so we don't have both p and q being T so we do
>have
>~(p ^ q) but we don't have the implication being true

I'm not sure what this paragraph means.

>
>Even if we can have an enhanced De Morgan's law for 3VL, it is not as
>intuitive to a human (OK, me!) who would think that if the first part
>is the case, then the implication would follow.

What "first part" do you mean? And what "implication"?

(snip)
>> *IF* the database allowed to store a set in a cell, then it might make
>> sense to store an empty set for an empty mug.
>
>I'll take that, thanks. Now move away from what the database allows us
>to store to the mathematical model for what is stored. You can model
>the scalar as a set with one value and the NULL as a null set. I
>realize this isn't the RM model for it.

Agreed. I think I already aggreed to that in my previous message, but maybe I didn;t express myself clear enough.

(snip)
>> If the complete set of
>> contents of the mug has to be stored, the first rule of normalisation
>> requires us to use a seperate table.
>
>Unless you are using Date's new, revised version of 1NF as discussed in
>threads earlier this year. But I use your def of 1NF and continue to
>be opposed to 1NF.

I normally use DKNF or at least ONF. Of course, 1NF is implied in those.

As a Pickie, I didn't expect you to suddenly embrace the first rule of normalization ;->

>
>> My wife's mug would result in two
>> rows.
>
>Such a shame.

Why? What difference does it make to the end user if the contents of his mug are stored in one table or spread out over two (oor even more) of them? Those are implementation details that should be hidden from the end user.

> Sometimes the data modeler will either simplify reality
>and not capture all of the contents or perhaps add in attributes for
>hasSugar, hasCream, hasIrishCream, ... to the first table rather than
>making a second one with multiple rows for cup contents.

Sure, there are lots of lousy modelers in the world. I don't think that this problem is limited to RM.

> Wouldn't it
>be great if we could model this with a list in a cell? ;-)

I don't think so - because adding that possibility to RM would cause RM to stop being RM and start being Pick. There's no need to transform RM to Pick, because Pick already exists. ;-)

I won't throw myself in the "which-is-better-RM-or-Pick" flamewars. Since I never have worked with Pick, I simply don't know which is better. But I assume that each of them is better at some tasks, and inferior at other tasks. And that there are also situations where flat-file storage would beat them both.

(snip)
>If it is valid to know that a cup has zero liquid in it, then surely 0
>could be in the domain of a liquidAmount attribute. Similarly, if
>EMPTY is a possible value for whatever attribute you identify, then
>that needs to be in the domain of the attribute, right? That is what a
>domain is -- the list of possible values.

Sure, you could include EMPTY in the domain for Mug.LiquidType, but then you shouldn't label the domain as "Liquid types", but as "Liquid types or EMPTY" or something like that. And you shouldn't create a FOREIGN KEY constraint to the LiquidTypes table, because EMPTY should definitely not be included in that table.

Oh, and you should not forget to change the proposition for the column, since "This mug contains empty" doesn't make much sense. (Yes, I know, you can get around that by using "NOTHING" instead of "EMPTY").

Storing the LiquidType of an empty mug as NULL makes life so much easier!

>
>> (snip)
>> >> Using
>> >> NULL to say that there definitely are none might be valid in Pick (I'll
>> >> just have to trust you on that, as I don't know PicK),
>> >
>> >no, no, I'll differ on that
>>
>> Since I don't know Pick, I'll have to believe everything Mike says about
>> it. I'll also have to believe everything you say about it. With the
>> interesting result that I am now totally believeing two mutually
>> exclusive statements about Pick. <g>
>
>You might have noticed that various RM folks disagree at times too,
>right?

Definitely! I myself disagree with many of the other RM folks when the subject of NULLs comes up. Of course, they are all wrong and I am the only one who sees the Truth As It Really Is, but that's a common problem for geniuses - only after they die are they understood.

</irony>

>
>In this case, Mike is right that it is possible to let NULL for a
>particular attribute mean that we know the value and it is "none."
>>From a data modeling standpoint, that would be frowned upon (thus my
>"no" response). I'm sure that is done whenever the modeler deems it
>unnecessary to distinquish between a "don't know" and "do know that
>there is none" but I would discourage it.
>
>Cheers! --dawn

Best, Hugo

-- 

(Remove _NO_ and _SPAM_ to get my e-mail address)
Received on Mon Dec 12 2005 - 00:41:50 CET

Original text of this message