Re: NULLs: theoretical problems?

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Sat, 18 Aug 2007 08:55:02 -0300
Message-ID: <46c6ddd4$0$4031$9a566e8b_at_news.aliant.net>


David Cressey wrote:
> "David Portas" <REMOVE_BEFORE_REPLYING_dportas_at_acm.org> wrote in message
> news:l_Cdnbfs5KQfllvbRVnytgA_at_giganews.com...
>

>>"David Cressey" <cressey73_at_verizon.net> wrote in message
>>news:UGmxi.11$dz3.4_at_trndny01...
>>
>>>>If those are the ONLY propositions that you intend to represent with

>
> your
>
>>>>database then you might have a point. The theoretical abhorrence arises
>>>>as
>>>>soon as we try to derive other propositions from such a database. Then
>>>>three-value logic means that true propositions in the real world are no
>>>>longer true and false ones are no longer false.
>>>
>>>In theory, you are right.  In practice, it is really quite simple  to
>>>avoid
>>>queries that can be tripped up by three valued logic.
>>
>>That's a startling claim. For three reasons I disagree.
>>
>>Firstly, annecdotal evidence suggests exactly the opposite: developers

>
> have
>
>>enormous trouble with nulls. They get tripped up all the time with

>
> incorrect
>
>>results in queries, either because they forgot to consider the null case

>
> or
>
>>because they didn't work out the three-value predicates properly. I'm a

>
> very
>
>>experienced SQL developer and I still make those kinds of mistakes too
>>often.
>>
>>Secondly, even for the perfect programmer this is demonstrably not a

>
> simple
>
>>problem. Determining whether some predicate is a tautology is known to be
>>NP-hard. Nearly all tautologies (in two-value logic terms) produce
>>potentially incorrect results under three-value logic. For example:
>>
>>SELECT DISTINCT i FROM t WHERE x < 0 OR x >=0;
>>
>>is not equivalent to:
>>
>>SELECT DISTINCT i FROM t;
>>
>>Thirdly, even the validity of the logical operators themselves is in
>>question under three value logic. That's because their "correct" operation
>>actually depends on what interpretation a user attaches to a null. Yet

>
> there
>
>>is no universally accepted interpretation for nulls! Codd thought there

>
> were
>
>>two different meanings for null "marks" - but he was wrong. There are many
>>more than 2 meanings and I strongly suspect that NONE of them will give

>
> good
>
>>results for all possible queries. The simplest example of all, consider:
>>
>>SELECT DISTINCT i FROM t WHERE x = x;
>>
>>The result is clearly wrong if a null means "x is unknown" or "x is out of
>>range" and possibly wrong if null means "x does not apply".
>>
>>If you admit nulls then you must deal with predicates that reference
>>nullable attributes. I know of no foolproof way to ensure that those
>>predicates give the correct results.
>>
>>
>>>This is even true for queries that may contain some NULLs in the result
>>>table.  Just as well, because when doing outer joins,  one would come up
>>>with NULLS in the result table, even if the base tables were designed in
>>>such a way that NULL need never be permitted to be stored.
>>>
>>
>>Outer join is only a syntax shortcut for a union. Any system that has an
>>outer join syntax ought to have some way to specify the values used to

>
> fill
>
>>in the gaps. SQL does: the COALESCE function.
>>
>>-- 
>>David Portas
>>
>>

>
> My own anecdotal evidence suggest that it is, in reality, quite simple, at
> least in the cases I have dealt with. It is entirely possible that a great
> many developers have a great deal of difficulty with something that is quite
> simple. I've seen that over and over again in computing.
>
>
>>SELECT DISTINCT i FROM t WHERE x = x;

>
>
> I have NEVER been tempted to code the above where I intended
>
> SELECT DISTINCT i FROM t
>
> The fact that they deliver different results doesn't surprise me at all.
> Perhaps you intended to imply some prgramming situation that is logically
> equivalent to the above, but not nearly so obvious. If so, could you
> outline what that situation is?

Tautologies (and contradictions) arise often when evaluating queries against views.

> In situations where I've observed other programmers having trouble with
> 3-valued logic, it nearly always boils down to using possibly omitted
> values in a WHERE clause.

Don't you mean subordinate clauses in the WHERE clause? Like "or x is null" ?

I respectfully suggest it is pointless to argue that NULL is easy to use after Date and Darwen have written several books worth of material explaining all of the subtle gotchas. See the various _Writings...._ books. Received on Sat Aug 18 2007 - 13:55:02 CEST

Original text of this message