Re: NULLs: theoretical problems?

From: David Portas <REMOVE_BEFORE_REPLYING_dportas_at_acm.org>
Date: Fri, 17 Aug 2007 21:56:33 +0100
Message-ID: <l_Cdnbfs5KQfllvbRVnytgA_at_giganews.com>


"David Cressey" <cressey73_at_verizon.net> wrote in message news:UGmxi.11$dz3.4_at_trndny01...
>
>> If those are the ONLY propositions that you intend to represent with your
>> database then you might have a point. The theoretical abhorrence arises
>> as
>> soon as we try to derive other propositions from such a database. Then
>> three-value logic means that true propositions in the real world are no
>> longer true and false ones are no longer false.
>
> In theory, you are right. In practice, it is really quite simple to
> avoid
> queries that can be tripped up by three valued logic.

That's a startling claim. For three reasons I disagree.

Firstly, annecdotal evidence suggests exactly the opposite: developers have enormous trouble with nulls. They get tripped up all the time with incorrect results in queries, either because they forgot to consider the null case or because they didn't work out the three-value predicates properly. I'm a very experienced SQL developer and I still make those kinds of mistakes too often.

Secondly, even for the perfect programmer this is demonstrably not a simple problem. Determining whether some predicate is a tautology is known to be NP-hard. Nearly all tautologies (in two-value logic terms) produce potentially incorrect results under three-value logic. For example:

SELECT DISTINCT i FROM t WHERE x < 0 OR x >=0;

is not equivalent to:

SELECT DISTINCT i FROM t;

Thirdly, even the validity of the logical operators themselves is in question under three value logic. That's because their "correct" operation actually depends on what interpretation a user attaches to a null. Yet there is no universally accepted interpretation for nulls! Codd thought there were two different meanings for null "marks" - but he was wrong. There are many more than 2 meanings and I strongly suspect that NONE of them will give good results for all possible queries. The simplest example of all, consider:

SELECT DISTINCT i FROM t WHERE x = x;

The result is clearly wrong if a null means "x is unknown" or "x is out of range" and possibly wrong if null means "x does not apply".

If you admit nulls then you must deal with predicates that reference nullable attributes. I know of no foolproof way to ensure that those predicates give the correct results.

>
> This is even true for queries that may contain some NULLs in the result
> table. Just as well, because when doing outer joins, one would come up
> with NULLS in the result table, even if the base tables were designed in
> such a way that NULL need never be permitted to be stored.
>

Outer join is only a syntax shortcut for a union. Any system that has an outer join syntax ought to have some way to specify the values used to fill in the gaps. SQL does: the COALESCE function.

-- 
David Portas
Received on Fri Aug 17 2007 - 22:56:33 CEST

Original text of this message