Re: NULLs: theoretical problems?

From: David Cressey <cressey73_at_verizon.net>
Date: Sat, 18 Aug 2007 08:56:16 GMT
Message-ID: <Quyxi.46$6h3.31_at_trndny05>


"David Portas" <REMOVE_BEFORE_REPLYING_dportas_at_acm.org> wrote in message news:l_Cdnbfs5KQfllvbRVnytgA_at_giganews.com...
> "David Cressey" <cressey73_at_verizon.net> wrote in message
> news:UGmxi.11$dz3.4_at_trndny01...
> >
> >> If those are the ONLY propositions that you intend to represent with
your
> >> database then you might have a point. The theoretical abhorrence arises
> >> as
> >> soon as we try to derive other propositions from such a database. Then
> >> three-value logic means that true propositions in the real world are no
> >> longer true and false ones are no longer false.
> >
> > In theory, you are right. In practice, it is really quite simple to
> > avoid
> > queries that can be tripped up by three valued logic.
>
> That's a startling claim. For three reasons I disagree.
>
> Firstly, annecdotal evidence suggests exactly the opposite: developers
have
> enormous trouble with nulls. They get tripped up all the time with
incorrect
> results in queries, either because they forgot to consider the null case
or
> because they didn't work out the three-value predicates properly. I'm a
very
> experienced SQL developer and I still make those kinds of mistakes too
> often.
>
> Secondly, even for the perfect programmer this is demonstrably not a
simple
> problem. Determining whether some predicate is a tautology is known to be
> NP-hard. Nearly all tautologies (in two-value logic terms) produce
> potentially incorrect results under three-value logic. For example:
>
> SELECT DISTINCT i FROM t WHERE x < 0 OR x >=0;
>
> is not equivalent to:
>
> SELECT DISTINCT i FROM t;
>
> Thirdly, even the validity of the logical operators themselves is in
> question under three value logic. That's because their "correct" operation
> actually depends on what interpretation a user attaches to a null. Yet
there
> is no universally accepted interpretation for nulls! Codd thought there
were
> two different meanings for null "marks" - but he was wrong. There are many
> more than 2 meanings and I strongly suspect that NONE of them will give
good
> results for all possible queries. The simplest example of all, consider:
>
> SELECT DISTINCT i FROM t WHERE x = x;
>
> The result is clearly wrong if a null means "x is unknown" or "x is out of
> range" and possibly wrong if null means "x does not apply".
>
> If you admit nulls then you must deal with predicates that reference
> nullable attributes. I know of no foolproof way to ensure that those
> predicates give the correct results.
>
> >
> > This is even true for queries that may contain some NULLs in the result
> > table. Just as well, because when doing outer joins, one would come up
> > with NULLS in the result table, even if the base tables were designed in
> > such a way that NULL need never be permitted to be stored.
> >
>
> Outer join is only a syntax shortcut for a union. Any system that has an
> outer join syntax ought to have some way to specify the values used to
fill
> in the gaps. SQL does: the COALESCE function.
>
> --
> David Portas
>
>
My own anecdotal evidence suggest that it is, in reality, quite simple, at least in the cases I have dealt with. It is entirely possible that a great many developers have a great deal of difficulty with something that is quite simple. I've seen that over and over again in computing.

> SELECT DISTINCT i FROM t WHERE x = x;

I have NEVER been tempted to code the above where I intended

SELECT DISTINCT i FROM t

The fact that they deliver different results doesn't surprise me at all. Perhaps you intended to imply some prgramming situation that is logically equivalent to the above, but not nearly so obvious. If so, could you outline what that situation is?

In situations where I've observed other programmers having trouble with 3-valued logic, it nearly always boils down to using possibly omitted values in a WHERE clause. In simple programming situations, the only selectivity required in WHERE clauses deals only with cases where there is a value to compare.

> Outer join is only a syntax shortcut for a union. Any system that has an
> outer join syntax ought to have some way to specify the values used to
fill
> in the gaps. SQL does: the COALESCE function.

That assumes that the "gaps" are places where data is missing and there "ought to be data there". Sometimes, a gap is just a gap. Hugo Kornelis has outlined an excellent summary in which the development of "-" in paper and pencil data, or NULL in an SQL system might represent merely the difference between the rectangular form of a result table, and the non rectangular form of the data being sought. If you assert that meaningful data being sought is always in rectangular form, then you and I think differently. Received on Sat Aug 18 2007 - 10:56:16 CEST

Original text of this message