Re: Codd and many-valued logics

From: James K. Lowden <jklowden_at_speakeasy.net>
Date: Mon, 6 Jun 2016 22:44:35 -0400
Message-Id: <20160606224435.30e10faaafa86addd14c3dec_at_speakeasy.net>


On Sun, 5 Jun 2016 13:03:36 +0200
Nicola <nvitacolonna_at_gmail.com> wrote:

> > I don't know Codd's reasons for the nature of his references to
> > logic.
> >
> > But the little of logic that he used hardly needs references.
> >
> > Ie a few truth tables. Eg for AND, OR, NOT & IS NULL in 3VL.
>
> Sure, Codd's treatment is self-contained. But since the role of
> classical predicate calculus is (rightly) emphasized so much, why not
> emphasize connections with existing many-valued formalisms? Maybe,
> because the foundations are not as strong?

The best answer might be found in a brief survey of SQL questions on Stack Overflow. Codd knew his users wouldn't be mathematicians or logicians. He wanted to bring some math & logic to the practice of information management, and to do so he had to make some rough-and-ready compromises with commercial reality. I'm sure most SQL users haven't heard of De Morgan and are only vaguely acquainted with Boolean logic. 3VL is a bridge too far.

I myself have arrived at what I think of as a "post-Date" compromise for a better way to deal with missing information. As I've never seen anything similar proposed, I'll take this opportunity to put it before you.

Codd proposed that Missing be divided into Unknown and Not Applicable. To my mind that's a distinction without a difference unless you're ready to grapple with 4VL, which I'm not. I suspect that normalization theory made Not Applicable not applicable anyway.

Date argues that NULLs should be prohibited and default values used instead. But that runs afoul of a primary rule in database construction: to record only what is true. A default value, once substituted for a missing one, loses it's "missingness" and cannot be distinguished from an actual value that happens to be the default one.

I argue that Missing should have direct representation in the database, but not in the logic. An attempt to use a missing value where an actual one is needed is a domain error.

Missingness is intrinsically valuable information, and people cope with missing information all the time. The database *should* record missingness. What the DBMS should not do is make implicit inferences or equivalences based on it.

The best proxy for a missing value depends on context. A missing price might be best represented by the prior known one, or an average of some known ones, or a function of the price of a related product. Or it might best be removed from the set of information under consideration. Whether the DBMS says NULL = NULL or NULL <> NULL, or something else, it *will* be wrong in some context. When the DBMS silently decides "NULL = value" yields FALSE, it imposes a meaning on the missing information that -- surprise! -- is not there. Consequently values are included or excluded (depending on negation) often as not unwittingly.

Under my rules, WHERE x IS [NOT] NULL is perfectly valid. Missing information can be reflected in the database and reported to the user. But, just as WHERE x = NULL is invalid, so too would be WHERE x = y, if y is NULL. The DBMS would raise an error and not produce any output. The user must supply an appropriate default (a la COALESCE), or explicitly exclude the missing values from the input.

Simple affordances could accomodate existing practice. The "supply an appropriate default" requirement could be met by a configuration switch like "ANSI NULLS ON" or somesuch, reducing the system to present-day ambiguity and uncertainty. But unlike today, the rigorous user would have access to a pure 2VL system, providing clearer semantics and quite likely better performance because of its simplicity.

--jkl Received on Tue Jun 07 2016 - 04:44:35 CEST

Original text of this message