Re: Codd and many-valued logics

From: James K. Lowden <jklowden_at_speakeasy.net>
Date: Sat, 11 Jun 2016 20:33:17 -0400
Message-Id: <20160611203317.a308698f22f0dfd9e076de3e_at_speakeasy.net>


On Wed, 8 Jun 2016 21:20:53 +0200
Nicola <nvitacolonna_at_gmail.com> wrote:

> On 2016-06-07 02:44:35 +0000, James K. Lowden said:
>
> > On Sun, 5 Jun 2016 13:03:36 +0200
> > Nicola <nvitacolonna_at_gmail.com> wrote:
>
> I agree that compromises are necessary. There are, however, branches
> of engineering where four-valued (and even nine-valued) logics are
> routinely used. People do not seem daunted by the 4 billion or so
> dyadic truth tables (just to quote a silly argument brought against
> many-valued logics).

This is your field, not mine. I would be grateful for some pointers.

I've never heard of any branch of engineering using 4VL. Last I knew, there were still active debates over the definition of 3VL. Because the number of operators grows explosively with the number of truth values, it was my understanding that there were still unexplored areas and no consensus on the minimum essential set.
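
For scale: a dyadic (two-argument) connective in an n-valued logic assigns one of n results to each of the n * n argument pairs, so there are n ** (n * n) possible truth tables. A short Python sketch of that arithmetic (the function name is purely illustrative):

```python
def dyadic_truth_tables(n: int) -> int:
    """Number of distinct two-argument truth functions over n truth values.

    Each of the n * n argument pairs may map to any of the n results,
    so there are n ** (n * n) tables in total.
    """
    return n ** (n * n)


for n in (2, 3, 4):
    print(f"{n}-valued logic: {dyadic_truth_tables(n):,} dyadic truth tables")
```

For n = 4 this gives 4,294,967,296, the "4 billion or so" figure quoted above; for n = 3 it is 19,683, and for ordinary 2VL just 16. The count grows faster than exponentially in n.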

> Nowadays, the RM is well established, and a reasonably coherent
> treatment of incomplete information would be welcome (and is very
> much needed).

Hear, hear!

> > Codd proposed that Missing be divided into Unknown and Not
> > Applicable. To my mind that's a distinction without a difference
> > unless you're ready to grapple with 4VL, which I'm not.
>
> Note that Not Applicable (i.e., nonexistent) does not entail
> incomplete information, while Unknown obviously does. That's quite a
> difference.

It is.

> > I suspect that normalization theory made Not Applicable not
> > applicable anyway.
>
> I believe so. Nonetheless, Not Applicable nulls allow you to use far
> fewer predicates (P(A,B) vs P1(A,B), P2(A), P3(B), P4()).
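
A minimal Python sketch of the decomposition implied by the predicate count above. It assumes relations are modeled as sets of tuples and uses `None` as a stand-in for a Not Applicable marker; the sample rows are invented for illustration:

```python
from typing import Optional

# Hypothetical rows of P(A, B); None stands in for a Not Applicable marker.
P: set[tuple[Optional[str], Optional[str]]] = {
    ("a1", "b1"),
    ("a2", None),
    (None, "b3"),
    (None, None),
}


def decompose(rows):
    """Split P(A, B) with NA markers into four null-free relations."""
    p1 = {(a, b) for a, b in rows if a is not None and b is not None}  # P1(A, B)
    p2 = {(a,) for a, b in rows if a is not None and b is None}        # P2(A)
    p3 = {(b,) for a, b in rows if a is None and b is not None}        # P3(B)
    p4 = {() for a, b in rows if a is None and b is None}              # P4()
    return p1, p2, p3, p4


p1, p2, p3, p4 = decompose(P)
print(p1, p2, p3, p4)
```

P4() is a nullary relation: it is either {()} (true) or empty (false). With k nullable attributes, a null-free design needs up to 2 ** k such relations, which is the economy being claimed for Not Applicable nulls.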

Yes, but. Not Applicable becomes Missing very easily. Given relations

        R{A}, S{A,B}

and

        Union( R, S )

(an outer union, since the headings differ), by what right can we declare the missing B values Not Applicable?

Mathematically, we cannot, because the distinction relies on what R and S represent to the user. So we're stuck with just Missing.
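
The problem shows up in even a tiny sketch of such an outer union. It assumes relations are Python sets of tuples and pads R's tuples with a single hypothetical `MISSING` sentinel; nothing in R, S, or the operator says whether that padding means Unknown or Not Applicable:

```python
# Hypothetical sentinel: the union can record only that a B value is absent,
# not *why* it is absent.
MISSING = object()

R = {("r1",), ("r2",)}          # R{A}
S = {("s1", 10), ("s2", 20)}    # S{A, B}


def outer_union(r, s):
    """Union R{A} and S{A, B} over heading {A, B}, padding R's tuples with MISSING."""
    return {(a, MISSING) for (a,) in r} | set(s)


U = outer_union(R, S)
for a, b in sorted(U, key=lambda t: t[0]):
    print(a, "MISSING" if b is MISSING else b)
```

Whether ("r1", MISSING) means "r1 has a B we don't know" or "B does not apply to r1" depends on what R and S mean to the user, which is exactly the information the operator lacks.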

> > Missingness is intrinsically valuable information, and people cope
> > with missing information all the time. The database *should* record
> > missingness. What the DBMS should not do is make implicit
> > inferences or equivalences based on it.
>
> That depends. If such implicit inferences or equivalences are provably
> sound, why not? For example, Libkin has recently shown how to modify
> SQL's evaluation procedure so that no query on data with missing
> values ever outputs "false positives" (tuples that do not belong to
> the answer of a given query in some possible world). In some sense,
> that is the best one can hope for without compromising efficiency.

That's intriguing. I will look for Libkin's paper. If that is what he's demonstrated, I would look forward to seeing it incorporated into SQL.

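
To illustrate what a "false positive" means here (this is not Libkin's algorithm, only the definition it targets): a certain answer is a tuple that appears in the query result in every possible world, i.e. for every way of replacing the nulls with actual values. The brute-force sketch below assumes a small finite domain for the nulls (a simplification) and compares the certain answers of R - S with what a SQL-style `NOT EXISTS` query returns under three-valued logic. All names are illustrative:

```python
from itertools import product

NULL = None  # SQL-style null marker (illustrative)

R = [1, 2]          # R(a)
S = [NULL]          # S(a): a single row whose value is missing
DOMAIN = [1, 2, 3]  # assumption: a small finite domain for the nulls


def eq3(x, y):
    """SQL-style equality: None (UNKNOWN) if either side is null."""
    if x is NULL or y is NULL:
        return None
    return x == y


def sql_not_exists_difference(r, s):
    """SELECT a FROM R WHERE NOT EXISTS (SELECT * FROM S WHERE S.a = R.a).

    The subquery keeps only rows whose condition is TRUE, so an UNKNOWN
    comparison behaves like FALSE and the outer row survives.
    """
    return {a for a in r if not any(eq3(b, a) is True for b in s)}


def certain_difference(r, s, domain):
    """Tuples of R - S that appear in the answer in every possible world."""
    null_positions = [i for i, b in enumerate(s) if b is NULL]
    answers = []
    for values in product(domain, repeat=len(null_positions)):
        world = list(s)
        for i, v in zip(null_positions, values):
            world[i] = v
        answers.append(set(r) - set(world))
    return set.intersection(*answers)


print(sql_not_exists_difference(R, S))    # {1, 2}
print(certain_difference(R, S, DOMAIN))   # set()
```

Both 1 and 2 come back from the SQL-style query even though each is absent from the answer in some possible world (the one where the null is 1, or 2); those are the false positives that the modified evaluation described above is meant to rule out.
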
> > The best proxy for a missing value depends on context. A missing
> > price might be best represented by the prior known one, or an
> > average of some known ones, or a function of the price of a related
> > product. Or it might best be removed from the set of information
> > under consideration. Whether the DBMS says NULL = NULL or NULL <>
> > NULL, or something else, it *will* be wrong in some context. When
> > the DBMS silently treats "NULL = value" as FALSE, it imposes a
> > meaning on the missing information that -- surprise! -- is not
> > there. Consequently values are included or excluded (depending on
> > negation), as often as not unwittingly.
>
> For the case of missing information nulls, Libkin's work mentioned
> above addresses exactly this problem ("SQL's three-valued logic and
> certain answers", 2016).
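
The claim that the best proxy depends on context can be sketched by writing candidate strategies out explicitly. The price history, the related-product rule, and the function names below are all hypothetical; the point is only that each strategy is a deliberate choice made by the query author rather than one made silently by the DBMS:

```python
from statistics import mean
from typing import Optional

# Hypothetical price history for one product; None marks a missing price.
prices: list[Optional[float]] = [10.0, 12.0, None, 11.0]
gap = prices.index(None)  # position of the missing value


def last_known(series, i):
    """Proxy 1: the most recent known value before position i."""
    for v in reversed(series[:i]):
        if v is not None:
            return v
    return None


def average_known(series):
    """Proxy 2: the mean of all known values."""
    known = [v for v in series if v is not None]
    return mean(known) if known else None


def from_related(related_price, factor):
    """Proxy 3: a function of a related product's price (factor is an assumption)."""
    return related_price * factor


def drop_missing(series):
    """Proxy 4: remove the missing value from consideration altogether."""
    return [v for v in series if v is not None]


print(last_known(prices, gap))    # 12.0
print(average_known(prices))      # 11.0
print(from_related(4.5, 2.0))     # 9.0
print(drop_missing(prices))       # [10.0, 12.0, 11.0]
```

Three strategies yield three different stand-ins for the same gap, and a fourth drops it; none is right in every context.
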
...
> > Simple affordances could accommodate existing practice. The "supply
> > an appropriate default" requirement could be met by a configuration
> > switch like "ANSI NULLS ON" or somesuch, reducing the system to
> > present-day ambiguity and uncertainty. But unlike today, the
> > rigorous user would have access to a pure 2VL system, providing
> > clearer semantics and quite likely better performance because of
> > its simplicity.
>
> Raising errors when the presence of nulls is not dealt with explicitly
> would likely alleviate some of the problems. But my wild guess is that
> many people would start putting "and x is (not) null" or similar
> checks everywhere more or less randomly to silence the errors :)

IMO your wild guess is exactly right. What would be new, though, is that the user would make explicit what today is often obscure, and (I assert) quite often wrong. It only takes one unexpected NULL in a couple of levels of WHERE NOT IN to set the ship adrift.
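
The NOT IN hazard can be reproduced with a toy three-valued evaluator. In the sketch below, `None` plays the role of SQL's UNKNOWN, and the `strict` flag is a hypothetical stand-in for the "raise an error unless nulls are handled explicitly" mode discussed above, not an existing DBMS setting. `x NOT IN (subquery)` is evaluated as SQL defines it, as `x <> v1 AND x <> v2 AND ...`:

```python
NULL = None  # SQL-style null marker; None also represents UNKNOWN below


class MissingValueError(Exception):
    """Raised in the hypothetical strict mode when a null reaches a comparison."""


def ne3(x, y, strict=False):
    """x <> y: True/False, or None (UNKNOWN) if either side is null."""
    if x is NULL or y is NULL:
        if strict:
            raise MissingValueError(f"comparison {x!r} <> {y!r} involves a null")
        return None
    return x != y


def and3(values):
    """Kleene AND: FALSE if any operand is FALSE, else UNKNOWN if any is UNKNOWN."""
    values = list(values)
    if any(v is False for v in values):
        return False
    if any(v is None for v in values):
        return None
    return True


def not_in(x, subquery, strict=False):
    """x NOT IN (subquery), i.e. x <> v1 AND x <> v2 AND ..."""
    return and3(ne3(x, v, strict) for v in subquery)


def where(rows, predicate):
    """WHERE keeps a row only when the predicate is TRUE (not UNKNOWN)."""
    return [r for r in rows if predicate(r) is True]


orders = [1, 2, 3]
shipped = [2, NULL]  # one unexpected null

print(where(orders, lambda o: not_in(o, shipped)))  # [] : every row vanishes
try:
    where(orders, lambda o: not_in(o, shipped, strict=True))
except MissingValueError as e:
    print("strict mode:", e)
```

Without the null in `shipped`, the first query returns [1, 3]; with it, the result is silently empty. The strict variant instead forces the author to say what the missing value means, for example by filtering it out of the subquery first.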

--jkl

Received on Sun Jun 12 2016 - 02:33:17 CEST
