Re: Codd and many-valued logics
Date: Wed, 8 Jun 2016 21:20:53 +0200
Message-ID: <nj9r6l$27o2$1_at_adenine.netfront.net>
On 2016-06-07 02:44:35 +0000, James K. Lowden said:
> On Sun, 5 Jun 2016 13:03:36 +0200
> Nicola <nvitacolonna_at_gmail.com> wrote:
>
>>> I don't know Codd's reasons for the nature of his references to
>>> logic.
>>>
>>> But the little of logic that he used hardly needs references.
>>>
>>> Ie a few truth tables. Eg for AND, OR, NOT & IS NULL in 3VL.
>>
>> Sure, Codd's treatment is self-contained. But since the role of
>> classical predicate calculus is (rightly) emphasized so much, why not
>> emphasize connections with existing many-valued formalisms? Maybe,
>> because the foundations are not as strong?
>
> The best answer might be found in a brief survey of SQL questions on
> Stack Overflow. Codd knew his users wouldn't be mathematicians or
> logicians. He wanted to bring some math & logic to the practice of
> information management, and to do so he had to make some
> rough-and-ready compromises with commercial reality. I'm sure most SQL
> users haven't heard of De Morgan and are only vaguely acquainted with
> Boolean logic. 3VL is a bridge too far.
I agree that compromises are necessary. There are, however, branches of engineering where four-valued (and even nine-valued) logics are routinely used. People there do not seem daunted by the 4 billion or so dyadic truth tables of a four-valued logic (just to quote a silly argument brought against many-valued logics).
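For reference, the "4 billion" figure is a simple count: a dyadic connective over n truth values maps each of the n*n input pairs to one of n outputs, so there are n^(n*n) possible tables. A quick check in Python (the function name is only for illustration):

```python
def dyadic_table_count(n: int) -> int:
    """Number of distinct dyadic (two-argument) truth tables over n truth values."""
    # n*n input pairs, each mapped independently to one of n outputs.
    return n ** (n * n)

for n in (2, 3, 4):
    print(n, dyadic_table_count(n))
# 2 16
# 3 19683
# 4 4294967296
```

Of course, only a handful of those tables are ever used as connectives in practice, which is why the count is not much of an argument.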
There is inherent complexity in dealing with uncertain, incomplete information, no matter how many truth values your logic has. Codd had good reasons to make certain trade-offs in the seventies. Nowadays the RM is well established, and a reasonably coherent treatment of incomplete information would be welcome (and is very much needed).
Just to be clear, I am not claiming that we /must/ depart from two-valued logic (although I do believe that predicate calculus is not the best tool for the job).
> I myself have arrived at what I think of as a "post-Date" compromise
> for a better way to deal with missing information. As I've never seen
> anything similar proposed, I'll take this opportunity to put it before
> you.
>
> Codd proposed that Missing be divided into Unknown and Not
> Applicable. To my mind that's a distinction without a difference unless
> you're ready to grapple with 4VL, which I'm not.
Note that Not Applicable (i.e., nonexistent) does not entail incomplete information, while Unknown obviously does. That's quite a difference. In fact, it may be argued that SQL deals with the former type of nulls just fine (see Franconi and Tessaris, "On the Logic of SQL Nulls", 2012).
> I suspect that normalization theory made Not Applicable not applicable
> anyway.
I believe so. Nonetheless, Not Applicable nulls let you use far fewer predicates: a single P(A,B) instead of P1(A,B), P2(A), P3(B) and P4().
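To make the predicate count concrete, here is a minimal Python sketch of that decomposition. It assumes None stands for a Not Applicable null, and the function name decompose is hypothetical. Each row goes into the null-free relation determined by which attributes are present, so n nullable attributes give up to 2**n relations.

```python
from collections import defaultdict

def decompose(rows, attributes):
    """Split rows containing inapplicable (None) fields into null-free relations.

    The result maps each tuple of present attribute names to the set of
    value tuples over exactly those attributes.
    """
    relations = defaultdict(set)
    for row in rows:
        present = tuple(a for a, v in zip(attributes, row) if v is not None)
        values = tuple(v for v in row if v is not None)
        relations[present].add(values)
    return dict(relations)

# One relation P(A, B) with Not Applicable nulls...
rows = [(1, "x"), (2, None), (None, "y"), (None, None)]
# ...corresponds to P1(A,B), P2(A), P3(B) and P4().
for attrs, tuples in decompose(rows, ("A", "B")).items():
    print(attrs, tuples)
```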
> Date argues that NULLs should be prohibited and default values used
> instead. But that runs afoul of a primary rule in database
> construction: to record only what is true. A default value, once
> substituted for a missing one, loses its "missingness" and cannot be
> distinguished from an actual value that happens to be the default one.
>
> I argue that Missing should have direct representation in the database,
> but not in the logic. An attempt to use a missing value where an
> actual one is needed is a domain error.
It makes sense to me.
> Missingness is intrinsically valuable information, and people cope with
> missing information all the time. The database *should* record
> missingness. What the DBMS should not do is make implicit inferences
> or equivalences based on it.
That depends. If such implicit inferences or equivalences are provably sound, why not? For example, Libkin has recently shown how to modify SQL's evaluation procedure so that no query on data with missing values ever outputs "false positives" (tuples that do not belong to the answer of a given query in some possible world). In some sense, that is the best one can hope for without compromising efficiency.
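To see what a "false positive" looks like, consider a difference query written with NOT EXISTS over R = {1} and S = {NULL}. The Python sketch below is only an illustration with made-up helper names. It does not implement Libkin's modified evaluation procedure. It just mimics SQL's three-valued comparison and then evaluates the same query in one possible world, where the null is replaced by an actual value (for simplicity only S contains nulls).

```python
NULL = None  # stands in for an SQL null marking an unknown value

def eq3(x, y):
    """SQL-style equality: True, False, or None (unknown) if either side is null."""
    if x is NULL or y is NULL:
        return None
    return x == y

def sql_difference(r, s):
    """SELECT a FROM r WHERE NOT EXISTS (SELECT * FROM s WHERE r.a = s.a).

    As in SQL, the subquery keeps a row only when the comparison is True,
    so an unknown comparison counts as "no match".
    """
    return [a for a in r if not any(eq3(a, b) is True for b in s)]

def difference_in_world(r, s, value):
    """The same query under two-valued logic, with every null in s replaced by value."""
    s_world = [value if b is NULL else b for b in s]
    return [a for a in r if a not in s_world]

r, s = [1], [NULL]
print(sql_difference(r, s))          # [1]
print(difference_in_world(r, s, 1))  # []
```

SQL outputs 1, yet in the world where the null happens to be 1 the answer is empty, so 1 is not a certain answer.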
> The best proxy for a missing value depends on context. A missing price
> might be best represented by the prior known one, or an average of some
> known ones, or a function of the price of a related product. Or it
> might best be removed from the set of information under consideration.
> Whether the DBMS says NULL = NULL or NULL <> NULL, or something else, it
> *will* be wrong in some context. When the DBMS silently decides "NULL =
> value" yields FALSE, it imposes a meaning on the missing information
> that -- surprise! -- is not there. Consequently values are included or
> excluded (depending on negation) often as not unwittingly.
For nulls that denote missing information, Libkin's work mentioned above addresses exactly this problem ("SQL's three-valued logic and certain answers", 2016).
> Under my rules, WHERE x IS [NOT] NULL is perfectly valid. Missing
> information can be reflected in the database and reported to the user.
> But, just as WHERE x = NULL is invalid, so too would be WHERE x = y, if
> y is NULL. The DBMS would raise an error and not produce any output.
> The user must supply an appropriate default (a la COALESCE), or
> explicitly exclude the missing values from the input.
>
> Simple affordances could accommodate existing practice. The "supply an
> appropriate default" requirement could be met by a configuration switch
> like "ANSI NULLS ON" or somesuch, reducing the system to present-day
> ambiguity and uncertainty. But unlike today, the rigorous user would
> have access to a pure 2VL system, providing clearer semantics and quite
> likely better performance because of its simplicity.
Raising errors when the presence of nulls is not dealt with explicitly would likely alleviate some of the problems. But my wild guess is that many people would start putting "and x is (not) null" or similar checks everywhere, more or less at random, just to silence the errors :)
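For concreteness, the quoted rule (an error on any comparison involving a missing value, with IS NULL and COALESCE as the explicit ways out) might look roughly like this in Python. All names here (MISSING, MissingValueError, is_missing, coalesce) are hypothetical and only mimic the behaviour described above. This is not how any existing DBMS works.

```python
class MissingValueError(Exception):
    """Raised when a missing value is used where an actual one is needed."""

class _Missing:
    """Sentinel stored in the data; comparing it with anything is a domain error."""
    def __eq__(self, other):
        raise MissingValueError("comparison with a missing value")
    def __ne__(self, other):
        raise MissingValueError("comparison with a missing value")

MISSING = _Missing()

def is_missing(x):
    """Counterpart of x IS NULL: an identity test, so it never raises."""
    return x is MISSING

def coalesce(x, default):
    """Counterpart of COALESCE(x, default): the user supplies the proxy value."""
    return default if x is MISSING else x

prices = [10, MISSING, 30]

# WHERE price IS NOT NULL AND price = 10
print([p for p in prices if not is_missing(p) and p == 10])  # [10]

# WHERE COALESCE(price, 0) = 10
print([p for p in prices if coalesce(p, 0) == 10])           # [10]

# WHERE price = 10: rejected, because a missing value reaches the comparison
try:
    [p for p in prices if p == 10]
except MissingValueError as exc:
    print("error:", exc)
```

The first query is the kind of guard I suspect would get pasted everywhere: it makes the error go away whether or not silently dropping the missing price is what the user actually wants.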
Nicola
