Re: Proposal: 6NF

From: Hugo Kornelis <hugo_at_perFact.REMOVETHIS.info.INVALID>
Date: Tue, 10 Oct 2006 01:18:58 +0200
Message-ID: <0mjli2da78qm4fq1pmvtcc6pufdnpqrsa0_at_4ax.com>


On 7 Oct 2006 11:03:23 -0700, Cimode wrote:

>
>David Cressey wrote:

(snip)
>> Where I started from was that a DBMS should have a "systematic treatment of
>> missing data". That doesn't necessarily imply that SQL's treatment is a
>> good one, or that, if it is a good one, a better one cannot be devised. I
>> prefer SQL's treatment
>> to several alternatives I've seen, including the alternative of forbidding
>> all missing data, and the alternative of treating NULL as a special value.
>My advice is to dump *politically correct* approaches. SQL NULL is not
>a better treatment of missing data. It is poor (in a word you don't
>have to choose between pest and cholera).

Hi Cimode,

Since "better" and "poor" are subjective until everyone agrees on a set of verifiable criteria, this is IMO an opinion, not a fact.

Now, "systematic" is a verifiable criterium. Correct me if I'm wrong, but I believe that to prove something not systematic, one has to prove that at least one of the two following cases holds: a) the outcome of a scenario can't be predicted wiith 100% certainty, or b) similar cases are treated dissimilar.

A lot can be (and has been) said about how SQL uses NULL and 3VL to handle missing data. I agree with some of those sentiments, and I disagree with some others. But I have yet to see evidence that the way SQL uses NULL and 3VL to handle missing data leads to unpredictable results or to dissimilar treatment of similar cases.

It is very clear that you dislike NULLS (as used in SQL) and 3VL, and that this opinion is not just a whim but an opinion formed after careful consideration of available facts and theories. I won't dispute any of that. I also won't dispute that some of the consequences of how NULL and 3VL are used cause results that appear baffling or even contradictory for those who, like all people, are used to thinking in 2VL. But that in itself is insufficient to disqualify NULL and 3VL as "wrong". To wit, some of the consequences of calculating with positive and negative infinity or with i (the square root of -1) appear baffling or contradictory, and yet they are accepted ingredients of mathematics.

In the rest of my message, I won't discuss how results might appear contradictory when viewed from a 2VL perspective, or how results might be undesireable to some. I'll focus only on your attempt to prove how SQL's NULL results in unsystematic treatment of missing data - and show that your own example actually SUPPORTS rather than disproves that the treatment of missing data *IS* systematic.

>Not convinced? Tons of arguments and proofs were presented to make a
>case against them..Another proof, run this in SQL ORACLE, SQL Server or
>DB2
>
>create table table1(field1 int, field2 varchar(1))
>go

Ahem. Without a primary key, this technically isn't even a table. But unfortunately, the SQL databases I know allow this...

>insert table1
>select 1, 'A'
>insert table1
>select 2, 'B'
>insert table1
>select 3, 'C'
>insert table1
>select NULL, 'B'
>
>--Query1/ The following should return ALL records whether NULL or not
>select * from table1 where field2 = field2
>--it turns out it return 3 out of 4 records

My prediction is that it returns all 4 rows, in all major (and most, if not all, other) databases. After changing the SELECT statement to

>select * from table1 where field1 = field1

(what I assume you meant - since the fourth INSERT statements enters a NULL in the field1 column, not in the field2 column), the number of rows returned should indeed be 3.

(One moment, please)

I just copied and pasted your code and executed it on SQL Server 2005. Both my predictions were confirmed.

> --therefore we could
>conclude that the system considers that NULL <> NULL Right? (opposite
>of = should be <> right?)

No, wrong. You're thinking from a 2VL perspective here. That's not how SQL works. A proposition has *three* possible results. Since the row with the NULL is not returned, we can now safely conclude that "NULL = NULL" will not evaluate to true. That leaves us with false and unknown as remaining possibilities.

>Next run...
>select * from table1 where field2 <> field2 to get the NULL value and
>the system returns no records

Indeed. And if you change field2 to field1, it still returns no rows. And rightly so - very dependable, very predictable. Very systematic.

> which is madness because the system
>NEITHER considers NULLS as being = NOR <> from one another...

It would be madness in 2VL. SQL uses 3VL. We now know that NULL <> NULL is not true either. The 3VL versions of boolean laws let us conclude from this that NULL = NULL is not false. We already knew that it's not true either - which leaves only the third logical value, unknown.

I seem to recall that the full ANSI standard allows this query as well:

  SELECT * FROM table1 WHERE (field1 = field1) IS UNKNOWN

I can't run it myself since SQL Server doesn't implement this part of the ANSI standard - but if yoou have access to a database that does, you'll that the fourth row WILL be returned by this query.

I maintain that the way SQL uses NULL and 3VL to handle missing data results in a systematic treatment of missing data. Maybe not the treatment you, I or others would prefer - but systematic nonetheless. And that's a lot better than some other attempts... I'll gladly admit to being wrong - if anyone can show me a concrete example of SQL that denies predictability or treats similar cases dissimilar.

Best, Hugo Received on Tue Oct 10 2006 - 01:18:58 CEST

Original text of this message