Re: So what's null then if it's not nothing?

From: vc <boston103_at_hotmail.com>
Date: 29 Nov 2005 09:49:53 -0800
Message-ID: <1133286593.699982.59590_at_g43g2000cwa.googlegroups.com>


Jon Heggland wrote:
> In article <1133277089.665394.263550_at_o13g2000cwo.googlegroups.com>,
> boston103_at_hotmail.com says...
> > Jon Heggland wrote:
> > [...]
> >
> > > A SELECT DISTINCT (it is really too kind to call this SQL
> > > construct "projection") eliminates duplicate NULLs, just like it
> > > eliminates duplicate values. To handwave this by saying "well, they're
> > > not *distinct*, but that doesn't mean they're *equal*" is just a bad
> > > excuse.
> >
> > It's simple really if you bother to read something about the 3vl.
> > According to the 3vl rules, whether two nulls are distinct or equal is
> > unknown which maps to false in the 2vl.
>
> By the exact same argument, duplicate NULLs should *not* be removed,
> because they are not equal. Choosing one over the other is an arbitrary
> decision; an artifact of how you formulate the SELECT DISTINCT
> operation.

No, the argument is not exactly the same. We have stipulated two independent rules of the game:

  a. two nulls are not distinct; b. two nulls are not equal

In the 3VL, neither implies the other. Whether the rules make *practical sense* is another matter entirely ;)
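To make the two rules concrete, here's a minimal sketch using Python's bundled sqlite3 module (one SQL implementation; the table and values are made up for illustration):

```python
# Minimal sketch with SQLite: rule (a) -- duplicate NULLs are "not distinct",
# so SELECT DISTINCT collapses them -- coexists with rule (b) -- NULL = NULL
# does not evaluate to true.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(None,), (None,), (1,)])

# Rule a: the two NULLs are not distinct -- only one survives DISTINCT.
distinct_rows = con.execute("SELECT DISTINCT x FROM t").fetchall()
print(distinct_rows)  # one NULL row and one (1,) row

# Rule b: NULL = NULL is unknown (returned as NULL), not true.
null_eq = con.execute("SELECT NULL = NULL").fetchone()[0]
print(null_eq)  # None
```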

>
> And this "unknown maps to false"---where do you get that from?
>

From the same article I've mentioned multiple times. E.g. select
produces only the tuples for which the select predicate evaluates to true. In 2VL terms, this means that unknown is effectively treated as false.
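A sketch of this behaviour (Python's sqlite3; table and values are illustrative):

```python
# Sketch: in a WHERE clause, a predicate that evaluates to unknown behaves
# like false -- the NULL row is returned neither by the predicate nor by
# its negation.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(None,), (3,), (10,)])

kept = con.execute("SELECT x FROM t WHERE x > 5").fetchall()
complement = con.execute("SELECT x FROM t WHERE NOT (x > 5)").fetchall()
print(kept)        # [(10,)]
print(complement)  # [(3,)] -- the NULL row appears in neither result
```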

> > > I suppose sorting is defined in a similar manner? "NULLs are collected
> > > at the top in an ascending sort, and at the bottom in a descending sort,
> > > but we don't actually *compare* them to anything. Oh no."
> > >
> >
> > Oh yes. Ascending or descending has got nothing to do with null
> > 'ordering'. In order to place nulls at the top/bottom of a result set,
> > one has to say 'order by nulls first/last' with 'last' being the
> > default thus creating a user mapping that converts nulls to the
> > least/greatest value.
>
> And to place 'A' before 'B', one has to say that 'order A before B'.

I do not understand what you are trying to say here. There is a natural ordering defined on the various data types; that is what 'order by' uses. There is obviously no such ordering amongst nulls, therefore one has to specify their placement manually, using 'nulls first' or 'nulls last'.
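A sketch of NULL placement in sorting (Python's sqlite3; note the default varies by product -- Oracle defaults to NULLS LAST ascending, while SQLite sorts NULLs as if smaller than everything, and only SQLite >= 3.30 accepts an explicit NULLS FIRST/LAST clause):

```python
# Sketch: NULLs have no natural order, so each engine picks a convention;
# SQLite's default puts them first in an ascending sort.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(2,), (None,), (1,)])

asc = [r[0] for r in con.execute("SELECT x FROM t ORDER BY x")]
print(asc)  # [None, 1, 2] -- SQLite's default puts NULLs first

# A portable way to force NULLs last without the NULLS LAST syntax:
nulls_last = [r[0] for r in con.execute("SELECT x FROM t ORDER BY x IS NULL, x")]
print(nulls_last)  # [1, 2, None]
```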

>
> > > > The
> > > > SQL'92 standard says that the column value is a duplicate if it's
> > > > already present in the set of existing column non-nulls
> > >
> > > I'm not able to parse this.
> > >
> >
> > What's so hard about that ?
>
> I didn't understand the phrase "set of existing column non-nulls".
>
> > Say, you have a set {1,2,3}, then you
> > get a new value 1. Clearly, you have a duplicate. Now, let's assume
> > your new value is null. You cannot say whether it's a duplicate or not
> > -- the result of set membership operation (or comparison if you prefer)
> > is unknown. The 3vl unknown is mapped to false, therefore, null is
> > *not* a duplicate.
>
> Now insert NULL again. Still no duplicate?

We *do not know*, so the rule of the game is to be optimistic and hope it is not a duplicate.

>Not according to SQL,

What do you mean? The SQL unique constraint does not consider two nulls to be duplicates.
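A sketch (Python's sqlite3):

```python
# Sketch: a single-column UNIQUE constraint rejects a repeated non-null
# value but accepts NULL repeatedly -- the NULL-vs-NULL duplicate test is
# unknown, which the constraint does not treat as a violation.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER UNIQUE)")
con.execute("INSERT INTO t VALUES (1)")

dup_rejected = False
try:
    con.execute("INSERT INTO t VALUES (1)")  # duplicate non-null value
except sqlite3.IntegrityError:
    dup_rejected = True

con.execute("INSERT INTO t VALUES (NULL)")
con.execute("INSERT INTO t VALUES (NULL)")  # second NULL is accepted
row_count = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(dup_rejected, row_count)  # True 3
```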

> and it
> seems you agree that that is sensible.

>
> But now take a table with a composite key (or a set of tuples, if you
> prefer). Insert (NULL, NULL). Insert (NULL, NULL) again. No duplicate.
> Insert (1, NULL). No duplicate. Insert (1, NULL) again. Duplicate! Why?
>

Because the implementation(s) deviate from the standard (for a lot of practical reasons that may or may not make sense to everyone). According to the poorly worded standard, all the columns must contain non-null values in order to produce a duplicate. Otherwise, the result is unknown and therefore the constraint is not violated.
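For what it's worth, SQLite is one implementation that follows the standard's reading here (a sketch):

```python
# Sketch: with a composite UNIQUE key, any NULL in the key makes the
# duplicate test unknown, so even (1, NULL) can be inserted twice; only a
# fully non-null repeat is rejected. (SQLite behaves this way; the
# implementation in the quoted example deviates.)
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b INTEGER, UNIQUE (a, b))")
for row in [(None, None), (None, None), (1, None), (1, None)]:
    con.execute("INSERT INTO t VALUES (?, ?)", row)  # all four accepted

accepted = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(accepted)  # 4

full_dup_rejected = False
con.execute("INSERT INTO t VALUES (1, 2)")
try:
    con.execute("INSERT INTO t VALUES (1, 2)")  # no NULLs: a real duplicate
except sqlite3.IntegrityError:
    full_dup_rejected = True
print(full_dup_rejected)  # True
```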

> > > > So for NULL, the DUPLICATE predicate would return UNKNOWN (or
> > > > NULL) which is then mapped to false
> > >
> > > Indeed? AFAIK, SQL constraints are considered satisfied if they are true
> > > or if they are unknown.
> >
> > With constraints, one wants to be sure that a value is *rejected* if
> > the constraint is *NOT* satisfied.
>
> I might want to be sure that the value is accepted only if the
> constraint is satisfied. Why is one more correct than the other? I know
> which one I think is easier to state and understand.

Then, in your pessimistic database world, you'd reject all the records for people with unknown phone number, unknown age, unknown middle name, etc. Also, if you so desire, you can explicitly reject rows with nulls by augmenting your predicate with an IS NULL condition.

In short, one has at least three choices:

o not use NULL at all and be happy
o rely on Codd's 3VL
o eliminate nulls whenever it seems reasonable, using the IS NULL predicate.
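A sketch of the third choice (Python's sqlite3; the people table is made up for illustration):

```python
# Sketch: augment the predicate with IS NULL / IS NOT NULL to decide
# explicitly how nulls are handled, rather than relying on the 3VL default.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (name TEXT, age INTEGER)")
con.executemany("INSERT INTO people VALUES (?, ?)",
                [("ann", 30), ("bob", None), ("cec", 10)])

# 3VL default: the unknown-age row is silently dropped.
default = con.execute("SELECT name FROM people WHERE age > 20").fetchall()
# Explicitly keep rows with unknown age as well:
optimistic = con.execute(
    "SELECT name FROM people WHERE age > 20 OR age IS NULL").fetchall()
print(default)     # [('ann',)]
print(optimistic)  # [('ann',), ('bob',)]
```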

>
> > >Which, by the way, gives the amusing(?) result
> > > that restricting a table by its constraint is not a no-op.
> >
> > I do not understand that.
>
> Create a table with an attribute A, and a constraint that says that A
> must be greater than five. Insert ten rows successfully. Select the rows
> where A is greater than five. How many rows do you get?

I'll get as many rows as there are satisfying the greater-than-five predicate. There might be some rows with nulls, but I won't get those (which should be evident from our discussion).
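A sketch of the scenario (Python's sqlite3):

```python
# Sketch: a CHECK constraint is satisfied when its predicate is unknown,
# so a NULL row is accepted -- yet the same predicate in a WHERE clause
# filters that row out. Restricting the table by its own constraint is
# therefore not a no-op.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER CHECK (a > 5))")
con.executemany("INSERT INTO t VALUES (?)", [(6,), (10,), (None,)])  # all pass

total = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
restricted = con.execute("SELECT COUNT(*) FROM t WHERE a > 5").fetchone()[0]
print(total, restricted)  # 3 2
```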

>
> > > > -- you have to make some decision at this point.
> > >
> > > *I* would? I don't understand. Doesn't the SQL standard make a decision?
> >
> > Sure, the standard does, 'you' is an impersonal pronoun here.
>
> Ok. Now, with 2VL, the unique constraint behaviour does *not* depend on
> how you define it, and there is no decision to arbitrarily make. It is
> clear and simple, and doesn't surprise anyone. I think it is better.

Then, do not use nulls and all the real or perceived problems will go away.

>
> > > Inconsistent with "many other cases"? Not all cases, then? Either it's
> > > consistent, or it's not. SQL is not.
> >
> > You are being a little too righteous here. In some cases SQL designers
> > made consistent decisions with respect to NULL and a lot of other
> > things, in some others they did not.
>
> Being sometimes consistent and sometimes not, is not being consistent.
> That is kind of the point of consistency.

SQL is an engineering artifact, not a mathematical one, and as such is full of compromises. It could be better, but it's really not so bad, especially in comparison to XML/UML/network and other similar creatures trying to pose as data models or data management systems.

>That said, I admit to being
> righteous and arrogant. Sorry. :)
> --
> Jon
Received on Tue Nov 29 2005 - 18:49:53 CET
