Re: So what's null then if it's not nothing?

From: Hugo Kornelis <hugo_at_pe_NO_rFact.in_SPAM_fo>
Date: Sat, 03 Dec 2005 00:09:53 +0100
Message-ID: <ikj1p1l00amd5932e835mbjkq9pcaaeh5e_at_4ax.com>


On Thu, 1 Dec 2005 09:23:17 +0100, Jon Heggland wrote:

>In article <ji7so1l73jsh1rkjencq25ibirrh7oe8h3_at_4ax.com>,
>hugo_at_pe_NO_rFact.in_SPAM_fo says...
>> I agree with this. Removing NULLs from the results of a DISTINCT
>> operation is weird.
>
>I don't think I have actually said that it is weird.

Hi Jon,

No, you didn't - I did. And it's not the best word to describe my position on this, but not being a native English speaker, I sometimes have to go with the "least bad" translation I can find.

But in my personal logic system, it is, well, let's say: disputable. In my logic system, lumping all "not specified" into one group when doing a GROUP BY makes sense. They are probably not the same, but they are, somehow, "related". That makes treating them as one group acceptable.

For DISTINCT, my personal logic works differently. Where GROUP BY is about forming groups, DISTINCT is about removing duplicates. A group is supposed to be about similar but not necesserily equal individuals; duplicates are supposed to be, well, duplicate. If I were asked to group the posters in this thread by age, I'd make two groups. One for me, and one labeled "age not supplied" for all others. On the other hand, if I were asked how many distinct ages the posters in this thread have, it would never occur to me to reply "two".

Well, as I said - it's my personal logic. Maybe it's me who's weird. <g>

(snip)
>> >I might want to be sure that the value is accepted only if the
>> >constraint is satisfied. Why is one more correct than the other? I know
>> >which one I think is easier to state and understand.
>>
>> In most cases, that's as easy as adding a NOT NULL constraint to the
>> column.
>
>I know that, of course. It would be even easier if I *didn't* have to
>add NOT NULL.

I disagree. I'll give an example. Consider the column Age, a NOT NULL constraint, and a constraint that if an age is entered, it has to be at least 18. There are four combinations:

1. allow everything;
2. disallow nulls;
3. disallow minors but allow nulls;
4. disallow nulls and minors.

In the current SQL implementation, the column definitions look:

1. Age integer,
2. Age integer NOT NULL,
3. Age integer CHECK (Age >= 16),
4. Age integer NOT NULL CHECK (Age >= 16),

In the version you prefer, the equivalent column definitions would be:

1. Age integer,
2. Age integer NOT NULL,
3. Age integer CHECK (Age >= 16 OR Age IS NULL),
4. Age integer CHECK (Age >= 16),

In the current version, nullability of the column is represented in one consistend way. In your proposed version, the representation of the constraint that forbids nulls takes a completely different form (in fact, it changes from explicitly visible to implied only) as soon as a CHECK constraint is added to the column. I don't consider this easier.

>> If that doesn't work in a specific case, then make sure that you
>> formulate the constraint such that it will evaluate to false for each
>> row you don't want to accept.
>
>Yes, I know. But that doesn't jive with what database textbooks teach:
>That constraints are assertions that hold true for all possible values
>of the database.

Then these textbooks are wrong. Given how CHECK constraints are defined in the ANSI standard and how they therefor are supposed to behave, the correct description is that constraints are assertions that are never false for any possible value of the database.

>> It's not a question of more or less correct, BTW. It's just a question
>> of how the rules are defined. But it might help to consider that *if* a
>> constraint would reject all rows that evaluate to false or unknown, most
>> CHECK constraints would implicitly disallow NULLs in many columns.
>
>I do of course consider that. As a feature. I also think NOT NULL should
>be the default.

Interestingly, MS SQL Server has a backward-compatibility setting that forces all columns in newly created tables to be NOT NULL, unless an explicit NULL "constraint" (not the correct term - it doesn't constrain, it removes a constraint) is added.

Most columns in a database should be NOT NULL. If you believe that the most common option should be the default, then NOT NULL should be the default. But on the other hand - how would you call the "NOT NOT NULL" clause? It's not a constraint, since it doesn't constrain. It's more a "remove default constraint option". There surely are better terms, but however you name it - it's a new beast in the SQL stable, whereas NOT NULL is just one of the constraints.

>> The
>> current definition gives the DBA the control to either add or not add an
>> explicit NOT NULL constraint.
>
>My definition wouldn't change that. It would "give the DBA the control"
>to add an explicit "OR value IS NULL" if s/he wants a constraint on a
>nullable column. The expressive power *does not change*.

Agreed.

Best, Hugo

-- 

(Remove _NO_ and _SPAM_ to get my e-mail address)
Received on Sat Dec 03 2005 - 00:09:53 CET

Original text of this message