Re: How to cope with missing values - NULLS?

From: Paul Vernon <paul.vernon_at_ukk.ibmm.comm>
Date: Fri, 25 Apr 2003 10:57:01 +0100
Message-ID: <b8b0t4$ru0$1_at_gazette.almaden.ibm.com>


"Peter Koch Larsen" <pkl_at_mailme.dk> wrote in message news:3ea81614$0$42636$edfadb0f_at_dread11.news.tele.dk...
> Yes - I am aware that an outer join corresponds to a UNION of a join and a
> difference operation. Still, I do not see the importance of that distinction
> as what constitutes a primitive operation can be debated.

Indeed, but still, I don't know anyone who would make OUTER JOIN a primative.

>[snip] it allows the DBMS
> a nice way to support NULLs as we know them from SQL.

> It might be incomplete but there is some convenience involved, I think. One
> advantage is that there is no need to create your own "NA"-types for every
> base-type you create (BOOLEAN_NA is such a type, I presume?).
[snip]
> This ensures a
> consistent result no matter who creates the view.

But the point is that the DBMS does not know the intended meaning of a outer join view/query. My position is that assuming a NA, NULL or whatever default fill does no-one any favours. It does not force users to think what the meaning of their query is. To me, your 'convenience is my 'dangerous convenience.

> Also note that the if above join was used as some part of the query, we
> could formulate a query such as (reverting to SQL, with the LOJ being named
> V1):
>
> SELECT * from V1 WHERE col3 = FALSE OR col3 = TRUE;
>
> would not select all tuples. This is as it should be as col3 is not a
> BOOLEAN but a BOOLEAN_NA which has a third value (presumably) beyond TRUE
> and FALSE. I do not see how that could come as a surprise to the user.

Agreed

> (I'm averting the alternative phrase WHERE NOT col3 or col3).

Which would, of course, not be a valid expression as col3 is not a BOOLEAN type. We would need something like

WHERE CASE WHEN IS_BOOLEAN(col3) THEN NOT col3 or col3 ELSE FALSE END

or just

WHERE IS_BOOLEAN(col3)

>> To summarise, using left outer joins without explicitly specifying the
fill,
>> (using say COALSECE in SQL), is bad practice. Users should be forced to
>> specify a fill to get them to fully understand (i.e. fully specify) their
>> query.
> I have to disagree. I can not see why a non-filled outer join should give
> any surprises to the user.

I did not talk about surprises, just about bad practice.

It looks to me that we are arguing about a matter of convenience (which is not a good matter for clear debate). Your convenience is two fold. A) Users don't have to bother with thinking about a fill for their outer joins B) Some universal standard value that stands for 'UNMATCHED_OUTER_JOIN_VALUE' is commonly used across separate databases.

I do not believe that there exists any natural 'UNMATCHED_OUTER_JOIN_VALUE' value. I do not think such a thing corresponds to any single concept in the outside world. Rather there are many meaningful values that do correspond to unmatched rows in a outer join, but that it is only on a query by query basis that such values can be decided. Conveniences such a some universal default only hinder database design by making it too easy to produce a less meaningful database.

> To argue based on the SQL way of handling nulls
> is not reasonable - their way of handling nulls is not consistent in the
> first place.

Agreed. I hope I managed to not argue in such a way.

Regards
Paul Vernon
Business Intelligence, IBM Global Services Received on Fri Apr 25 2003 - 11:57:01 CEST

Original text of this message