Re: How to cope with missing values - NULLS?
Date: Wed, 23 Apr 2003 13:41:17 +0100
Message-ID: <b86221$h26$1_at_gazette.almaden.ibm.com>
"Peter Koch Larsen" <pkl_at_mailme.dk> wrote in message
news:61c84197.0304221440.32113f01_at_posting.google.com...
> "Paul Vernon" <paul.vernon_at_ukk.ibmm.comm> wrote in message
news:<b83u65$1imu$1_at_gazette.almaden.ibm.com>...
> > "Peter Koch Larsen" <pkl_at_mailme.dk> wrote in message
> > news:61c84197.0304220818.3050fbdf_at_posting.google.com...
> > [ snip ]
> > > There is not much of a difference the way I look at it and it really
> > > bugs down to how you feel the "=" operator should work.
> >
> > Indeed it does really boil down to how equality works..
> >
> > Quite simply I take it as a axiom that
> > x = x
> > for all possiable values x
> >
> > if you want to break this by introducing something that is not a value,
then
> > be my guest, but don't expect many others to follow you.
> > Should I take this sentence as meaning that if only SQL defined NULL = > NULL as true everywhere (i nearly wrote "in a consistent manner", but > this must be superfluous) then the NULL-type would not be so bad - at > least in special places such as the outer join (I cant think of any > other place where nulls do occur naturally)?
Yes. NULL would not be so bad - it would be at least respectable - if it was a value rather than a 'marker'.
That said, I would not encourage the use of such a value across the board in a new, non-SQL DBMS. It is not a value with any well defined meaning IMO.
with
T1 {col1 INTEGER, col2 STRING}
T2 {col1 INTEGER, col2 STRING, col3 BOOLEAN, col4 INTEGER}
then
T1 LEFT JOIN T2 FILL (col3 NOTAPPLICABLE, col4 0)
is shorthand (a 'macro') for: (apologies for any non exact Tutorial D syntax - I need to brush up)
T1 JOIN
( T2{col1, col2, CAST_AS_BOOLEAN_NA(col3), col4} UNION ( EXTEND ( T2{col1, col2} MINUS T1 ) ADD NOTAPPLICABLE as col3, 0 as col4) )
In SQL the shorthand is incomplete because there is no 'fill' specification, it uses NULLs for the 'fill'.
I think your argument boils down to whether a outer join shorthand should be
allowed without a 'fill' specification. I would not include such a shorthand
in a DBMS, but would not disallow a user from defining their own shorthand if
they so wished. Such a user defined shorthand might go as far as using
information in the catalog recorded against each type that defines whether the
default 'fill' should be a value from that type (e.g. 0 for integer), or from
a value in the union of the type with some other type. That however starts to
get us into the realm of 'meta-programming', whereby in a relational language,
the result relation type of a query depends of information not part of the
database schema per-se.
The problem with any default fill specification, is that it is making
assumptions about the meaning of a users LEFT JOIN statement. Sometimes 0 is
correct for a fill, sometimes N/A, NOTYETOCCURED, UNK, BLANK, -1, etc. It all
depends on the *actual* query that the user wants, a LOJ with a default fill
specification is making *a guess* at what query the user actually wants, and
unless the user sees the macro expanded (or has memorised the defaults), they
won't know what query they are getting the result to.
To summarise, using left outer joins without explicitly specifying the fill, (using say COALSECE in SQL), is bad practice. Users should be forced to specify a fill to get them to fully understand (i.e. fully specify) their query.
If I review SQL that has un-COALESCEd LOJ columns, I send it back to be corrected.
Regards
Paul Vernon
Business Intelligence, IBM Global Services
Received on Wed Apr 23 2003 - 14:41:17 CEST