Re: So what's null then if it's not nothing?

From: Julian M <julian_at_dotnetsolutions.ltd.uk>
Date: 20 Nov 2005 15:59:21 -0800
Message-ID: <1132531161.702756.58850_at_g49g2000cwa.googlegroups.com>


michael_at_preece.net wrote:
> Hugo Kornelis wrote:
>
> > On 20 Nov 2005 13:58:43 -0800, michael_at_preece.net wrote:
> >
> > >
> > >Hugo Kornelis wrote:
> > (snip)
> > >> According to the ANSI specification, null has one and only one meaning:
> > >> it's a marker to represent the absence of a value.
> > >>
> > >> So it's neither unknown, nor empty.
> > >>
> > >
> > >This just seems plain wrong to me.
> >
> > Hi Mike,
> >
>
> [snip]
>
> >
> > (snipped rest of Pick description - we were discussing NULL in SQL,
> > remember?)
> >
>
> ...and how it has been the subject of much debate and the source of
> great confusion for many people. It is a bad standard.
>
> I found this in the introduction to a book on samba:
>
> http://us3.samba.org/samba/docs/man/Samba-HOWTO-Collection/pr03.html
>
> "A good standard survives because people know how to use it. People
> know how to use a standard when it is so transparent, so obvious, and
> so easy that it become invisible. And a standard becomes invisible only
> when the documentation describing how to deploy it is clear,
> unambiguous, and correct. These three elements must be present for a
> standard to be useful, allowing communication and interaction between
> two separate and distinct entities to occur without obvious effort.
> ..... Clarity and unambiguity without correctness provide a technical
> nightmare. Correctness and clarity with ambiguity create maybe bits,
> and correctness and unambiguity without clarity provide a muddle
> through scenario."
>
> There has been little or no significant debate about null in Pick
> circles. No need. It's clear. There now exists the possibility for
> confusion among Pickies now too though - owing to the erroneous
> definition of null given in the UniVerse reference manual and quoted
> elsewhere in this thread. The source of that confusion stems from an
> attempt to fit in with SQL's standard definition.
>
> Look - my final word on this. Null is and should be defined as an empty
> set. Not missing or absent or unknown or anything else. I'll accept
> that it's not going to change in SQL-land - and that confusion will
> continue to reign. I'm aware of the problem now and can live with it.
> Really - it's not my problem.
>
> Mike.

Hi,

Talking about how null works in SQL only (not Pick :-) hope this may help to clarify what happens in SQL-land

NULL in SQL can seem tricky, but it is simple enough if viewed in terms of what is intended by the ANSI SQL92/99 standards. See drafts:
http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt http://www.ncb.ernet.in/education/modules/dbms/SQL99/

Note: it is possible that the ANSI definitions may not be the best way to treat null, however it is the current standard and it is quite clear. I think we should follow it if possible and know when we break it (if we need to do so).

Historically Codd referred to it as "the absent value" later (I believe) he suggested that there should be two types of NULL. 1) UNKNOWN = values that "should" be there but are not known yet 2) NOT APPLICABLE = no valid value can be entered (eg: gender of a corporation)

I believe that option 2) is invalid and indicates inaccurate modelling (more on this later)

Using null is often referred to as 3 valued logic (see link from earlier post by Anne and Lynne Wheeler - article on nulls and 3-value logic http://www.garlic.com/~lynn/2003g.html#40 How to cope with missing values - NULLS?). Using "not applicable" would give us 4 valued logic. Some people have even suggested up to 5 different types of null which would give us 7 valued logic (ouch).

SQL 92 standard
r) null value (null): A special value, or mark, that is used to indicate the absence of any data value.

A null value is an implementation-dependent special value that is distinct from all non-null values of the associated data type. There is effectively only one null value and that value is a member of every SQL data type. There is no <literal> for a null value, although the keyword NULL is used in some places to indicate that a null value is desired.

SQL 99 standard (very similar)
j) null value: A special value that is used to indicate the absence of any data value.

Every data type includes a special value, called the null value, sometimes denoted by the keyword NULL. This value differs from other values in the following respects:

  • Since the null value is in every data type, the data type of the null value implied by the keyword NULL cannot be inferred; hence NULL can be used to denote the null value only in certain contexts, rather than everywhere that a literal is permitted.
  • Although the null value is neither equal to any other value nor not equal to any other value it is unknown whether or not it is equal to any given value - in some contexts, multiple null values are treated together; for example, the <group by clause> treats all null values together.

End of boring quotes 8^)

My reading of this is that SQL92/99 specifies a single null value that is defined to represent "unknown" and the "not applicable" option is excluded. This means that a null in an Integer field represents some integer value, in a character field some character(s), in a Boolean field it will be either True of False, etc. In all of these cases null is a member of the allowed set of values for the domain of the data type (we just don't know which member hence "unknown member" or just "unknown").

It is possible to interpret the ANSI standard as defining a single null value that (partially) combines both "unknown" and "not applicable." (eg: null is generalised across all data types). This would mean that null is a member of the domain of the combined domains for all data types. The result would be that a null in an *integer* field could be an integer, a Boolean, a string etc. Also a null in a char, Boolean, float etc can be of any data type. This does not make sense and (I understand that) database implementers take the first view.

There is a further possible meaning of "not applicable" where null is used when no valid value can be entered (eg: gender of a corporation). This meaning is specifically excluded by the ANSI spec, as null must be a member of the domain of the data type. With the gender example the domain is gender{male, female, null}, the gender of a corporation is not male or female so it cannot be null in the domain of gender. The solution to this is not to model Corporation with a Gender attribute as it is inaccurate.

As an aside:
In set theory you can see gender as two sets AllMaleThings and AllFemaleThings, the gender attribute is used to represent which set a thing belongs to. Obviously corporation belongs to neither therefore it should not have a gender attribute. If you also apply the strong reference principle, everything in a model must represent physical thing(s) in the world. You will see that null in the gender set actually represents any (physical thing that is a) member of the set AllThingsWithGender (which is the union of AllMaleThings and AllFemaleThings).

Note: Classes and strong reference are fundamental to the logical paradigm that evolved during the 19th century, and then evolved into the OO paradigm (see Business Objects by Chris Partridge)

IMHO NULL in SQL (as defined by ANSI SQL 92/99) is intended to mean "unknown member" of a set or just UNKNOWN.

It is of course possible to overload null to mean a lot of other meanings, I would suggest that this is not a good idea. We should stick to the standards and use different ways to represent other things, eg: - "meaningful value not even possible" (corporation gender), this is can be represented by changing the model - "value unknown at present time" (order shipping date still in future) null is fine as the value will be a date, you could add an expected ship date. This is sometimes called an optional to mandatory relationship is data modelling.
- "hidden value" this could be a person refusing to give date of birth, null is fine as the birth date must be a date. The fact that that it is not disclosed can be indicated in another way, maybe by a flag
- We can invent any number of other meanings that null can hide and all of them can be modelled in other ways.

Last second thought - you can always remove nulls by subtyping and/or simply putting optional fields in other tables. Can be useful but (from a modellers perspective) it may lead to rather artificial structures that do not reflect business entities clearly. Unfortunately nulls are a necessary evil. Maybe we need to invent a proper object oriented database standard...

Julian 8^) Received on Mon Nov 21 2005 - 00:59:21 CET

Original text of this message