Re: What does this NULL mean?

From: mountain man <hobbit_at_southern_seaweed.com.op>
Date: Tue, 13 Dec 2005 03:45:29 GMT
Message-ID: <ttrnf.19034$ea6.17392_at_news-server.bigpond.net.au>


"Eric Junkermann" <eric_at_deptj.demon.co.uk> wrote in message news:mG8LO6Gul0mDFwjR_at_deptj.demon.co.uk...
> The number of long threads about NULLs indicates that they are a source of
> difficulty and disagreement.
>
> I have been prompted to start yet another by the www.dbdebunk.com quote of
> the week for December 10, 2005 . It says
>
> "There have been, there are, and there
> always will be NULLS in the real world."
>
> and is attributed to "Pete Brown, dbmonster.com", but it is from a
> comp.databases.theory thread "Does Codd's view of a relational database
> differ from that of Date & Darwin?" on June 4, 2005 .
>
> At the time, I followed up with a request for a definition and examples,
> but I never pursued the answer I got (which I thought less than
> satisfactory). My favourite bit out of the rest of the thread was from Ged
> Byrne, who said
>
> ... If my Manager points to a NULL in a report, I don't say 'That is a
> NULL.' I say something like 'That customer is retired, and does not
> have a work address' or 'The user left that field blank.' That is if I
> am familiar with the data. If I am unfamiliar then the NULL tells me
> nothing, I can only shrug my shoulders and say 'Sorry, I don't know.'
> ...
>
> Finally, this leads to where I want to be with this post:
>
> A NULL does not actually provide enough information to interpret it
> properly, additional knowledge is required.
>
> The best way to look at a NULL is as a sort of denormalisation. If we have
> a table X {A, B, C} where A is the key, B is a column we are not currently
> interested in, and C is the column which is NULL in at least one row, then
> we are really talking about two tables X1 {A, B} and Y {A, C}, where at
> least one row in X1 does not have a corresponding row in Y.
>
> (This is like vertical decomposition in
> http://web.onetel.com/~hughdarwen/TheThirdManifesto/Missing-info-without-
> nulls.pdf)
>
> All this situation is telling us is that there is no C value in the
> database for this value of A. It does not matter how many possible reasons
> there are for the existence of this situation (the unsatisfactory answer I
> referred to above), we do not know why there is no value. Actually
> presenting lots of possible reasons was a very bad argument because only
> if there was exactly one reason would we know what was going on.
>
> There is no such thing as an acceptable implementation of NULL, because
> each occurrence of a potential NULL in a database design may have many
> possible reasons, each of which must be represented in the database
> somehow. There is no guarantee that there is a single form of
> representation for each of the reasons, and hence no general way of
> implementing a NULL.
>
> So in most cases, the meaning of NULL is "This database has no value here,
> the design must be incomplete"

Dependent upon the measure of complexity of your database it may be argued that the design is never complete because it is under constant evolution. All databases evolve, in terms of new tables new columns, etc, some far more than others, some routinely and progressively, others in chunky conversions and upgrades. The big picture must factor in evolution (or technically the term will be database change management associated with application development - or schema evolution).

Many db professionals have the luxury of thinking in terms of static solutions, and for such solutions which are not required to think outside the current specification, the treatment of nulls will necessarily be far easier. (ie: determined, specified, derived via normalisation, etc) .... but because the schema is small and/or static.

Move consideration to a vast schema'd database subject to change and the appearance and handling treatment of nulls (yes - formed by relations with incomplete new data sets) needs to be stepped up far more that the previous example.

The quote you selected ...

> "There have been, there are, and there
> always will be NULLS in the real world."

refers to considerations of the longer time spans, the change management of systems with evolving schema, the growing complexity associated with database management of change itself, new initiatives, new data collection policies, etc, etc.

The failure to adequately manage nulls in a database application environment will manifest itself as data integrity issues (in output, such as the report examples noted). It is for this reason, that routine tasks should automatically identify all potentially serious instances of NULL getting turned up in a relationship.

Such tasks are trivial to write and essentially test for the null in critical relationships within the volatile data structures and represent the lowest level database integrity meta-constraint exceptions. Once identified they can be resolved. Such a process is mandatory in a large complex rapidly evolving database, or integrity will be eventually compromised.

Theorists like Date who think the null should be rationalised out of existence have no demonstrated understanding of the database change management environment, and specifically the management of schema evolution and its consequences in regard to the generation of nulls.

People who construct database systems but do not maintain them, or have not maintained them for more than a few years, will never understand that the greatest entry point of nulls into the database is during change.

Change management or schema evolution is not adequately addressed by Date et al, however by the time they set forth the processes covered under this subject, it will become very apparent that the NULL will never be rationalised away, and it is better to therefore appropriately manage its identification, its existence and its resolution interactively.

-- 
Pete Brown
IT Managers & Engineers
Falls Creek
Australia
www.mountainman.com.au/software
Received on Tue Dec 13 2005 - 04:45:29 CET

Original text of this message