Re: Does Codd's view of a relational database differ from that of Date & Darwin? [M.Gittens]

From: erk <eric.kaun_at_gmail.com>
Date: 6 Jun 2005 06:39:12 -0700
Message-ID: <1118065152.594957.214030_at_o13g2000cwo.googlegroups.com>


mountain man wrote:
> http://www.hughdarwen.freeola.com/TheThirdManifesto.web/Missing-info-without-nulls.pdf
>
> Thanks Paul for the above reference which I have just read.
>
> According to this reference we can replace a null in the salary
> field with "Salary not known" and/or "Unsalaried". This has
> taken some work to do, by a database professional, to derive
> an "improved" version of the personnel table (when needed).
>
> So what? The original design schema is simply missing information
> for these elements, and this information needs to be entered,
> and/or determined and entered.
>
> Why should a qualified database professional spend time on
> such a problem when the only real and viable solution to this
> problem is to identify the missing information and then to get
> it into the database?

The trouble isn't the missing information per se. The trouble is that that introduces into queries. Examine SQL's use of nulls in aggregate functions to get an idea. To briefly list some problem areas: does a "missing" salary count as 0 in a sum? Does it affect an average? When selecting employees with a salary below, say, $10,000, does "missing" get included?

The point is that the answers to the above questions can (and do!) vary from domain to domain; "missing" values in salary might be treated very differenly from missing values in social security number (for example).

And all of the above sidesteps the problems involved with distinguishing "types of nulls" - missing versus not applicable versus other 4 and 5 VLs that I've never really understood.

> A simple workflow routine, channelling the appearances of
> any critical nulls (not taken care of by the constraints!) to
> the people in the organisation that are directlt responsible
> for the entry of that element of data, also fixes the problem.

And until that data is entered, what is to be done with queries over those tuples? Are they to be completely ignored until null values are "corrected"?

> Normalisation appears to be a theoretical sledge hammer
> trying to cover up underlying integrity issues without actually
> solving the integrity issue at its fundamental level. At least
> this is the impression I get after reading the above reference.

I don't understand what "solving the integrity issue at its fundamental level" means, but nulls have the same issue. At best, it's a solution to a small range of problems, and as SQL suggests, can be difficult to get "right," if right is even possible.

> It's the long way around a problem, and does not in fact
> ultimately solve the problem of the missing information,
> which the original schema -- by the guidance of the RM
> presumeably at implementation - should have been
> required as mandatory.

Along with domain design, it data to be specified, rather than leaving the hard decisions and inconsistencies to be discovered later.

  • Eric
Received on Mon Jun 06 2005 - 15:39:12 CEST

Original text of this message