Path: dp-news.maxwell.syr.edu!spool.maxwell.syr.edu!drn.maxwell.syr.edu!news.maxwell.syr.edu!postnews.google.com!o13g2000cwo.googlegroups.com!not-for-mail
From: "erk" <eric.kaun@gmail.com>
Newsgroups: comp.databases.theory
Subject: Re: Does Codd's view of a relational database differ from that of Date & Darwin? [M.Gittens]
Date: 6 Jun 2005 06:39:12 -0700
Organization: http://groups.google.com
Lines: 64
Message-ID: <1118065152.594957.214030@o13g2000cwo.googlegroups.com>
References: <SQhne.9308$BR4.3785@news-server.bigpond.net.au>
   <1117636456.185385.128930@g44g2000cwa.googlegroups.com>
   <3R6oe.1962$F7.1337@news-server.bigpond.net.au>
   <Hweoe.110129$zI7.6623283@phobos.telenet-ops.be>
   <dwxoe.3081$F7.1382@news-server.bigpond.net.au>
   <jMzoe.111207$SI7.6632478@phobos.telenet-ops.be>
   <DjMoe.3691$F7.1513@news-server.bigpond.net.au>
NNTP-Posting-Host: 170.201.180.137
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
X-Trace: posting.google.com 1118065158 13196 127.0.0.1 (6 Jun 2005 13:39:18 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Mon, 6 Jun 2005 13:39:18 +0000 (UTC)
In-Reply-To: <DjMoe.3691$F7.1513@news-server.bigpond.net.au>
User-Agent: G2/0.2
Complaints-To: groups-abuse@google.com
Injection-Info: o13g2000cwo.googlegroups.com; posting-host=170.201.180.137;
   posting-account=D5on_Q0AAACQwqv81pYDfkXI2sLkx8Xr
Xref: dp-news.maxwell.syr.edu comp.databases.theory:31243

mountain man wrote:
> http://www.hughdarwen.freeola.com/TheThirdManifesto.web/Missing-info-without-nulls.pdf
>
> Thanks Paul for the above reference which I have just read.
>
> According to this reference we can replace a null in the salary
> field with "Salary not known" and/or "Unsalaried".  This has
> taken some work to do, by a database professional, to derive
> an "improved" version of the personnel table (when needed).
>
> So what?  The original design schema is simply missing information
> for these elements, and this information needs to be entered,
> and/or determined and entered.
>
> Why should a qualified database professional spend time on
> such a problem when the only real and viable solution to this
> problem is to identify the missing information and then to get
> it into the database?

The trouble isn't the missing information per se. The trouble is that
that introduces into queries. Examine SQL's use of nulls in aggregate
functions to get an idea. To briefly list some problem areas: does a
"missing" salary count as 0 in a sum? Does it affect an average? When
selecting employees with a salary below, say, $10,000, does "missing"
get included?

The point is that the answers to the above questions can (and do!) vary
from domain to domain; "missing" values in salary might be treated very
differenly from missing values in social security number (for example).

And all of the above sidesteps the problems involved with
distinguishing "types of nulls" - missing versus not applicable versus
other 4 and 5 VLs that I've never really understood.

> A simple workflow routine, channelling the appearances of
> any critical nulls (not taken care of by the constraints!) to
> the people in the organisation that are directlt responsible
> for the entry of that element of data, also fixes the problem.

And until that data is entered, what is to be done with queries over
those tuples? Are they to be completely ignored until null values are
"corrected"?

> Normalisation appears to be a theoretical sledge hammer
> trying to cover up underlying integrity issues without actually
> solving the integrity issue at its fundamental level.  At least
> this is the impression I get after reading the above reference.

I don't understand what "solving the integrity issue at its fundamental
level" means, but nulls have the same issue. At best, it's a solution
to a small range of problems, and as SQL suggests, can be difficult to
get "right," if right is even possible.

> It's the long way around a problem, and does not in fact
> ultimately solve the problem of the missing information,
> which the original schema -- by the guidance of the RM
> presumeably at implementation - should have been
> required as mandatory.

Along with domain design, it data to be specified, rather than leaving
the hard decisions and inconsistencies to be discovered later.

- Eric

