Re: Does Codd's view of a relational database differ from that ofDate&Darwin?[M.Gittens]

From: Jon Heggland <heggland_at_idi.ntnu.no>
Date: Fri, 17 Jun 2005 13:43:53 +0200
Message-ID: <MPG.1d1cd24ae21cf8cd9896a0_at_news.ntnu.no>


In article <42b2abb2$1_at_news.fhg.de>, savinov_at_host.com says...
> Jon Heggland schrieb:
> > In article <42b28fe7$1_at_news.fhg.de>, savinov_at_host.com says...
> >
> >>Let's take three tables E, D, and M and two subtables ED and DM. If we
> >>impose constraints on M then we are not able to infer anything on E. The
> >>reason is formally clear: ED and DM are independent (orthogonal) because
> >>they do not have a common subtable (a formal criterion). This means that
> >>the canonical semantics (or primitive semantics) is represented as a sum
> >>of two tables ED and DM without intersection like this one:
> >>
> >>emp1, dept1, null, null
> >>emp2, dept2, null, null
> >>null, null, dept1, mng1
> >>null, null, dept2, mng2
> >
> >
> > Is this better than the corresponding notion in the RM, namely the join
> > (logical and) of ED and DM?
> >
> > emp1, dept1, mng1
> > emp2, dept2, mng2
>
> I find the question "is this better" somewhat irrelevant. My goal is to
> remove such ambiguity in data interpretation. When we design our
> database we want to know clear what is the formal semantics of data. In
> the above example the final canonical semantics is represented by 4
> column table as I wrote it. Your 3 column table is not better or worse -
> it simply another database.

The semantics of my relation is unambiguous and formal (given the semantics of ED and DM). What is the semantics of your table? What does it mean?

If I understand you correctly, you have to introduce an additional construct (a common subtable) to establish the connection between employee and manager. In the RM, this is not needed. Simpler. Better.

> As we found out this table can be also viewed as the universal relation.

On the contrary (if I understand UR correctly---my understanding is just based on Ullman's slides, though): It is *my* relation that is the UR in this case.

> >>It consists of two isolated sections so it is not a fault of the
> >>database that it cannot infer anything about employees given information
> >>about managers. We need somehow connect two department columns in the
> >>middle. In the RM it is not a problem because everything is done
> >>manually without any rules,
> >
> > No, no, no. In the RM, this is done formally, with very definite rules
> > indeed. ED and DM are not independent---at least not in the "unrelated"
> > sense---they both involve departments. The database knows this (if it is
> > properly designed), and it can suggest it to the user, if necessary.
>
> Yes, it can suggest (and it is very useful) but it cannot resolve this
> problem because there is no formalization for this case.

And there cannot be. Explicitly saying that one path is the "correct" one (and disallowing other paths) does not constitute a "formalization". It may be productive for some applications, but I don't see the value of putting this kind of limitation on the underlying data model as such. Put it in the user interface.

> What I meant when I wrote that in RM we can solve this problem is that
> we have a complete freedom in manipulating data. Do you agree that in
> the RM we can do whatever we want with our data by producing result sets
> which do not make sense? Note I do not say that it is bad or good, I
> just want to say that this power and freedom to use it compensates our
> lack of knowledge about data and its meaning.

A relational database is a set of axioms; queries are theorems that the query engine proves from those axioms. Thus, any results are as logically correct as our axioms, and they all "make sense" in that they represent true facts about the reality we are modelling.

It is certainly possible to derive useless facts, such as "The customer id of the customer named 'Jon' looks the same as the product id of the product named 'Monkey wrench'" (a join of hypothetical customer and product relvars over id attributes). But the fact is nevertheless *true*, and who am I (or you) to say that such a query should be disallowed? After all, we have already put that information into the database (though in a different form, and probably piecemeal); we *must* be able to retrieve it.

With proper domain support, the database can even warn us that comparing customer ids and product ids is a suspect undertaking. If you want to restrict the end users' options, just don't give them direct access to the database.

-- 
Jon
Received on Fri Jun 17 2005 - 13:43:53 CEST

Original text of this message