Re: Does Codd's view of a relational database differ from that ofDate&Darwin?[M.Gittens]

From: Jon Heggland <heggland_at_idi.ntnu.no>
Date: Fri, 17 Jun 2005 16:00:20 +0200
Message-ID: <MPG.1d1cf5e23a15e3149896a3_at_news.ntnu.no>

In article <42b2c1ba$1_at_news.fhg.de>, savinov_at_host.com says...
> The semantics of my 4-column table is that employees and departments are
> related, and departments are related,

Departments and manager are related, you mean?

> but not more. In particular, ED and DM are unrelated.

Even if the same department occurs in both ED and DM? That is denying the obvious.

(It may be construed as a philosophical question, hinging on the definition of related, though. Or on the object-oriented notion that even though two departments are indistinguishable, they are not necessarily the same. But I strongly disagree with that notion (in the context of databases, at least).)

> Your table says that E, D and M are related.

Not really. I say a priori that ED and DM are obviously and inescapably related, since they both involve departments. My table just shows what inferences may be drawn from that.

> the problem is only which of them describes our problem domain. I think
> that your model has more information than there is in the problem
> description. (But I do not argue - you may see the problem description
> differently, may be I miss something.)

If employee Jon is in department D1, and department D1 is managed by manager Sally, then it is obvious that Jon is in D1 which is managed by Sally---which is all my table is saying. It really is just a restatement of what we have already told the database---a deduction. That is all a query is (in the RM).

Now, you could perhaps argue that my table means that Jon is managed by Sally, and that this is information that is invalid---it does not "describe our problem domain", it is based on a possibly wrong assumption.

The flaw in that argument is that I do *not* say that my table means this. My table has a very specific meaning in the RM, which is perfectly valid in all circumstances (given the validity of ED and DM), *regardless* of how employees and managers may otherwise be related. And this meaning is GIVEN BY THE RM AUTOMATICALLY, through the semantics of join (logical and). The user ignores this meaning at their peril. This is, by the way, more or less what I was talking about when I first criticised Ullman's UR treatment.

> Here is my schema:
>
> Top
> / | \
> E D M
> \/ \/
> ED DM
> \ /
> Bottom
>
> It is 4 dimensional (4 primitive dimensions - 4 paths from the bottom to
> the top).
>
> Here is your schema:

No, here is my schema:

ED DM

Simpler, no? (Hence more efficient, you claimed, though I agree that is pretty irrelevant.) It contains the same information as yours, and I can make more inferences than you.

> > If I understand you correctly, you have to introduce an additional
> > construct (a common subtable) to establish the connection between
> > employee and manager. In the RM, this is not needed. Simpler. Better.
>
> I see no difference here between COM and RM. In both cases we need to
> have some additional table to represent a relationship.

No we don't! In the RM, we just join the existing tables!

> If ORM (object-role modeling)
> stores data in tables then it does not mean that it is based on the RM.

I agree that tables does not imply relational, but that is a very bad example---ORM is the conceptual modelling technique closest by far to the RM (that I am aware of). :)

> In most cases RM is used as a low-level storage model. In is very
> convenient to have a complete freedom at low level and then constrain it
> at higher level.

Yes. But why d'you have to go and make things so complicated?

> >>As we found out this table can be also viewed as the universal relation.
> >
> > On the contrary (if I understand UR correctly---my understanding is just
> > based on Ullman's slides, though): It is *my* relation that is the UR in
> > this case.
>
> Your representation makes an additional assumption about equality of two
> department variables. May it is precisely what Ullman meant...

Ullman: "[...] UR can be thought of as the natural join of all the relations, padded with nulls if necessary." That is what my representation is. Yours is a kind of outer union.

You are right that it assumes equality (speaking loosely) of the department columns, but that is what natural join does. If the department columns were named differently, or defined on different domains, the result would be different. It would still not be your table, however.

You do have a point if you claim that department D1 in ED may refer to something completely different than department D1 in DM, but that is in my opinion so far out that I soul claim *you* are the one making unlikely assumptions. But on this, we are not likely to agree.

> I agree, different paths express different relationships in data so we
> need to specify explicitly what we need and the database does not help
> here. For example, if we want to get a set of houses related to some
> person then we need to choose via what kind of relationships: house
> ensurance or house sales records. But in any case this mechanism has to
> be present in the database and its query language. I still do not want
> to write joins. You propose to implelent it at the user interface level
> but I would prefer to have more support from the database. For example,
> the database should know about alternative paths, which are represented
> in a way different from explicit joins. One approach consists in
> specifying an intermediate table in the path. In this case the qurey
> might look as follows:
>
> get all houses related to 'Smith' via HouseEnsurance
>
> Nice format, is not it? Would not you like to have such a facility?

It is pleasant, but it is nothing more than using additional information to choose between potential paths. There may still be multiple paths, too---do you want houses insured by insurance agent Smith, or insured houses owned by Smith? Or sold by Smith? Or all of the above? If the mention of HouseInsurance completely disambiguates it, it is not different at all from specifying the join, except that you do not use the word "join". It is trivial.

Anyway, a relational database does not need extensions to know about alternative paths. The "mechanism" is already present. And a query language *is* a user interface. We need a new query language (and processor) to do this, but we do not need a new data model or new relational operators, if you see the difference.

> > It is certainly possible to derive useless facts, such as "The customer
> > id of the customer named 'Jon' looks the same as the product id of the
> > product named 'Monkey wrench'" (a join of hypothetical customer and
> > product relvars over id attributes). But the fact is nevertheless
> > *true*, and who am I (or you) to say that such a query should be
> > disallowed? After all, we have already put that information into the
> > database (though in a different form, and probably piecemeal); we *must*
> > be able to retrieve it.
>
> So the question is do we really need to have such a freedom which allows
> us (makes it easy) to produce meaningless results? Once we can evaluate
> what query will produce meaningfull result and what will not then why
> not to have this knowledge as a part of our database?

I object to the terms "meaningless" and "meaningful". All results have meaning in the sense of clearly defined semantics. They may be more or less useful/useless for the purposes at hand, but that is completely different.

And what is a useless query will obviously depend on the circumstances; it cannot be determined once and for all.

> But what I mean is that it
> is better to constrain our freedom of producing meaningless results by
> changing the model, i.e., the axiomatizaiont and inference enigne.

This may be another crux of our disagreement, then. Since what is useless is a value judgement I expect to change over time (and will be different for different users!), I will not cripple my database in this manner for so little gain. I will instead hide complexity by constructing appropriate user interfaces.

> > With proper domain support, the database can even warn us that comparing
> > customer ids and product ids is a suspect undertaking. If you want to
> > restrict the end users' options, just don't give them direct access to
> > the database.
>
> Yes, it is possible. But if we can warn a user about some meaningless
> situation then why not to bring this knowledge into the database as a
> part of the model?

It *is* part of the model, that was my point! I am saying that if users really want to compare customer ids and product ids, then I should respect their wishes, not maim my model in order to make it impossible because *I* think they are wrong!

-- 
Jon

Received on Fri Jun 17 2005 - 16:00:20 CEST

This message: [ Message body ]
Next message: Jon Heggland: "Re: Does Codd's view of a relational database differ from that ofDate&Darwin?[M.Gittens]"
Previous message: Roy Hann: "Re: Newbie question"
Maybe in reply to: Alexandr Savinov: "Re: Does Codd's view of a relational database differ from that ofDate&Darwin?[M.Gittens]"
In reply to Alexandr Savinov: "Re: Does Codd's view of a relational database differ from that ofDate&Darwin?[M.Gittens]"
Next in thread: Alexandr Savinov: "Re: Does Codd's view of a relational database differ from that ofDate&Darwin?[M.Gittens]"
Reply: Alexandr Savinov: "Re: Does Codd's view of a relational database differ from that ofDate&Darwin?[M.Gittens]"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message