Re: Does Codd's view of a relational database differ from that ofDate&Darwin?[M.Gittens]

From: Alexandr Savinov <>
Date: Fri, 17 Jun 2005 17:46:27 +0200
Message-ID: <42b2f055$>

Jon Heggland schrieb:
> In article <42b2c1ba$>, says...

>>The semantics of my 4-column table is that employees and departments are 
>>related, and departments are related, 

> Departments and manager are related, you mean?


>>but not more. In particular, ED and DM are unrelated.

> Even if the same department occurs in both ED and DM? That is denying
> the obvious.

Obvious is what we have in our data model. Our data model has some dimensionality. If we can vary dimensions independently then they are independent even if we know that 'in life' they are somehow related.

> (It may be construed as a philosophical question, hinging on the
> definition of related, though. Or on the object-oriented notion that
> even though two departments are indistinguishable, they are not
> necessarily the same. But I strongly disagree with that notion (in the
> context of databases, at least).)

>>Your table says that E, D and M are related.

> Not really. I say a priori that ED and DM are obviously and inescapably
> related, since they both involve departments. My table just shows what
> inferences may be drawn from that.

So very interesting issue is:

How should we interpret the case where some tables have a common supertable?

You say that the tables "are obviously and inescapably related, since they both involve" one common supertable. I am not so sure. I agree that they are related but how we need to express this in database and how should the database use this information. Note that formally they are unrelated.

>>the problem is only which of them describes our problem domain. I think 
>>that your model has more information than there is in the problem 
>>description. (But I do not argue - you may see the problem description 
>>differently, may be I miss something.)

> If employee Jon is in department D1, and department D1 is managed by
> manager Sally, then it is obvious that Jon is in D1 which is managed by
> Sally---which is all my table is saying. It really is just a restatement
> of what we have already told the database---a deduction. That is all a
> query is (in the RM).
> Now, you could perhaps argue that my table means that Jon is managed by
> Sally, and that this is information that is invalid---it does not
> "describe our problem domain", it is based on a possibly wrong
> assumption.
> The flaw in that argument is that I do *not* say that my table means
> this. My table has a very specific meaning in the RM, which is perfectly
> valid in all circumstances (given the validity of ED and DM),
> *regardless* of how employees and managers may otherwise be related. And
> this meaning is GIVEN BY THE RM AUTOMATICALLY, through the semantics of
> join (logical and). The user ignores this meaning at their peril. This
> is, by the way, more or less what I was talking about when I first
> criticised Ullman's UR treatment.
>>Here is my schema:
>>   Top
>>  / | \
>>E  D  M
>>  \/ \/
>>  ED DM
>>   \ /
>>  Bottom
>>It is 4 dimensional (4 primitive dimensions - 4 paths from the bottom to 
>>the top).
>>Here is your schema:

> No, here is my schema:

But where in your schema Employess, Departments and Managers? If you add them then you will get 5 tables - precisely what I have.

> Simpler, no? (Hence more efficient, you claimed, though I agree that is
> pretty irrelevant.) It contains the same information as yours, and I can
> make more inferences than you.

There are no differences between the schemas. We talked about their interpretations. You said that the problem domain is 3-dimensional. I claim that it is 4-dimensional (but I can reduce it to your 3-dimensional model using an additional assumption).

>>>If I understand you correctly, you have to introduce an additional 
>>>construct (a common subtable) to establish the connection between 
>>>employee and manager. In the RM, this is not needed. Simpler. Better.

In RM it is also needed. But it has different meaning. There is no difference in the number of tables in the two models. Note that those common subtables or other intermediate tables are not necessarily implemented as one separate table. They are used for modelling purposes. In many cases we can optimaze the storage.

Again, good relational model will be equivalent to concept-oriented model on the number of tables and some other issues. COM simply makes it easier to create good models.

>>I see no difference here between COM and RM. In both cases we need to 
>>have some additional table to represent a relationship.

> No we don't! In the RM, we just join the existing tables!

When joining we do not produce new information - we simply derive and transform what we already have. I meant that in both COM and RM (actually in most other approaches) we need to represent our relationahips as rows in tables. Join is a means of transformation rather than representation.

>>If ORM (object-role modeling) 
>>stores data in tables then it does not mean that it is based on the RM. 

> I agree that tables does not imply relational, but that is a very bad
> example---ORM is the conceptual modelling technique closest by far to
> the RM (that I am aware of). :)

Ok, let's call COM the conceptual modeling approach.

>>In most cases RM is used as a low-level storage model. In is very 
>>convenient to have a complete freedom at low level and then constrain it 
>>at higher level.

> Yes. But why d'you have to go and make things so complicated?

I find COM much simpler than what I have seen before.

>>>>As we found out this table can be also viewed as the universal relation.
>>>On the contrary (if I understand UR correctly---my understanding is just 
>>>based on Ullman's slides, though): It is *my* relation that is the UR in 
>>>this case.
>>Your representation makes an additional assumption about equality of two 
>>department variables. May it is precisely what Ullman meant...

> Ullman: "[...] UR can be thought of as the natural join of all the
> relations, padded with nulls if necessary." That is what my
> representation is. Yours is a kind of outer union.
> You are right that it assumes equality (speaking loosely) of the
> department columns, but that is what natural join does. If the
> department columns were named differently, or defined on different
> domains, the result would be different. It would still not be your
> table, however.

I can design a model with only one common supertable, which is then used (referenced) from many other subtables (more than one time from each one table). For example,

   / / \ \
Manager Married

Manager and Married are two-dimensional relations. For me it is 4-dimensional model, which can be represented as one 4-columns table (Employees,Employees,Employees,Employees). But you insist that it should have only one column (Employees) (or may be 2-columns) because Employees is one common supertable. Or I do not understand you correctly.

(For simplicitly in this example I assume no other knowledge/constraints on management or family ties.)

> You do have a point if you claim that department D1 in ED may refer to
> something completely different than department D1 in DM, but that is in
> my opinion so far out that I soul claim *you* are the one making
> unlikely assumptions. But on this, we are not likely to agree.

See the above example. I can agree with you if you show that interpreting it as 1-dimensional or 2-dimensional does make sense.

>>I agree, different paths express different relationships in data so we 
>>need to specify explicitly what we need and the database does not help 
>>here. For example, if we want to get a set of houses related to some 
>>person then we need to choose via what kind of relationships: house 
>>ensurance or house sales records. But in any case this mechanism has to 
>>be present in the database and its query language. I still do not want 
>>to write joins. You propose to implelent it at the user interface level 
>>but I would prefer to have more support from the database. For example, 
>>the database should know about alternative paths, which are represented 
>>in a way different from explicit joins. One approach consists in 
>>specifying an intermediate table in the path. In this case the qurey 
>>might look as follows:
>>get all houses related to 'Smith' via HouseEnsurance
>>Nice format, is not it? Would not you like to have such a facility?

> It is pleasant, but it is nothing more than using additional information
> to choose between potential paths. There may still be multiple paths,
> too---do you want houses insured by insurance agent Smith, or insured
> houses owned by Smith? Or sold by Smith? Or all of the above? If the
> mention of HouseInsurance completely disambiguates it, it is not
> different at all from specifying the join, except that you do not use
> the word "join". It is trivial.

I have no doubt that an expert can implement these ideas by using conventional software. But using your logic we do not the relational model at all. Just use file system. Anything can be implemented without the relational model. In many cases more reliable and much more efficient.

> Anyway, a relational database does not need extensions to know about
> alternative paths. The "mechanism" is already present. And a query
> language *is* a user interface. We need a new query language (and
> processor) to do this, but we do not need a new data model or new
> relational operators, if you see the difference.

It does not depend on whether you implement this mechanism in database or user interface. You follow some idea, some view of your data. You think of your data differently. And that is enough. If you like you can retain your favorite relational database and use it as a storage (good solution). You can also move some part of you logic into it. Or you can implement a full featured database (like MS WinFS).

Received on Fri Jun 17 2005 - 17:46:27 CEST

Original text of this message