Re: Does Codd's view of a relational database differ from that ofDate&Darwin?[M.Gittens]

From: Alexandr Savinov <savinov_at_host.com>
Date: Fri, 17 Jun 2005 14:27:35 +0200
Message-ID: <42b2c1ba$1_at_news.fhg.de>

Jon Heggland schrieb:
> In article <42b2abb2$1_at_news.fhg.de>, savinov@host.com says...
>

>>Jon Heggland schrieb:
>>
>>>In article <42b28fe7$1_at_news.fhg.de>, savinov_at_host.com says...
>>>
>>>
>>>>Let's take three tables E, D, and M and two subtables ED and DM. If we 
>>>>impose constraints on M then we are not able to infer anything on E. The 
>>>>reason is formally clear: ED and DM are independent (orthogonal) because 
>>>>they do not have a common subtable (a formal criterion). This means that 
>>>>the canonical semantics (or primitive semantics) is represented as a sum 
>>>>of two tables ED and DM without intersection like this one:
>>>>
>>>>emp1, dept1, null, null
>>>>emp2, dept2, null, null
>>>>null, null, dept1, mng1
>>>>null, null, dept2, mng2
>>>
>>>
>>>Is this better than the corresponding notion in the RM, namely the join 
>>>(logical and) of ED and DM?
>>>
>>>emp1, dept1, mng1
>>>emp2, dept2, mng2
>>
>>I find the question "is this better" somewhat irrelevant. My goal is to 
>>remove such ambiguity in data interpretation. When we design our 
>>database we want to know clear what is the formal semantics of data. In 
>>the above example the final canonical semantics is represented by 4 
>>column table as I wrote it. Your 3 column table is not better or worse - 
>>it simply another database.

>
>
> The semantics of my relation is unambiguous and formal (given the
> semantics of ED and DM). What is the semantics of your table? What does
> it mean?

The semantics of my 4-column table is that employees and departments are related, and departments are related, but not more. In particular, ED and DM are unrelated.

Your table says that E, D and M are related. These different models. And the problem is only which of them describes our problem domain. I think that your model has more information than there is in the problem description. (But I do not argue - you may see the problem description differently, may be I miss something.)

Here is my schema:

   Top
  / | \
E D M
  \/ \/
  ED DM
   \ /
  Bottom

It is 4 dimensional (4 primitive dimensions - 4 paths from the bottom to the top).

Here is your schema:

   Top
  / | \
E D M
  \ | /
   EDM
    |
  Bottom

It is 3 dimensional.

In principle it is possible to effectively remove one dimension from my schema by making two dimensions completely dependent. But I am not sure why we might need to do it.

> If I understand you correctly, you have to introduce an additional
> construct (a common subtable) to establish the connection between
> employee and manager. In the RM, this is not needed. Simpler. Better.

I see no difference here between COM and RM. In both cases we need to have some additional table to represent a relationship. Of course, in some cases we can optimize the representation and do it without a separate table (many-to-one or one-to-many relationship). But the same is true for COM. The difference is how we model the data and how we interpret it. In both cases (as well as for most other existing models) the data is stored in tables as records. If ORM (object-role modeling) stores data in tables then it does not mean that it is based on the RM. In most cases RM is used as a low-level storage model. In is very convenient to have a complete freedom at low level and then constrain it at higher level.

>>As we found out this table can be also viewed as the universal relation.

>
>
> On the contrary (if I understand UR correctly---my understanding is just
> based on Ullman's slides, though): It is *my* relation that is the UR in
> this case.

Your representation makes an additional assumption about equality of two department variables. May it is precisely what Ullman meant...

>>>>It consists of two isolated sections so it is not a fault of the 
>>>>database that it cannot infer anything about employees given information 
>>>>about managers. We need somehow connect two department columns in the 
>>>>middle. In the RM it is not a problem because everything is done 
>>>>manually without any rules, 
>>>
>>>No, no, no. In the RM, this is done formally, with very definite rules 
>>>indeed. ED and DM are not independent---at least not in the "unrelated" 
>>>sense---they both involve departments. The database knows this (if it is 
>>>properly designed), and it can suggest it to the user, if necessary.
>>
>>Yes, it can suggest (and it is very useful) but it cannot resolve this 
>>problem because there is no formalization for this case.

>
>
> And there cannot be. Explicitly saying that one path is the "correct"
> one (and disallowing other paths) does not constitute a "formalization".
> It may be productive for some applications, but I don't see the value of
> putting this kind of limitation on the underlying data model as such.
> Put it in the user interface.

I agree, different paths express different relationships in data so we need to specify explicitly what we need and the database does not help here. For example, if we want to get a set of houses related to some person then we need to choose via what kind of relationships: house ensurance or house sales records. But in any case this mechanism has to be present in the database and its query language. I still do not want to write joins. You propose to implelent it at the user interface level but I would prefer to have more support from the database. For example, the database should know about alternative paths, which are represented in a way different from explicit joins. One approach consists in specifying an intermediate table in the path. In this case the qurey might look as follows:

get all houses related to 'Smith' via HouseEnsurance

Nice format, is not it? Would not you like to have such a facility?

>>What I meant when I wrote that in RM we can solve this problem is that 
>>we have a complete freedom in manipulating data. Do you agree that in 
>>the RM we can do whatever we want with our data by producing result sets 
>>which do not make sense? Note I do not say that it is bad or good, I 
>>just want to say that this power and freedom to use it compensates our 
>>lack of knowledge about data and its meaning.

>
>
> A relational database is a set of axioms; queries are theorems that the
> query engine proves from those axioms. Thus, any results are as
> logically correct as our axioms, and they all "make sense" in that they
> represent true facts about the reality we are modelling.
>
> It is certainly possible to derive useless facts, such as "The customer
> id of the customer named 'Jon' looks the same as the product id of the
> product named 'Monkey wrench'" (a join of hypothetical customer and
> product relvars over id attributes). But the fact is nevertheless
> *true*, and who am I (or you) to say that such a query should be
> disallowed? After all, we have already put that information into the
> database (though in a different form, and probably piecemeal); we *must*
> be able to retrieve it.

So the question is do we really need to have such a freedom which allows us (makes it easy) to produce meaningless results? Once we can evaluate what query will produce meaningfull result and what will not then why not to have this knowledge as a part of our database? I undersand you point. We have axioms and we have inference engine so do not complain if you wrote meaningless program. That is true. But what I mean is that it is better to constrain our freedom of producing meaningless results by changing the model, i.e., the axiomatizaiont and inference enigne. For example, it will include not only facts but also relationships. And these relationships will be interpreted in some special way which makes much less probable deriving meaningless results.

> With proper domain support, the database can even warn us that comparing
> customer ids and product ids is a suspect undertaking. If you want to
> restrict the end users' options, just don't give them direct access to
> the database.

Yes, it is possible. But if we can warn a user about some meaningless situation then why not to bring this knowledge into the database as a part of the model? For example, if we say that a department consists of employees then we actually make a strong statement which can be used for producing automatically meaningful queries.

-- 
http://conceptoriented.com

Received on Fri Jun 17 2005 - 14:27:35 CEST

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message