Re: Does Codd's view of a relational database differ from that of Date & Darwin? [M.Gittens]
Date: Wed, 15 Jun 2005 10:59:48 +0200
Jon Heggland wrote:
>>I define all data items as having some position along all dimensions. If
>>the position has NULL value then I can write it explicitly. If some
>>dimension is known in advance to have always NULL value for some set of
>>items then this dimension is said to be inapplicable, meaningless etc.
>>Such dimensions are not included into the definition of this set of
>>items (into a set of table columns).
> Then what are we arguing about? Are you saying that Age is not always
> NULL for products / some products have an age? In that case, the two-
> table relational model in your example does not match reality, and it is
> a straw man. Or are you talking about efficient implementation of your
> conceptual concept-oriented model?
The first point is that once a column is defined for some table in the schema, it can *formally* be considered a completely legal column for all other tables. In other words, for any row in any table of this database we can ask what value it has in this column. In particular, we can (formally) ask what the age of a product is. Why do we need such an approach? Because it makes our life and the life of the database much easier, and data manipulation much simpler and more natural. As I mentioned, we can very naturally define model dimensionality, a "more specific" relation and model consequence, just because we are able to produce canonical semantics. As a consequence we can define grouping and aggregation, inference and other mechanisms.
The second point is that *informally* such an approach is more natural and describes reality better. In particular, terms such as "a column is relevant/irrelevant", "inapplicable", "the property does not make sense" and their numerous variations get a very concrete meaning.
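A minimal sketch of the first point, in Python. The schema, the column names and the helper `value_of` are hypothetical and only illustrate the idea that any column known to the database can formally be asked of any row, with "inapplicable" reported as `None`. It is not how any particular engine does it.

```python
# Hypothetical schema: each table declares only the columns that apply to it.
SCHEMA = {
    "persons": {"name", "age"},
    "products": {"name", "price"},
}

# Every column declared anywhere in the schema is formally legal everywhere.
ALL_COLUMNS = set().union(*SCHEMA.values())


def value_of(table, row, column):
    """Return the row's value along `column`.

    Returns None when the column exists in the schema but is not declared
    for `table` (the "inapplicable" case). Raises KeyError for a column
    that no table declares.
    """
    if column not in ALL_COLUMNS:
        raise KeyError(f"unknown column: {column}")
    if column not in SCHEMA[table]:
        return None
    return row.get(column)


person = {"name": "Ann", "age": 30}
product = {"name": "Lamp", "price": 25}

print(value_of("persons", person, "age"))    # 30
print(value_of("products", product, "age"))  # None
```

Asking for the age of a product is legal here and simply yields `None`, while asking for a column that no table declares is still an error.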
>>That is actually a mechanism of
>>imposing syntactic constraints. So the principle of Occam's razor is
>>very relevant here: in this way we indeed decrease the number of basic
>>elements of the model (the primary goal of the concept-oriented model).
> I don't understand. What are you reducing/decreasing? Are you not
> including the Age dimension for products in some manner? If so, why
> still make queries about it?
We are not going to make queries about that (normally). (Actually, we want to reduce querying to a minimum, but that is another topic.) The problem is that we need to explain to our database engine how to deal with all the columns, i.e., we need it in the theory. And the theory is needed to produce correct query results.
For example, suppose we have 1000 tables and we impose constraints on one or more of them. Then we want to ask what the value of some property of some object is. The tables can be related indirectly, and the database must be able to interpret these relationships correctly and to understand the data semantics in order to answer such questions. Here we cannot avoid having the above mechanism.
I understand your objection: so what is the problem, we simply write queries and the database engine executes them. Correct. But what if I do not want to write numerous queries? I want my database to do it for me. I define only my data and then ask questions, and that is all.
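To illustrate what "the database does it for me" could look like, here is a small Python sketch. The tables, the reference map `REFS` and the function `resolve` are all hypothetical. The assumption is that the engine knows which columns point at other tables and can follow them on its own, so the user asks for a property without writing the joins.

```python
from collections import deque

# Hypothetical data: each table maps a key to a row.
TABLES = {
    "order_items": {1: {"order": 10, "quantity": 2}},
    "orders": {10: {"customer": 100}},
    "customers": {100: {"country": "Norway"}},
}

# Hypothetical metadata: which columns reference which tables.
REFS = {
    "order_items": {"order": "orders"},
    "orders": {"customer": "customers"},
    "customers": {},
}


def resolve(table, key, prop):
    """Follow references breadth-first from (table, key) and return the
    first value found for `prop`, or None if no reachable row has it."""
    queue = deque([(table, key)])
    seen = set()
    while queue:
        t, k = queue.popleft()
        if (t, k) in seen:
            continue
        seen.add((t, k))
        row = TABLES[t][k]
        if prop in row:
            return row[prop]
        # Enqueue every row this one points at.
        for column, target in REFS[t].items():
            if row.get(column) is not None:
                queue.append((target, row[column]))
    return None


print(resolve("order_items", 1, "country"))  # Norway
print(resolve("order_items", 1, "age"))      # None
```

The order item is related to a country only indirectly, through its order and that order's customer. The caller states the question and the traversal is left to the engine.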
>>>>What if I want to get their position along other 998 dimensions?
> Why would you (or rather, the user) want to, anyway?
See above. Because we need to build a database that knows our data semantics in order to carry out inference, aggregation, constraint propagation etc. Otherwise I need to write complex queries myself.
>>>They have none. But anyway, aren't you the one that argues that "their
>>>position is NULL" and "they have no position" is equivalent? Isn't it
>>>then mainly a matter of syntax/presentation? I just don't like to use
>>>the NULL word; it has too many meanings.
>>
>>Ok, NULL is too ambiguous. I mean that we need something that will
>>denote an absence of coordinate value for a data item (is not positioned
>>along some dimension, is absent from that point of view, inapplicability
>>of this variable and other informal interpretations).
> Why do we need that? Why not denote absence by absence? If something is
> present, everything else is absent---it saves you the trouble of
> explicitly denoting the universal complement of your problem domain.
>>A few more definitions and we get the concept-oriented data model
>>with which we can play. In particular, such questions as "What is power
>>in kW of some house or what is area of some car" are absolutely legal.
> Legal, perhaps, but useful?
Microsoft follows (approximately) this path in its WinFS file system. Although their approach is based on Object-Role Modeling, the general direction is clear: we need to relate our data items in such a way that the underlying storage is able to do some tasks for us automatically. Although I do not like how Microsoft does it (there are some problems), they are the leaders (I have not seen any other similar approach).
>>But when I see a model or a system I am always interested in answering
>>the following questions:
>>
>>- How many dimensions (degrees of freedom) does it have?
> How do you determine this in your model? What inhibits this in the RM?
RM is too low-level a model, and in this sense we are able to implement almost everything in it. The main problem is that the database itself (the model itself) is unaware of what we are doing, what we are implementing, what our data means, what the purpose of some query is. To a great extent RM can be viewed as storage with very powerful querying and access functionality. My goal is to create a model that knows much more about the data, so that I do not need to explain each time what I need.
In particular, nothing in RM prevents you from introducing all the necessary additional features (like dimensionality, hierarchy etc.), but RM simply will know nothing about them. Just like nothing prevents you from writing OO programs in Pascal, but Pascal will be unaware that you are writing an OO program.
>>- If I have two states of this model or system then can I say that one
>>of them is more general than the other (and equivalence as a particular
>>case)?
> How do you define more general? As representing a superset of the
> information? This sounds trivial in the RM, though I haven't thought
> about it all that much.
I do not think it is technically difficult. The main problem is that it contradicts the spirit of the RM, i.e., this question is not considered relevant, meaningful or even legal. RM has other goals, other traditions, other methods.
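As a rough illustration that the comparison itself is not hard, here is a Python sketch of the reading suggested in the quoted question, where one state "represents a superset of the information" of another. The state layout (table name mapped to a set of row tuples) and the function `covers` are hypothetical, and this is only one possible definition of the ordering.

```python
def covers(state_a, state_b):
    """True if every row of every table in state_b also appears in state_a."""
    return all(
        rows <= state_a.get(table, set())
        for table, rows in state_b.items()
    )


# Hypothetical states: table name -> set of row tuples.
small = {"persons": {("Ann", 30)}}
large = {"persons": {("Ann", 30), ("Bob", 41)}, "products": {("Lamp", 25)}}

print(covers(large, small))  # True
print(covers(small, large))  # False
```

Under this reading, two states are equivalent exactly when each covers the other, which gives equivalence as a particular case of the ordering.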
>>- How can I look at it at different levels of detail? In other words, how
>>can I produce something that is qualified as an abstract/general
>>representation of this very initial model or system?
> I don't understand this one.
This is why OLAP was developed. We need different levels of detail. This in turn requires other mechanisms like constraint propagation.
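A small Python sketch of what "different levels of detail" means in the OLAP sense: the same facts summed at a finer and at a coarser level. The sales rows and the function `roll_up` are hypothetical and only show the roll-up idea, not constraint propagation.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, city, amount).
SALES = [
    ("Norway", "Oslo", 10),
    ("Norway", "Bergen", 5),
    ("Germany", "Berlin", 7),
    ("Norway", "Oslo", 3),
]


def roll_up(rows, level):
    """Sum amounts at a level of detail: 1 = country, 2 = (country, city)."""
    totals = defaultdict(int)
    for *dims, amount in rows:
        totals[tuple(dims[:level])] += amount
    return dict(totals)


by_city = roll_up(SALES, 2)
by_country = roll_up(SALES, 1)

print(by_city[("Norway", "Oslo")])  # 13
print(by_country[("Norway",)])      # 18
```

The coarser view is derived from the same rows by dropping the trailing dimension, so the two levels always stay consistent with each other.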
-- alex http://conceptoriented.com