Re: Does Codd's view of a relational database differ from that of Date & Darwin? [M.Gittens]

From: Alexandr Savinov <savinov_at_host.com>
Date: Thu, 16 Jun 2005 16:00:30 +0200
Message-ID: <42b185ff$1_at_news.fhg.de>


Jon Heggland wrote:
> In article <42b14ba3$1_at_news.fhg.de>, savinov@host.com says...

>>Without queries database is not able to interpret data and to maintain it.

>
>
> Can you elaborate? What interpretation is desirable, what maintenance is
> impossible?

For example, consider the use of joins. There are several kinds, including manual joining by means of WHERE, and in each individual query you have to spell out all the join details. That is necessary because the database is unable to derive this information itself. It cannot do so because it does not know the semantics of the data: it simply retrieves the specified data according to the processing instructions given in the query. And the semantics is absent because there is no acceptable data model. So, in a very simplified form, you can think of the goal as getting rid of joins. The model then includes all relationships as a necessary part, and the user only needs to specify *what* he wants to retrieve, not *how* the result set has to be produced.

As I mentioned somewhere, a motivating example from the UR (universal relation) model might be very appropriate for COM as well: we need to compute queries like

SELECT Employees.name FROM Employees WHERE Managers.name='Jones' AND Products.type='cars'

In terms of MS WinFS it would sound like "I want to get all employees with Jones as a manager and related to a product of type 'cars'". Note that we use 3 tables here (2 for constraints, which are propagated, and 1 as the target).
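
For comparison, here is a minimal sketch of what the same request looks like today, with every join spelled out by hand. It uses Python's sqlite3 module only to make it runnable. The key columns (id, manager_id, product_id) and the sample rows are my assumptions for illustration; the example above names only the three tables and the columns name and type.

```python
import sqlite3

# Hypothetical schema: key columns and sample rows are assumptions,
# not part of the original example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Managers  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Products  (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT,
                        manager_id INTEGER REFERENCES Managers(id),
                        product_id INTEGER REFERENCES Products(id));
INSERT INTO Managers  VALUES (1, 'Jones'), (2, 'Smith');
INSERT INTO Products  VALUES (1, 'cars'), (2, 'bikes');
INSERT INTO Employees VALUES (1, 'Ann', 1, 1), (2, 'Bob', 1, 2), (3, 'Cid', 2, 1);
""")

# The user has to state *how* the three tables are connected.
explicit_join_sql = """
SELECT Employees.name
FROM Employees
JOIN Managers ON Employees.manager_id = Managers.id
JOIN Products ON Employees.product_id = Products.id
WHERE Managers.name = 'Jones' AND Products.type = 'cars'
"""
rows = conn.execute(explicit_join_sql).fetchall()
print(rows)
```

The two JOIN ... ON clauses are exactly the part that the join-free form leaves out and expects the database to supply.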

In such an approach the database changes its role. Instead of maintaining rows in tables, it maintains relationships between data items. This is actually a significant change of paradigm. In COM an isolated item has no semantics of its own (no properties); its semantics is distributed over the whole database via relationships. So instead of specifying *how* the result set is produced, we need to specify *what* part of the global semantics we want to see.
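
A rough sketch of the idea, not the actual COM or UR mechanism: the relationships are stored once in the model, and a join-free request is expanded into ordinary SQL from them. The RELATIONSHIPS table, the build_query function and the key columns are hypothetical names of mine. The sketch handles only direct links from the target table; multi-step paths and ambiguous paths are ignored.

```python
# Relationships kept once "in the model" (hypothetical key columns).
RELATIONSHIPS = {
    ("Employees", "Managers"): "Employees.manager_id = Managers.id",
    ("Employees", "Products"): "Employees.product_id = Products.id",
}

def build_query(target, column, constraints):
    """Expand a join-free request into explicit-join SQL.

    constraints maps (table, column) -> required value.
    Returns the SQL text and the list of parameter values.
    """
    joins, conditions, params = [], [], []
    for (table, col), value in constraints.items():
        if table != target:
            # Look up the stored link; raises KeyError if none is known.
            clause = f"JOIN {table} ON {RELATIONSHIPS[(target, table)]}"
            if clause not in joins:
                joins.append(clause)
        conditions.append(f"{table}.{col} = ?")
        params.append(value)
    sql = f"SELECT {target}.{column} FROM {target}"
    if joins:
        sql += " " + " ".join(joins)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params

# The user states only *what* is wanted: target column plus constraints.
sql, params = build_query(
    "Employees", "name",
    {("Managers", "name"): "Jones", ("Products", "type"): "cars"},
)
print(sql)
print(params)
```

The query the user writes shrinks to a target and two constraints, while the join knowledge moves into the persistent RELATIONSHIPS structure.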

>>In simplified form you can view this problem as moving some part of 
>>all queries into the database (into the model) where they have a 
>>persistent form. After that queries are simpler but the database is more 
>>complex.

>
>
> Can you provide an example?

See above, or other examples from the UR model.

In COM we provide another solution based on other basic assumptions and other methods.

>>Again, assume you have two models each with several 
>>thousands tables. The question is if the first model is more specific 
>>than the second one. 

>
>
> Specific in what sense? In what kind of information can be represented?
> If you mix data and metadata, isn't that unbounded? Or in the sense of
> the ability to infer all the information of one "model" (I.e. database
> state) from another (with possibly different schema)?

An empty model has no information. We can then add information to it and make it more specific. A more specific model follows from a more general one in the logical sense.

>>You follow RM tradition where database is a set of 
>>tables and we can manipulate these relations by producing new relations, 
>>making grouping and aggregations etc. It is a kind of programming where 
>>we need to specify a concrete way how our result set needs to be 
>>produced and then our database will simply execute this program. 

>
>
> Yes, it is logic. Powerful, safe and sound. Note that the database
> doesn't need to execute it exactly as we phrased it; it can transform
> and optimise our request without fear of getting things wrong. This is a
> very major feature of the RM; it is surprising how many overlook or
> dismiss it. (Not that I am saying you do.)

For simple examples our queries really do look nice. But in a complex system one query is a small program: long, complex and error-prone. Just consider the endless debates about NULL interpretations, or about how to use joins in one situation or another. Currently it is overcomplicated, imo. If that is so, then this problem cannot be solved from the inside; it is necessary to change our view of data.

>>For example, we 
>>have a data model for a corporation. It includes production departments, 
>>sales department, personnel etc. But it is still one organisation as such, 
>>i.e., one point, possibly with some properties. When I start creating 
>>this model I want to define my organisation as one element of this 
>>model. Tomorrow I come and see that this corporation has something 
>>inside and I add this something into the model as additional elements. 

>
>
> Sounds like the way most people do modelling---one thing at a time.

Here is an example that illustrates the difference. Consider a star or snowflake schema. You draw one table in the center and the other tables around it. In COM this is wrong. Not because you draw it that way, of course, but because you think of your data that way. In COM the master table is a subconcept that needs to be positioned under its superconcepts. Its position in the hierarchy determines its role (the results of queries will depend on it). Detail tables are superconcepts, and they are all positioned above the master table. In the general case all tables have to be positioned hierarchically, and this is of crucial importance for modelling.
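
To make the picture concrete, here is a small sketch of such a hierarchy as plain data. The table names (Sales, Products, Customers, Dates, Categories) and the ancestors function are hypothetical, chosen by me for illustration; this shows only the positioning, not how COM evaluates queries.

```python
# Each concept lists its direct superconcepts (hypothetical table names).
SUPERCONCEPTS = {
    "Sales": ["Products", "Customers", "Dates"],  # master table, lowest
    "Products": ["Categories"],                   # snowflake level
    "Customers": [],
    "Dates": [],
    "Categories": [],
}

def ancestors(concept):
    """Return every concept positioned above the given one, sorted."""
    seen = []
    stack = list(SUPERCONCEPTS[concept])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(SUPERCONCEPTS[current])
    return sorted(seen)

print(ancestors("Sales"))
```

Here the master table is at the bottom and every table it refers to is above it, instead of one table in the center with the others around it.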

-- 
alex
http://conceptoriented.com
Received on Thu Jun 16 2005 - 16:00:30 CEST
