Re: Does Codd's view of a relational database differ from that of Date & Darwen? [M.Gittens]

From: Jon Heggland <heggland_at_idi.ntnu.no>
Date: Thu, 16 Jun 2005 14:48:07 +0200
Message-ID: <MPG.1d1b9365779b652d989698_at_news.ntnu.no>


In article <42b14ba3$1_at_news.fhg.de>, savinov_at_host.com says...
> Yes, definitely. That was clear from the very beginning. First of all, it
> is a forum. The second point is that you have a defensive position which
> is unbreakable. Something like "why do I need OOP if I can implement
> everything in a procedural language". One can understand things only if
> he wants to, i.e., the position is not defensive.

That may be necessary, but not sufficient; it must also be explained clearly. I would also argue that having a defensive position and being unwilling to understand are not the same thing.

> > How can it be otherwise? The computer is not a mind-reader.
>
> Yes, we need to add more information into our model so that the database
> knows what to do if queries do not have enough information. In other
> words, the model has more information while queries are simpler.

I think I see what you are getting at here, but an example would be useful.

> >>>>A few more definitions and we get the concept-oriented data model
> >>>>with which we can play. In particular, such questions as "What is the power
> >>>>in kW of some house?" or "What is the area of some car?" are absolutely legal.
> >>>
> >>>Legal, perhaps, but useful?
>
> It depends. If you love the RM or if it is your religion then nothing
> else will be accepted as more useful. If you need to earn money then the
> new model will be more useful.

Because you earn money by asking about the area of cars? Why don't you substantiate your assertions?

> >>Microsoft follows (approximately) this path in its WinFS file system.
> >>Although their approach is based on Object-Role Modeling, the general
> >>direction is clear: we need to relate our data items in such a way that
> >>the underlying storage is able to do some tasks for us automatically.
> >
> > What tasks? What need is served? Again, how is it useful?
>
> You might want to read more about MS WinFS. They explain the usefulness
> at the user level (for it is difficult). Surprisingly, Microsoft is the
> leader here.

Googling "WinFS data model" produced
http://www.c-sharpcorner.com/Longhorn/WinFS/WinFSDataModel.asp, where WinFS is described as a file system with metadata implemented as a relational database. Which is nice. (Isn't that what BeOS did ten years ago, though?) I'm still not sure which tasks you mean would become automatic.

> What I claim is that the database should know much more about our data and
> its relationships in order to be able to perform useful tasks for us.

Ok, and what I am asking you is what it should know, and how this helps. And the pros and cons of inventing a new data model for this---unless your model is not in the same class of data model as the RM, which would make such comparisons irrelevant.

> Without queries, the database is not able to interpret or maintain data.

Can you elaborate? What interpretation is desirable, what maintenance is impossible?

> In simplified form you can view this problem as moving some part of
> all queries into the database (into the model) where they have a
> persistent form. After that queries are simpler but the database is more
> complex.

Can you provide an example?
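Without knowing exactly what is intended, the closest relational construct to "moving part of a query into the database, where it has a persistent form" is probably a view: a named query stored in the schema, which later queries can use as if it were a table. A minimal sketch using Python's built-in `sqlite3` module; the `house` table, its columns and the `big_house` view are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical base table.
    CREATE TABLE house (id INTEGER PRIMARY KEY, name TEXT, power_kw REAL);
    INSERT INTO house VALUES (1, 'cabin', 3.5), (2, 'villa', 12.0);
    -- Part of the query is stored persistently in the database as a view.
    CREATE VIEW big_house AS
        SELECT name, power_kw FROM house WHERE power_kw > 10;
""")

# The query itself is now simpler: the restriction lives in the view.
rows = con.execute("SELECT name FROM big_house").fetchall()
print(rows)  # [('villa',)]
```

Whether this is what is meant remains open: a view only moves query text into the schema; it does not add any inference beyond what the stored query says.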

> >>>>- If I have two states of this model or system then can I say that one
> >>>>of them is more general than the other (and equivalence as a particular
> >>>>case)?
> >>>
> >>>How do you define more general? As representing a superset of the
> >>>information? This sounds trivial in the RM, though I haven't thought
> >>>about it all that much.
> >>
> >>I do not think it is technically difficult. The main problem is that it
> >>contradicts the spirit of the RM, i.e., this question is not
> >>considered relevant, meaningful or even legal.
> >
> > How so? In its simplest form, the question is just "is this relation a
> > subset of that?"---unless I mistake your meaning. Please explain.
>
> We are not talking about relations. We are talking about the whole model
> and its semantics.

I am talking about how two "states of this model or system"---iow database states---can be compared in the RM; relations are of course involved. Anyway, this is impossible to talk about unless you define what you mean by "more general".
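In the simplest reading, where "more general" means "holds at least the same information under the same schema", the comparison reduces to a relation-by-relation subset test. A minimal Python sketch, assuming a database state is modelled as a dict from relation name to a set of tuples (the function `is_subsumed_by` and the sample states are hypothetical):

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical representation of a database state: relation name -> set of tuples.
State = Dict[str, FrozenSet[Tuple]]

def is_subsumed_by(a: State, b: State) -> bool:
    """True if every tuple of every relation in `a` also appears in the same relation in `b`.

    Assumes both states share one schema (same relation names and headings).
    """
    if a.keys() != b.keys():
        raise ValueError("states must have the same relation names")
    return all(a[name] <= b[name] for name in a)

state_a = {
    "employee": frozenset({("alice", "sales")}),
    "dept": frozenset({("sales",)}),
}
state_b = {
    "employee": frozenset({("alice", "sales"), ("bob", "hr")}),
    "dept": frozenset({("sales",), ("hr",)}),
}

print(is_subsumed_by(state_a, state_b))  # True
print(is_subsumed_by(state_b, state_a))  # False
```

Equivalence is then the case where each state subsumes the other. Comparing states under different schemas would need a mapping between the schemas, which this sketch does not attempt.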

> Again, assume you have two models, each with several
> thousand tables. The question is whether the first model is more specific
> than the second one.

Specific in what sense? In terms of what kind of information can be represented? If you mix data and metadata, isn't that unbounded? Or in the sense of the ability to infer all the information of one "model" (i.e. database state) from another (with possibly different schema)?

> You follow the RM tradition, where a database is a set of
> tables and we can manipulate these relations by producing new relations,
> grouping, aggregating etc. It is a kind of programming where
> we need to specify concretely how our result set is to be
> produced, and then our database simply executes this program.

Yes, it is logic. Powerful, safe and sound. Note that the database doesn't need to execute it exactly as we phrased it; it can transform and optimise our request without fear of getting things wrong. This is a major feature of the RM; it is surprising how many overlook or dismiss it. (Not that I am saying you do.)
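A standard example of such a rewrite: a restriction on attributes of only one operand commutes with a join, so an optimiser may apply the restriction first and feed a smaller input to the join, with the same result guaranteed. A Python sketch with hypothetical `emp(name, dept)` and `dept(dept, city)` relations modelled as sets of tuples:

```python
# Hypothetical relations as sets of tuples:
#   emp(name, dept), dept(dept, city)
emp = {("alice", "sales"), ("bob", "hr"), ("carol", "sales")}
dept = {("sales", "oslo"), ("hr", "trondheim")}

def join(emp_rel, dept_rel):
    """Natural join on the shared 'dept' attribute, giving (name, dept, city)."""
    return {(n, d, c) for (n, d) in emp_rel for (d2, c) in dept_rel if d == d2}

def select(rel, pred):
    """Relational restriction: keep only the tuples satisfying pred."""
    return {t for t in rel if pred(t)}

# Plan 1: join first, then restrict on dept (position 1 in the join result).
plan1 = select(join(emp, dept), lambda t: t[1] == "sales")

# Plan 2: restrict emp first (position 1 is dept there too), then join.
plan2 = join(select(emp, lambda t: t[1] == "sales"), dept)

print(plan1 == plan2)  # True
print(sorted(plan1))   # [('alice', 'sales', 'oslo'), ('carol', 'sales', 'oslo')]
```

Because both plans denote the same relation, the system is free to pick whichever is cheaper; the user states what is wanted, not how to compute it.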

> But you
> can look at it differently. What if I do not want to write (complex)
> queries? I want to explain everything about my data at the very
> beginning, and this information becomes part of the model and is maintained
> by the database.

Then you design your database accordingly. In my experience, it is useful for flexibility, extensibility and customisability, but integrity (and efficiency) suffers. It tends to lead to record-at-a-time processing too. YMMV.

For what it's worth, I don't deny that your ideas have merit, just like RDF and semantic networks, which they resemble. But not as a replacement for the RM. I may be unwilling to see the RM's flaws, but you seem to see far more than is warranted---that was my primary motivation for getting into this thread in the first place.

> It is no surprise that you do not understand it; it is
> not your fault. There are traditions, there are prejudices, there is a
> rigid coordinate system of contemporary knowledge.

Aaah, I see.

> For example, we
> have a data model for a corporation. It includes production departments,
> a sales department, personnel etc. But it is still one organisation as such,
> i.e., one point, possibly with some properties. When I start creating
> this model I want to define my organisation as one element of this
> model. Tomorrow I come and see that this corporation has something
> inside, and I add this something to the model as additional elements.

Sounds like the way most people do modelling---one thing at a time.

> Eventually I will add some very specific elements at the level. But in
> any case I am able to look at it at different levels and I am able to
> query it at different levels.

Do you mean you can query the properties of the corporation, as well as the properties of its employees, or even the subcomponents of the parts of the products it manufactures? Sorry, but I don't see the novelty here. (I don't think that is what you mean, but based on what you wrote, I find it hard even to make educated guesses.)
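Querying at several levels of detail is routine with relational operators: individual tuples, aggregates grouped per department, and aggregates rolled up to the whole organisation. A Python sketch that mimics restriction, grouping and aggregation over hypothetical `dept(dept, corp)` and `emp(name, dept, salary)` relations (names, attributes and figures are invented for illustration):

```python
from collections import Counter

# Hypothetical relations: dept(dept, corp), emp(name, dept, salary)
dept = {("sales", "acme"), ("hr", "acme"), ("r&d", "acme")}
emp = {
    ("alice", "sales", 50_000),
    ("bob", "hr", 45_000),
    ("carol", "sales", 52_000),
    ("dave", "r&d", 60_000),
}

# Employee level: restrict and project individual tuples.
sales_staff = sorted(n for (n, d, _) in emp if d == "sales")

# Department level: group employees by department and aggregate.
headcount_by_dept = Counter(d for (_, d, _) in emp)
payroll_by_dept = {}
for (_, d, s) in emp:
    payroll_by_dept[d] = payroll_by_dept.get(d, 0) + s

# Corporation level: join through dept, then aggregate again.
corp_of = dict(dept)  # dept -> corp
payroll_by_corp = {}
for (_, d, s) in emp:
    c = corp_of[d]
    payroll_by_corp[c] = payroll_by_corp.get(c, 0) + s

print(sales_staff)      # ['alice', 'carol']
print(payroll_by_corp)  # {'acme': 207000}
```

In SQL, each level is just a different `GROUP BY` over the same base relations; the levels come from the queries rather than from a special model.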

> >>This is why OLAP was developed. We need different levels of details.
> >>This in turn requires other mechanisms like constraint propagation.
> >
> > We need aggregate functions, you mean?
>
> In order to aggregate and even store data we do not need a database - we
> can do it in Pascal.

Which raises the question: what is a database? But let's not digress.

> The problem is that we need a more productive and
> more natural model.

Or better user interfaces. Sometimes it's hard to tell the difference.

-- 
Jon
Received on Thu Jun 16 2005 - 14:48:07 CEST
