Re: The Practical Benefits of the Relational Model

From: Paul Vernon <paul.vernon_at_ukk.ibmm.comm>
Date: Mon, 21 Oct 2002 12:25:47 +0100
Message-ID: <ap0o97$ula$1_at_sp15at20.hursley.ibm.com>


"Leandro Guimarăes Faria Corsetti Dutra" <lgcdutra_at_terra.com.br> wrote in message news:aop8p1$ojl84$1_at_ID-148886.news.dfncis.de...
> Paul Vernon wrote:
> >
> > In particular if we assume that relvar names are simply short
> > cuts for the set of attribute names in a relvar, and therefore relvar
names
> > are strictly optional, then it should *not be possible* to create two
> > relvars in a database with the same set of attribute names. (BTW I've
> > assumed that: attribute_name -> attribute_data_type).
>
> Actually that is a big "if". Why "relvar names are simply shortcuts of
> the set of attribute names"?

I'm being tautological. If we ban two (base) relvars from having the same set of attribute names, then we can consider relvar names as simply shortcuts. But I'm only suggesting this as I know of no fundamental role for relvar names in the relational model. We use attribute names in natural joins and more importantly they are the database's version of the base relvar predicates, but what role do relvar names have (except, sometimes, convenience)?

> I assume you are thinking base relvars, because this sure wouldn't make
> any sense about derived relvars.

Humm, yes I am. The meaning of a derived relvar is exactly the relational expression that defines it. Any attribute names in a derived relvar are just shortcuts for (parts of) relational expressions.

In the running example, I would expect a view defined as say, "all LOVER and LOVEE pairs that share the same first Inital", to have the same attribute names
as the LOVES base relvar.
the predicate for such a view begin something like [ <LOVER, LOVEE> WHERE SUBSTR(LOVER,1,1) = SUBSTR(LOVEE,1,1) ]

So if I am trying to INSERT the following into the database,

    <LOVER: Romeo, LOVEE: Juilet>
then it would go into the base relvar LOVES due to the matching predicate (including matching attribute names)

conversely, if I tried to 'INSERT' the following *tuple and predicate*

     <LOVER: Romeo, LOVEE: Juilet> WHERE SUBSTR(LOVER,1,1) = SUBSTR(LOVEE,1,1)

then (assuming there is only one possible view update propagation) the tuple would get inserted via the view above (that I did not get round to naming).

> Finally, *usually* an attribute name will be its domain name, but from
> Codd's first paper it is clear that a relvar header can have the same
> domain twice, and then their names must be different.

Agreed, although domain names don't always make for great attribute names.

I'll admit that I've never built a major database explicitly following the rule that an attribute name implies it's domain. But I've not been able to think of a situation where I would want to break such a rule, so I would like to suggest that such a constraint should be mandated on any true RDBMS system catalog.

> > In other words I say that it *is* the names that have meaning (contrary to
> > what D&D say in TTM).
>
> Meaning to the user, sure. But surely you don't want to name your
> relvars, attributes, domains and constraints with long-winded, full
> explanations. So we settle for at most two or three words, and perhaps
> some of them abbreviated. Therefore, we need external documentation or
> internal comments on names to know the real meaning.

Well I agree that language, and the English language being an allegally particularly difficult example, does not respond well to formal structuring. But noetheless, my point (or at least hope) is that, simple(ish) attribute names are enough for both users and the database to have the same understanding of
predicates.

> Now, what Date & McGoveran are worried about in their article is
> meaning that can be structurally captured thru relvars and their
> domains, and thru associated integrity constraints. As they rightly
> point, this is just an approximation for the system's use.

I suggest that it is more than an approxiamtion: i.e. the conjunction of attribute names is (just about) as good as the (constraint free) relvar predicate.

> > Why appeal to some external meaning that we don't
> > even try to capture in the database - thereby breaking the Information
> > Principle? I'd agree that external predicates read better than a
> > conjunction of attribute names, but they should not actually add any extra
> > 'meaning'.
>
> I don't understand. What do you propose to substitute for
> documentation on predicates? What does that have to do with the
> Information Principle?

Ad-hoc documentaion on predicates is no good if the DMBS cannot reason using it. My proposal is that attribute names are good enougth documentation of predicates.

> How could you document a predicate in the
> database, except for the catalog and comments on it?

As I hope I've shown, simply by attribute names (although I will repeat, I know that a concatantion of relvar attribute names in not as readable (or, I agree, quite as explicitly meaningful) as a natural language predicate).

> > In the article, Chris & David suggest that they are trying to insert
tuples
> > that have no attibute names into the database. This is wrong. What they
> > should be inserting are tuples such as:
> > <LOVER: Romeo, LOVEE: Juilet>
> >
> > then with sensible relvar attribute names, e.g. (LOVER & LOVEE) and
(HATER&
> > HATEE), there is not need to talk about external meanings that are not
known
> > to the DBMS.
>
> Remember the article is old, therefore they used SQL notation.

I do appreciate that, but Chris did say that only "detail level corrections might be needed". So I feel it is ok to comment against the wider points of the article.

>But I agree that sensible relvar attribute names are a big part of the
> solution for the meaning problem, *for the user*. But the focus of the
> article is on the meaning *for the system*.

And I'm trying to unifiy the two meanings - which would be nice.

> BTW, the system knows nothing about the user-intended meaning of LOVER,
> LOVEE, HATER, HATEE. Probably the article should be revised to make all
> that clearer.

See previous response.

> > Now it might be a bit of a pain to have a RDBMS that did not allow two
> > tables to have the same attribute names (and types) in the same
*database*,
>
> Remember that the database concept in TTM is much more strict.
> Basically, everything that can be seen by the system must be regarded as
> a database, regardless of complications as namespaces or distribution.

Regarded as; _a_ database, or _one_ database, or _the_ database?

I think that the idea of what exactly consists one database is a outstanding issue.
I like the concept of 'expressible databases' (which Chris mentions in
passing in Intro to DB systems), and in fact one of my motivations here is to explore this concept more fully.

> > but frankly I could live with such a restriction if it enforced the
> > Orthogonal Design Principle (and if any local relvars we not seen as part
of
> > the main database)
>
> What to you mean as "local relvars"?

Relvars not part of the persistantly stored set of data. TTM calls them "application relvars". See page 68.

Regards
Paul Vernon
Business Intelligence, IBM Global Services Received on Mon Oct 21 2002 - 13:25:47 CEST

Original text of this message