Re: References & Restatement

From: James K. Lowden <jklowden_at_speakeasy.net>
Date: Wed, 4 Feb 2015 00:45:25 -0500
Message-Id: <20150204004525.a2568d5a.jklowden_at_speakeasy.net>


On Tue, 3 Feb 2015 06:42:11 -0800 (PST)
Derek Asirvadem <derek.asirvadem_at_gmail.com> wrote:

> Given the content of previous discussions, a short restatement of my
> position re the HM and its relevance to the RM is in order. I am
> saying the HM lives in the RM:
> - as a concept
> - a previously well-known model, method, of organising data
> - that is explicitly referenced, and used by Codd in the RM,
> throughout the paper
> - if one implements data according to the _Relational Normal Form_
> given, for which explicit steps are given, this leads to Relational
> Keys, which are compound keys ("non-simple")
> --- such keys, in the context of the series of tables in which each
> and all of those components ("simple" as well as all subsets of the
> "non-simple") are used, will form an hierarchy of keys
> - the set of such tables form an hierarchy of tables
> - such keys may well be called Hierarchical Keys (they are well-known
> as Relational Keys, I am not suggesting that we change it)

I hardly know where to begin, Derek. It seems we've been talking past each other to some degree, because when we agree on what words mean, we seem, by and large, to agree on the concepts we each espouse. And I want to thank you, because in preparing this reply I gained a new appreciation for the meaning of "data independence".

You raise too many points for me to answer one by one. Let me call out some I think are important. If I mistake your meaning at any juncture, please correct me.

> Note that in the RM, more than half of Codd's references are to
> products and product manuals. There were not too many theoretical
> papers in the field.

You seem to think one implies the other, that commercial products preclude theoretical papers. But you must know that's not true. The reason there were no papers is that there was no theory. You don't feel that's important; I suggest that's one reason pre-relational systems were so inelegant.

One giant leap owed to Codd that I think was (and often still is) underappreciated is his adoption of value semantics. Your helpful citation illustrates that point quite well, as I hope to show.

> There are many terms that Codd uses in the RM, which have gone out of
> fashion.

It does take some work to read Codd's 1970 paper while trying to embrace the technological perspective of his audience in the days of punch cards and drum memory.

Looking at the example in section 1.4, I finally see what you mean by "hierarchy". And, fair enough, Codd says of Figure 3 {employee, jobhistory, salaryhistory, children}, "The tree ... shows just these interrelationships...." Having worked with the relational model all these years, I look at that diagram and I don't see a tree. I see an ancestor of a Chen diagram, and automatically assume the "nonsimple domains" are tables. To Codd's contemporaries, the tree-ness was obvious.

> when he gives the pre-requisties to his __Relational Normal Form__
> [1.4](1)(2), and in (1) states "collections of trees", we take that
> to mean:
> - trees with integrity
> - normalised to the extent that we did prior to the RM
> - no circular references
> - what I am calling, in retrospect __Hierarchical Normal Form__

Codd certainly knew that a tree is a kind of DAG. I don't know what "normalized [before] RM" refers to, and I wouldn't rely on this example to prove circular references can't exist in a relational database, but I don't want to argue that point just yet, because we're talking hierarchies. Requoting,

> - the set of such tables form an hierarchy of tables
> - such keys may well be called Hierarchical Keys

Sure, but only at a severe cost to meaning!

Codd's reader doubtless saw a hierarchy (effortlessly, as you do, as I did not). But the example shows that the relations are *not* a hierarchy, despite appearances. Figure 3(b) shows each relation (his term) having "man#" as part of the key. It is not necessary to go through jobhistory to get to salaryhistory. It is perfectly possible, as you know, to

	select c.birthyear, s.salary
	from salaryhistory as s join children as c on s."man#" = c."man#"

If that kind of access is possible, in what sense do the four tables form a hierarchy? Are we to say they have "hierarchical keys" simply because employee->jobhistory->salaryhistory are related through their foreign keys?

If that's what you mean, OK. Given that the tables don't have to be used hierarchically, ISTM that calling them a hierarchy is to adopt a blinkered view.
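
A minimal sketch of that access, using Python's built-in sqlite3 module so it can be run as-is. The column lists follow my reading of Figure 3(b) and should be treated as assumptions; the sample rows are invented. Note that neither employee nor jobhistory exists in this database at all, yet the join works.

```python
import sqlite3

# In-memory database holding only the two relations the query needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table salaryhistory ("man#" integer, jobdate text,
                                salarydate text, salary integer);
    create table children ("man#" integer, childname text, birthyear integer);

    -- Invented sample rows, for illustration only.
    insert into salaryhistory values (1, '1968-01-01', '1968-01-01', 9000);
    insert into salaryhistory values (1, '1968-01-01', '1969-01-01', 9500);
    insert into children values (1, 'Ann', 1965);
""")

# Join directly on the shared key value; no parent record is visited.
rows = conn.execute("""
    select c.birthyear, s.salary
    from salaryhistory as s join children as c on s."man#" = c."man#"
    order by s.salary
""").fetchall()

print(rows)
```

The identifier man# is double-quoted because "#" is not legal in an unquoted name in every SQL dialect.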

> To the extent that any hierarchy that exists in the data, is
> maintained as an hierarchy, after transformation to the Relational
> Model, the hierarchy lives, exists, breathes

No. What you're really saying is that the tables are related, and that their relationships are manifest in their keys.

The hierarchical systems you remember so well adopted the idea -- and required the schema to manifest the idea -- that e.g. jobhistory is a *property* of employee. (They didn't use that term, of course.) One could not access jobhistory records except through a *pointer* acquired through an employee record. The hierarchy wasn't just a notional (or notational) communication convention; it constituted the access path.
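
To make the contrast concrete, here is a toy sketch of pointer-style access in Python. It is not modelled on any particular product; every class, field, and function name is hypothetical, and the data is invented. The point it illustrates is that a salary record carries no man# of its own: its owner is known only from where it hangs in the tree, so the only way to reach it is to walk down from an employee record.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SalaryRecord:            # hypothetical: note there is no man# here
    salarydate: str
    salary: int


@dataclass
class JobRecord:               # hypothetical child of an employee record
    jobdate: str
    title: str
    salaries: List[SalaryRecord] = field(default_factory=list)


@dataclass
class EmployeeRecord:          # hypothetical root record
    man_no: int
    name: str
    jobs: List[JobRecord] = field(default_factory=list)


# Invented sample data: one employee, one job, one salary record.
roots = [
    EmployeeRecord(1, "Smith", [
        JobRecord("1968-01-01", "clerk", [SalaryRecord("1968-01-01", 9000)]),
    ]),
]


def salaries_of(man_no: int) -> List[int]:
    """Reach salary records the only way possible: employee -> job -> salary."""
    found = []
    for emp in roots:
        if emp.man_no == man_no:
            for job in emp.jobs:
                for sal in job.salaries:
                    found.append(sal.salary)
    return found


print(salaries_of(1))
```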

I really think the fairest thing to say about that example is that it shows the relations are *not* a hierarchy. By adopting value semantics -- by making the keys values instead of pointers -- each relation becomes free-standing and self-consistent. We can think of them as forming a hierarchy as a convenience; perhaps they'll be commonly used that way in some application. But we're not required to. The new, non-hierarchical relations can be combined in arbitrary ways. We can find the highest salary for each year, without ever learning the men's names.
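
The last query can be sketched the same way, again with Python's sqlite3 module. The salaryhistory columns are assumed from Figure 3(b), the rows are invented, and storing salarydate as ISO-formatted text (so the year is its first four characters) is purely a convenience of this sketch. No employee table is created, so there are no names to learn.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table salaryhistory ("man#" integer, jobdate text,
                                salarydate text, salary integer);

    -- Invented sample rows: two men, two years.
    insert into salaryhistory values (1, '1968-01-01', '1968-03-01', 9000);
    insert into salaryhistory values (2, '1967-06-01', '1968-07-01', 11000);
    insert into salaryhistory values (1, '1968-01-01', '1969-03-01', 9500);
""")

# Highest salary per calendar year, computed from salaryhistory alone.
top_by_year = conn.execute("""
    select substr(salarydate, 1, 4) as year, max(salary)
    from salaryhistory
    group by substr(salarydate, 1, 4)
    order by year
""").fetchall()

print(top_by_year)
```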

> I said the RM is a progression of the HM.

I suppose that's true, in the sense that the United States as constituted in 1789 was a progression of government from what had existed in 1775. Something came before the thing that came later, and many people would call it "progress".

The RM was also revolutionary in 1) using math as a foundation, and 2) rejecting the tree -- and with it, pointer semantics -- as the basis for data organization.

--jkl

Received on Wed Feb 04 2015 - 06:45:25 CET
