Re: Hierarchical Model and its Relevance in the Relational Model

From: Derek Asirvadem <derek.asirvadem_at_gmail.com>
Date: Fri, 30 Jan 2015 03:29:45 -0800 (PST)
Message-ID: <255bf64a-928a-4287-9e1e-3bff36f676f2_at_googlegroups.com>


> On Thursday, 29 January 2015 13:09:08 UTC+11, James K. Lowden wrote:
> On Wed, 28 Jan 2015 01:52:59 -0800 (PST)
> Derek Asirvadem <derek.asirvadem_at_gmail.com> wrote:

I will probably run out of time on this post, so if it feels cut off, I will complete it another day.

> Thank you. I accept the bait^W invitation. ;-)

Come, come, James. A square and direct face isn't bait. I don't ask you to, and I cannot be asked to, watch my every turn of phrase.

I am trying to deal with what I consider a serious issue in our field. I would appreciate it if you treat it likewise. This is not the banter or baseless attacks that are commonplace here.

> > A. ____the Hierarchical Model rests on a theoretical void____

> > B. ____the HM is dead, it has no relevance wrt the Relational Model___

F. The implication here, and in many other places, is that you know the Relational Model, and you know it well.

> > 1. I take issue with your proposal [A]. I charge that it is
> > completely and totally false. But that is not as important as the
> > implication [B], which is totally and completely false.
>
> Regarding [A], it was Codd's observation. He had to invent the term
> "hierarchical model" ex post facto to name the thing he was comparing
> the relational model to.

That is not correct.

I was alive and kicking in those days, and I had moved from kicking inflated rubber balls to kicking IMS off its high horse. I was an Engineer (it meant something different in those days) for Cincom, with TOTAL as our Network DBMS. About a third of us were mathematicians (not me of course). All of us were scientists. We modelled before we wrote a line of code. Codd didn't invent the term. The Hierarchical Model was a fact, and we modelled (we did help IMS customers see the light, yes). Unlike these days, we did not have six million meaningless models with everyone choosing their own notation, we had five, and everyone who practised their craft knew each of them intimately.

In terms of beating the competition, we knew their gear inside-out. If we walked into a customer site and we could not deal with their models directly, we would have had no credibility. Nowadays cartoons pass off as models. We didn't have software to draw models on PCs, but we had excellent hand-drawn models, a set of symbols, notation, rules, stencils, etc. given to us by authorities in our field.

I remember fondly, when my models were ready to be published, I would visit a certain customer in Rochester, who had a secret drawing system (it later turned out to be the Xerox Star system that preceded the Apple interface, which btw Windoze is a poor man's copy of), and in exchange for a few hours consulting I would use the XDS to draw my model and publish it with a level of precision that surpassed my otherwise hand-drawn models.

Even in the loose sense (as per your comments), even looking backwards to those days, the HM as a model, to compare against the NM or the RM, is an easily understood, and much used, concept.

> It's not even a "model", though, insofar as
> it has no mathematical foundation.

Hah!

So what.

There are millions of models in various fields of scientific endeavour that do not have a mathematical foundation. Eg. In those days we used, and many people still use, Gane and Sarson's SSADM (not the abortion it has become since others have been "taking care" of it), commonly known as Data Flow Diagrams. It has a scientific, theoretical foundation but not a mathematical one. UML doesn't have an equivalent. IDEF0 is the equivalent on the IDEF side.

We had both the HM and the NM as models, with a set notation, and they certainly had a scientific and theoretical basis. Every incremental version showed proof of it. We had mathematicians on board, but I never saw their proofs, even though they worked with us. It was enough for the guy who specialised in this or that aspect of the resource (affecting the codeline), that he said he had, or had not, researched something, and therefore he stands on, or doesn't, what he is saying. Credible people do the same these days. They don't have to be mathematicians.

Computers were orders of magnitude more expensive then, the codeline was precious in those days. We had a complete DBMS with OLTP in an 8-bit machine. By the '80s we had graduated to 16 bits, Britton-Lee and Tandem NonStop. We had moved from nipping at IBM's heels (they *were* the market) to crawling up their shins. Only complete idiots, who worked alone, coded without modelling first.

> He was never so inartful as to put it in so many words, but the
> argument for the relational model might be boiled down to, "Look,
> here's math, and there's a set of commonly accepted engineering
> practices. Which is a better foundation?"

I have no argument with that. As long as the previous historical facts are not denied.

> The RM has an algebra and a calculus, as you know. What is the
> equivalent (or even analog) in the hierarchical model?

Who cares. (Refer my previous comments.)

Analogue is graph theory of course.

The answer to your question does not prove anything, so I will drop it.

> There is no
> "graph algebra", no set of operations closed over the domain. Graph
> theory offers no sets, bears no connection to predicate logic.

So what ? A scientific person can look at a graph, a tree, and instantly determine whether it has integrity or not. What theory is he practising ? What algebra is he using ?

I can look at a data model, and using the small knowledge of graph theory that I have, instantly identify precisely where the bottlenecks will be, where the lock contention will be. And it is always confirmed when the system goes into production. I have no algebra or calculus, and only a pittance of graph theory.

Ok, fair enough, a mathematician might have to sit down and produce a new algebra, and a calculus to work with it, and a year or three later, he might have a proof that the graph has integrity. The world passed the poor sod by.

> What we have in graph theory is a taxonomy and a giant collection of
> algorithms. I would say that puts it about where biology was before
> germ theory: a museum of classification and observed behavior.

Well, that does not compute with my non-mathematician's research on the subject (let's say I have invented two things, relevant to our field, and I did some research, of course not formal, in that field).

But I certainly do not know enough to argue with you on that point.

I simply disagree. Eg. we can compute whether a graph, the point sets of which exist in a Relational Database, has certain properties or not, fairly quickly, without needing a Cray or a mathematician. So be it a museum or whatever, it is far more accessible and use-able (by non-mathematicians) than the stuffed skins of non-existent animals that we have in our database museum. I have yet to see a paper in our field, that scientifically observes behaviour, all I have seen is papers that focus on skin and hair (microscopically of course) of museum pieces.
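
To make "compute whether a graph held in a Relational Database has certain properties" concrete, here is a minimal sketch, not anything from the thread. It assumes a hypothetical two-column table `edge(parent, child)` in an in-memory SQLite database, and it picks one property as the example: whether the edges form a single rooted tree. The function name `is_tree` and the table layout are illustrative assumptions only.

```python
import sqlite3

def is_tree(edges):
    """Return True if the (parent, child) pairs form a single rooted tree."""
    con = sqlite3.connect(":memory:")
    # Hypothetical edge table: one row per parent -> child link.
    con.execute(
        "CREATE TABLE edge ("
        " parent TEXT NOT NULL,"
        " child  TEXT NOT NULL,"
        " PRIMARY KEY (parent, child))"
    )
    con.executemany("INSERT INTO edge VALUES (?, ?)", edges)

    # Property 1: no node has more than one parent.
    multi_parent = con.execute(
        "SELECT COUNT(*) FROM ("
        " SELECT child FROM edge GROUP BY child HAVING COUNT(*) > 1)"
    ).fetchone()[0]

    # Property 2: exactly one root (a parent that is never a child).
    roots = [row[0] for row in con.execute(
        "SELECT DISTINCT parent FROM edge"
        " WHERE parent NOT IN (SELECT child FROM edge)"
    )]
    if multi_parent or len(roots) != 1:
        return False

    # Property 3: every node is reachable from that root.
    # UNION (not UNION ALL) discards repeats, so the walk always terminates.
    reached = con.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?
            UNION
            SELECT e.child FROM edge e JOIN reach r ON e.parent = r.node
        )
        SELECT COUNT(*) FROM reach
        """,
        (roots[0],),
    ).fetchone()[0]
    total = con.execute(
        "SELECT COUNT(*) FROM ("
        " SELECT parent AS n FROM edge UNION SELECT child FROM edge)"
    ).fetchone()[0]
    return reached == total
```

Called as `is_tree([("a", "b"), ("a", "c"), ("b", "d")])`, the three checks are plain set operations plus one recursive query, which is the sense in which no Cray and no mathematician is needed.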

Please feel free to change that.

Take a look at Norbert's papers about his field (Architectural Spaces). AFAIC, they are excellent, they put the papers in the RDB field to shame. Well written, well structured, good explanations, and finally a mathematical proof. All we have over here is speculation, in isolation/denial of other sciences, followed by a short proof.

Unfortunately, some poor sod uses it to code a system somewhere.


Therefore I say, the notion that the Hierarchical Model rests on a theoretical void, is without merit, and in denial of historical facts. The HM and the NM had science and theory behind them, as well as all the articles required for modelling (just not PCs with drawing tools).

(It may well not have a mathematical proof, yes.)


> Regarding [B], I suppose it depends on what we mean by "dead" and
> "relevance".

Well, you said it (I paraphrased), so it is up to you to tell us what you meant, or to modulate or correct it in some way.

> > I declare, the Relational Model is not a replacement for, or a
> > substitution for the Hierarchical Model; it is a progression of it.
>
> Evidence, please. I can think of no way in which the hierarchical
> model informs the relational model except as antithesis. Codd
> contrasted the RM with "noninferential systems", which hardly
> sounds like a source of inspiration.

(Not avoiding this point, tomorrow, please.)

> I think you will recognize this, from the abstract:
>
> "Existing noninferential, formatted data systems provide users
> with tree-structured files or slightly more general network models of
> the data. In Section 1, inadequacies of these models are discussed."

Accepted. And both of us know that the abstract is not the paper. Details tomorrow.

> (I hereby propose that on c.t.d. we refer to this source as RMDLSBD.)

NO.

I am not saying that you are being dishonest. But that is exactly the kind of madness that you guys perform, that we have to swallow, and I am not swallowing it.

The database world (separate to the tiny fraction of mathematicians concerned with the RM, which is say 1%) knows the RM by the term RM. It consists of Codd's original paper, easily accessible, plus his 11 other papers, noting some are exploratory and retracted, some are commercial interest, etc. It is a body of work that is well known. It contains a relational algebra, and later a calculus. Vendors have spent thousands of manhours and implementers have spent millions of manhours, implementing that thing that is known as the RM. End of story.

It is therefore unacceptable that a mathematician, in the typical way of mathematicians in this poorly-served field, attempts to carve it up, or to call one of their 42 relational algebras THE relational algebra, or to call whatever thing they refer to as "the relational model", THE Relational Model. Although you have exposed it by another route, this is precisely the problem I am trying to deal with.

You made your comments about the RM, in a public forum about Relational Databases. I am confronting those comments. You did not say, the body of papers that the mathematicians call the "relational model". If you did not mean the RM when you made those comments, please just say so and retract them.

This is also relevant to [F].

> Is the hierarchical model relevant? Not to me. Dead? Not dead
> enough.
>
> It never dies because it never lived.

That is silly, plainly false. I don't think that you would dispute that it did live, at least until 1984, when the Relational Model took off. (I think we can dismiss the IMS and Network die-hards, who are most certainly alive today, as "not relevant, not worthy of discussion".)

> It's the zombie of database
> theory,

Hang on, you said it had no theory. You are contradicting yourself. I am not being silly here, not playing with words, think about this. How can something that is devoid of theory be a threat to something that has allegedly well-established theory ??? It is not possible. If anything, it (your comments) makes a statement about the vulnerability of what you call database theory.

It is quite different from where I sit. First I have to say, what you and I call "database theory" are two vastly different things, I think you know that. I reject 90% of the papers written on the subject. That is not as important as this: the 10% left standing, are invulnerable.

Any new theory is not going to do any damage to it. Iff it is proved (forget "well-received"), *and* it has a scientific basis (forget published as a textbook), then it ADDS to that body. Otherwise, it gets the same bin as the 90%.

Further, in that standing and invulnerable 10% of database theory, the Hierarchical <model|theory|whatever> has a place, it is already integrated.

(I did not say it is written up in theoretical papers or that it has a mathematical proof, something that would be exceedingly silly to demand, because it is an evidenced fact. That would be like saying, show me a mathematical proof that your legs work, before I will believe it. I would have to be in a state of denial that *my* legs work, get the men with the white coats.)

> preying on the unsuspecting and eating their brains. IBM still
> sells IMS, presumably to someone, presumably at a profit.

Cincom still sells TOTAL and MANTIS. I agree, that is an irrelevant market.

> And there's a
> cadre of Facebook-wannabes who think "graph databases" are the cat's
> meow for finding out who's within 6 degrees of separation from Kevin
> Bacon. It would seem that once you introduce the average programmer to
> a hierarchical filesystem, you can never wean him of the notion that
> that's the "natural" structure for data.

I agree, applying hierarchical <fill-in-substitute-word-for-model> to everything would be a serious mistake.

Ok. So then what, in your considered opinion, is the natural or "natural" structure for data, or is there none ? (I think we both agree, it is not network.)

(I am not selling hierarchical, I am just not in denial of it. I just don't have the fear and loathing that your emotional comments demonstrate.)

> Returning to your points,
>
> > A. ____the Hierarchical Model rests on a theoretical void____
> > B. ____the HM is dead, it has no relevance wrt the Relational Model___
>
> No theory informs the so-called hierarchical model.

Dismissed.

> It is in every way
> irrelevant as a theoretical database construct.

Considering my comments above, if not my declarations, definitely not.

The most I can accept from your statement is, It is in every way irrelevant as a theoretical database construct at the current state of play amongst the mathematicians in the field of database theory, which field is very poorly served. Not the scientists, not the theorists, just the mathematicians.

And the evidence is, yet again, they are demonstrably scared of it. For un-scientific reasons.

> It is in every way
> irrelevant as a theoretical database construct.

That you people think so, might be the crux of the problem in our industry.

> That doesn't prevent
> many naïve people from believing it has something to offer. Every lousy
> idea has its ardent supporters.

(I am not a supporter, just a non-denier.)

> In trying to fathom why you raised the subject (you being an advocate
> of the relational model, I would say)

Definitely. Codd, only Codd, and nothing but Codd, so help me God!

> I noticed:
>
> > [Implementation of Data Hierarchies is] simple for me,
> > difficult-to-impossible for those who are victims of the suppression
> > of Hierarchies,
>
> among whom I'm included. :-(

Excuse me, I did not include or exclude you, the determination is open. If and when I see your work, then a determination can be made.

> I think we could say that assertion is not even wrong.
>
> As you well know, tables can represent graphs, ergo hierarchies. On a
> few occasions I have implemented them. For example, a SQLite virtual
> table representing a directory hierarchy
> (http://www.schemamania.org/sql/sqlite/udf/).

(Although I only have time for skimming it for now, that looks like a seriously good contribution. I would be interested in a data model of the base table that supplies the virtual table, in return, I will show you mine.)

> Now that SQLite supports
> recursive queries, you can implement find(1) in SQL, if you like.
>
> So I would appreciate it if you would exclude me from the set of people
> for whom the concept is "difficult-to-impossible". Please do count me
> among those who usually find it needless, though.
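
For readers following along, the quoted point (a table can hold a hierarchy, and a recursive query can walk it much as find(1) walks a directory tree) can be sketched as below. This is not the code behind the schemamania link. It is a generic adjacency-list illustration in which the `node` table, its columns, the sample rows, and the `find` helper are all assumptions made for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Hypothetical directory table: each row names its parent directory.
CREATE TABLE node (
    id     INTEGER PRIMARY KEY,
    parent INTEGER REFERENCES node (id),  -- NULL marks the root
    name   TEXT NOT NULL,
    UNIQUE (parent, name)
);
INSERT INTO node VALUES
    (1, NULL, ''),
    (2, 1, 'usr'),
    (3, 2, 'bin'),
    (4, 2, 'lib'),
    (5, 1, 'etc');
""")

def find(con, start_id):
    """List the path of every entry at or below start_id, like find(1)."""
    rows = con.execute(
        """
        WITH RECURSIVE walk(id, path) AS (
            SELECT id, name FROM node WHERE id = ?
            UNION ALL
            SELECT n.id, w.path || '/' || n.name
            FROM node n JOIN walk w ON n.parent = w.id
        )
        SELECT path FROM walk ORDER BY path
        """,
        (start_id,),
    )
    return [row[0] for row in rows]
```

Here `find(con, 2)` walks the subtree under `usr`, and `find(con, 1)` walks the whole tree from the root. Whether a surrogate `id` with a self-referencing `parent` column is a sound Relational design for a hierarchy is exactly the kind of question this thread is arguing over; the sketch only shows that the recursive query itself is short.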

Two answers.

First, yes, you do not look like you should be included in that category of victims of the suppression of hierarchies, those who find it difficult-to-impossible to implement hierarchies. And I would very much like to make the determination, and thus a declaration to that effect. The sticking point, as I am sure you will appreciate, is that you have made the statements that you have about hierarchies; the hierarchical model; its place in the RM. Note your emotive comments. I don't have to be a psychologist to figure out that that would leave you indisposed (at best, and I won't mention the worst) to implementing one, properly, correctly, in a Relational database. More later.

Second.

Whoa. But, but, but. Wait a minute. Hang on.

Given that you have stated what you did about the hierarchies, and the hierarchical <model-or-whatever-word-you-use>, and that it is devoid of theory (not playing on words), what theory then, what seed, what concept, did you use when you implemented that directory ?

I did not let you fracture and split the RM (as the world knows it), and it may be that I should not let you carve up and quarter the HM, so that you can agree with some parts and disagree with others, but I will go with your quartering for now.

Where did you get that hierarchical concept from, for the directory, if not from the HM ?

Do you know that Oracle's Clustered Tables are a straightforward implementation of the HM ? Ie. the HM, not the Hierarchical concept, not an Oracle private definition, but the HM, pure and simple. Complete with pointers, and co-location of mixed data, non-tabular.

So here is a small task for you that I think will demonstrate many of the issues that we are dealing with in this thread. Given your demonstrated skill in the SQLite link and the RM (forget the "rm" of the mathematicians), I think this would take 30 mins or less.

Produce a data model for a set of tables that is fully compliant with the RM, ie, completely Relational, for the storage of data pertaining to:

- Countries (Name, FullName, FederationDate, ExpiredDate)
- State/Province/Territory (Name, FullName, Type)
- County (Name, Type)
- Town/Township/Metropolis (Name, Type)
- Suburb (StreetName(FK), StreetType(FK) )

For all except the last two, the keys have been left out, the exercise is to determine and decide the keys. Everyone knows that there are ISO or ANSI codes for identifiers, for some of them, just choose good keys from that. Whatever you find as identifiers from a recognised standards body (eg. US ANSI County Codes) is deemed part of the "data".

Feel free to choose meaningful keys for the remainder. In case it needs to be said, the exercise is not about spending time finding info: if you have any trouble at all, please ask, and I will supply.

The data model can be in either the IDEF1X standard that practitioners have been using since 1985, or the text that mathematicians still use in the museum, in SQL DDL. The former is much faster. UML is somewhere in the middle, nowhere near as rich or specific as IDEF1X, but better, sort of, than DDL.

  • End Task ====
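
To show the shape such a submission might take in the SQL DDL form, here is one possible partial reading of the task, offered as a sketch and not as the answer the exercise is looking for. Every key choice is an assumption: an ISO 3166-1 alpha-2 code for the country, a subdivision code for the state, and a made-up `county_code` ('CUM' is invented for illustration). Suburb is left out because its street references depend on tables the task does not describe.

```python
import sqlite3

DDL = """
CREATE TABLE country (
    country_code    TEXT NOT NULL,   -- assumed: ISO 3166-1 alpha-2
    name            TEXT NOT NULL,
    full_name       TEXT NOT NULL,
    federation_date TEXT NOT NULL,
    expired_date    TEXT,
    PRIMARY KEY (country_code),
    UNIQUE (name)
);
CREATE TABLE state (
    country_code TEXT NOT NULL,
    state_code   TEXT NOT NULL,      -- assumed: subdivision code within the country
    name         TEXT NOT NULL,
    full_name    TEXT NOT NULL,
    type         TEXT NOT NULL,      -- State / Province / Territory
    PRIMARY KEY (country_code, state_code),
    FOREIGN KEY (country_code) REFERENCES country (country_code)
);
CREATE TABLE county (
    country_code TEXT NOT NULL,
    state_code   TEXT NOT NULL,
    county_code  TEXT NOT NULL,      -- hypothetical code, for illustration only
    name         TEXT NOT NULL,
    type         TEXT NOT NULL,
    PRIMARY KEY (country_code, state_code, county_code),
    FOREIGN KEY (country_code, state_code)
        REFERENCES state (country_code, state_code)
);
CREATE TABLE town (
    country_code TEXT NOT NULL,
    state_code   TEXT NOT NULL,
    county_code  TEXT NOT NULL,
    name         TEXT NOT NULL,
    type         TEXT NOT NULL,      -- Town / Township / Metropolis
    PRIMARY KEY (country_code, state_code, county_code, name),
    FOREIGN KEY (country_code, state_code, county_code)
        REFERENCES county (country_code, state_code, county_code)
);
"""

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
con.executescript(DDL)
con.execute("INSERT INTO country VALUES ('AU', 'Australia',"
            " 'Commonwealth of Australia', '1901-01-01', NULL)")
con.execute("INSERT INTO state VALUES ('AU', 'NSW', 'New South Wales',"
            " 'State of New South Wales', 'State')")
con.execute("INSERT INTO county VALUES ('AU', 'NSW', 'CUM', 'Cumberland', 'County')")
con.execute("INSERT INTO town VALUES ('AU', 'NSW', 'CUM', 'Sydney', 'Metropolis')")

def towns_in_country(con, country_code):
    """The ancestor keys travel down, so this needs no join at all."""
    rows = con.execute(
        "SELECT name FROM town WHERE country_code = ? ORDER BY name",
        (country_code,),
    )
    return [row[0] for row in rows]

def orphan_rejected(con):
    """Try to insert a county under a state that does not exist."""
    try:
        con.execute("INSERT INTO county VALUES ('AU', 'XX', 'ZZZ', 'Nowhere', 'County')")
    except sqlite3.IntegrityError:
        return True
    return False
```

In this reading each child table carries its parent's full key as the leading part of its own primary key, so the hierarchy is visible in the keys themselves and a row cannot exist without its ancestors.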

> Representing a hierarchy is one thing, and basing a DBMS on
> data-as-hierarchy something else.

Er, I did not suggest, or recommend, "basing a DBMS on data-as-hierarchy". Of course, as you seem to agree, data sometimes exists in an hierarchy.

> Hierarchies exist. The hierarchical model does not.

Well that is still up in the air, and as long as we proceed as we have, that will be resolved soon. The HM vs "the concept of an Hierarchy" may have to be separated.

>
> --jkl

Question for you. In one para or less, what is your considered opinion of the Alice book ?

In one sentence or less, how relevant is it to the field of Relational Database design ?

Notice, I did not qualify that question with the term "mathematician" one way or the other.

Please do not avoid [F].

I will not forget two unanswered items, tomorrow.

Cheers
Derek

Received on Fri Jan 30 2015 - 12:29:45 CET
