Re: Tree structure in Relational DB design

From: Bob Badour <bbadour_at_golden.net>
Date: Mon, 10 Jun 2002 20:13:07 -0400
Message-ID: <EhbN8.28$uu5.6232157_at_radon.golden.net>


> >> Second, Mr. Celko falsely claims that the adjacency list model is
> denormalized "in several ways". That's plain non-sense. It would only
> be true, if we could drop all the definitions of normal forms that are
> well established in database theory books, and adhere to his fuzzy
> ideas of what constitutes "normalized" . <<
>
> Let's go back to Dr. Codd and the Basics and prove it. A table is
> denormalized when it has redundancies and anomalies.
>
> Non-redundant = one fact, one place, one time
>
> Anomaly = creation of false information, deletion of a fact, or the
> inability to record a fact in the database.
>
> Now let's run a test:
>
> CREATE TABLE OrgChart
> (emp CHAR(10) NOT NULL PRIMARY KEY,
> boss CHAR(10)
> REFERENCES OrgChart(emp));
>
> OrgChart
> emp boss
> ===================
> 'Albert' 'NULL'
> 'Bert' 'Albert'
> 'Chuck' 'Albert'
> 'Donna' 'Chuck'
> 'Eddie' 'Chuck'
> 'Fred' 'Chuck'
>
> 1) 'Chuck' wants to change his name to 'Charles', but he appears in
> four places. If I update 'Chuck' only in the employee column then
> 'Donna', 'Eddie' and 'Fred' are reporting to a false boss value --
> anomaly!

Wrong. The foreign key constraint declared in the create table will prevent any anomaly. This is, in any case, a straw man devised by choosing an unstable key.

> 2) If I update 'Chuck' only in the boss column then 'Donna', 'Eddie'
> and 'Fred' are still reporting to a false boss value -- anomaly!

Again, the foreign key constraint will prevent any anomaly. Again, a straw man. The fact that a foreign key constraint, rather than a more complex general constraint, suffices to enforce integrity is a good indication that the design IS normalized. See Fabian Pascal's recent article in DM Review: http://dmreview.com/master.cfm?NavID=193&EdID=5251

> 3) So I have to update 'Chuck' in both employee and boss column -- a
> redundancy which leads to an anomaly!

The redundancy is necessary for the foreign key and causes no update anomalies. For a self-proclaimed database expert, you have a very poor grasp of the most basic concepts.

> >> I'd be very curious to see if Mr. Celko can demonstrate, if he will
> or if he can at all, what normal form, as defined in standard database
> theory books <<

As you may notice, Mr. Celko failed to indicate what normal form, if any, the above violated. Of course, he cannot because the design above is already in fifth normal form. Again, what he did post was incomplete and at best a straw man.

> >> ... and more so because usually a hierarchy has an existence that
> is separate from the objects that occupy the nodes of the hierarchy.
> In this example the hierarchy of positions in the enterprise is
> separate from the persons that occupy positions at any given time. <<
>
> Agreed -- a table is a set of entities or relationship, but not both.

One man's entity (employment_contract, for instance) is another man's relationship (employs / is employed by). The distinction is arbitrary and essentially meaningless.

> But you are ignoring the other problem; the adjacency list model is a
> tree, not a hierarchy. Subordination is not inherited in the
> adjacency list model. Yet inheritance is the defining characteristic
> of a hierarchy.

Huh? A unique parent is the defining characteristic of a hierarchy. General inheritance graphs are DAGs, I believe.

> >> And further more it kind of breaks Dr. Codd's information principle
> that every information should be represented explicitly as a value in
> a particular column of a particular row. To account for this principle
> the hierarchical relation would then have to be represented explicitly
> only in the adjacency list model (the Nested Sets model fails as
> well), but in practice since we're not dealing with relational
> databases anyway, we have to adjust to what we have at hand. <<
>
> No, the use of (lft, rgt) pairs for a scalar value is fine in the
> relational model; SQL has to use two columns right now instead of
> allowing you to create a datatype or domain for this.

Costin's point has nothing to do with whether the lft and rgt are scalars. The lft and rgt do not directly identify either an ancestor or a decendant. As a result, the important facts are not stated explicitly but must be inferred. At best, your suggestion is questionable.

> >> The theoretically "right" solution is to use the adjacency list
> mode, and hope that RDBMSs will eventually come up with the right
> solution not only to express a recursive query (which can be done
> currently), but also to index and optimize for speedy retrieval of
> such queries. <<
>
> The adjacency list model is based on graph theory and the relational
> model is based on sets.

Data are data. An RDBMS must represent them to the user as values in sets, but data values, themselves, need not be sets. When modelling a problem, one ought to use whatever sound theory one has available. In this case, graph theory. Using graph theory to model ones data in relations does not break anything any more than using numeric algebra to model the numbers in one's database does. Received on Tue Jun 11 2002 - 02:13:07 CEST

Original text of this message