Re: Bill of materials / Groups in Groups

From: Harry Chomsky <harryc_at_chomsky.net>
Date: 2000/01/20
Message-ID: <_fPh4.74$gj5.5492_at_nnrp1-w.snfc21.pbi.net>


joe_celko_at_my-deja.com wrote in message <867lpb$g4n$1_at_nnrp1.deja.com>...
>>> I've been trying to figure out what notion of "normalization" Mr.
>Celko was referring to when he made the comment about changing Chuck's
>id number. <<
>
>In a relational databse there is no such thing as an object identifier,
>which is how you wanted to treat the sequential row number you added to
>the original table. Everything can be updated if you can keep the
>constraints right.

I don't see how "object identifiers" enter into the picture here. I can design an adjacency model of employees and bosses using employee name as the primary key, or I can add a numeric id number attribute and use that as the primary key. It really doesn't make any difference to my arguments. Both designs use perfectly standard relational concepts. I think you must have misunderstood me, so let's just let this issue drop.

>>>Apparently not any standard notion based on FDs, ... <<
>
>This is Domain-Key Normal Form, which is a superset of the ones based
>on Functional Dependencies. Look up the name Fagin in the literature;
>I think the ACM has his original paper on their website or as a reprint
>for members. Halpin's semantic models (ORM, NIAM methods) are related,
>but slightly different and probably a little stronger still.

OK, I've refreshed my memory on Domain-Key Normal Form, as presented in Atzeni & DeAntonellis. I assume you are claiming that the adjacency version of the employees / bosses model is not in DKNF, while your version with lft and rgt attributes is. Let's see.

First of all, the definition: a relation is in DKNF if all of the integrity constraints for that relation can be deduced from a set of very simple constraints of the form "attributes { a1, a2, ..., an } form a key for the relation". Another way to state the definition is that you can always delete any tuple from the relation without making it inconsistent, and you can always insert any tuple unless that tuple shares a key with a tuple already in the table. It takes some work, of course, to make these definitions formal and to prove that they're equivalent, but I think they're good enough for this discussion.

Now, is the adjacency model in DKNF? Let's consider this schema:

CREATE TABLE Employee (
  name CHAR(10) PRIMARY KEY,
  boss CHAR(10),
  salary DECIMAL(6,2) NOT NULL);

There are two important integrity constraints: 1) "name" is a key for the relation, and 2) "boss" must either be null or match the "name" attribute of some row. The first constraint is simple; the second is not. The second is a form of "inclusion dependency", or more colloquially a foreign key constraint. So according to the definition I gave, the relation is not in DKNF. The obvious way to put it into DKNF is to decompose it into two relations:

CREATE TABLE Employee (
  name CHAR(10) PRIMARY KEY,
  salary DECIMAL(6,2) NOT NULL);

CREATE TABLE EmployeeBoss (
  emp CHAR(10) PRIMARY KEY,
  boss CHAR(10) NOT NULL);

Now each relation has only one constraint, its primary key constraint. There are other constraints, the inclusion dependencies stating that the set of "emp" values from EmployeeBoss is a subset of the set of "name" values from Employee, and likewise with "boss" values. But these are interrelational constraints, so (at least according to Atzeni & DeAntonellis) they don't enter into the question of DKNF. The schema is in DKNF because each relation is individually in DKNF. (If you have a different definition of DKNF that takes interrelational constraints into account, please fill me in.)

Do you acknowledge that this schema is in DKNF? If not, can you tell me what other intrarelational constraints the schema requires that cannot be expressed as key constraints?

At one point you objected that changing Chuck's name to Charles would require changing his name in several different rows and columns, and you used this as evidence that a schema was not normalized. Well, this schema exhibits the same problem, yet as far as I can tell it's perfectly normalized. I think that your comment about the difficulties in changing Chuck's name was a piece of practical database design advice rather than an argument about normalization.

Moving onward: now, what about your design using lft and rgt attributes? Is that in DKNF? Well, first we have to consider what its constraints are. That's actually a very difficult question. You don't describe the constraints explicitly in your presentation. From your comments about preferring integers rather than floats and your observation that "the number of nodes subordinate to node x can be computed by (rgt - lft +1)/2", I have to conclude that you're using a very strong set of constraints. Obviously each lft must be less than its corresponding rgt; no number may occur more than once in the lft and rgt attributes of all the tuples; the closed intervals [lft, rgt] for any two tuples must either be disjoint or one must be contained within the other; whenever lft < rgt - 1 there must be a tuple whose lft is lft + 1; and I'm sure there are other constraints along these lines. These are very interesting and complex constraints. The only way you can claim the relation is in DKNF is if you can reduce these constraints to a simple list of key constraints. But what key constraints does the relation satisfy? The primary key constraint, of course -- emp is a key. Also, lft is a key and rgt is a key. But knowing those three keys certainly doesn't enable one to deduce all the other complex constraints that I was starting to list. So this relation is not in DKNF.

Intuitively speaking, I'd say your model is much "farther away" from being in DKNF than my original adjacency was. More precisely, my original model had exactly one constraint that couldn't be expressed in simple terms; it was a straightforward constraint, and I was able to remove it by splitting the relation into two parts in a way that all database designers will find familiar. Your model has many non-simple constraints, and I'm having trouble even listing them all, let alone expressing them in terms of familiar database concepts. I can't imagine how you could divide up your relation into two parts that would each be in DKNF. Perhaps you could divide it into _three_ relations: (emp, salary), (emp, lft), and (emp, rgt). It may be that each of those is in DKNF; I'm not sure off-hand. But now how do you describe your complex constraints? They're interrelational, yes, but now they're even more complicated than they were to begin with.

Again, my point is not to claim that your model is _worse_ than the adjacency model. I'm simply saying that your arguments against the adjacency model were spurious, that each model has its advantages and disadvantages, and that the advantages and disadvantages cannot be described in terms of normalization. Received on Thu Jan 20 2000 - 00:00:00 CET

Original text of this message