Re: Bill of materials / Groups in Groups

From: <joe_celko_at_my-deja.com>
Date: 2000/01/13
Message-ID: <85l2o8$8ri$1_at_nnrp1.deja.com>#1/1


>> Now I'm very confused. What definition of normalization applies
here? Are we talking about 3NF or BCNF or some other mathematical definition? Or is it just the intuitive idea that "each fact should be stored in only one place"? <<

Actually that is a cute phrase to define normalization -- the more formal version is that inserts, updates and deletes do not cause anomalys in the database. That is, the database is always in a valid and internally consistent state at the end of a transaction. 1NF to 5NF, BCNF, DKNF, et al are _kinds_ of normal forms which prevent certain kinds of anomalys.

>>Do you consider the following set of tables to be normalized? <<

No, but almost. You need constraints to get this to work. You also allow too many NULLs. I would use collective or plural nouns for table names, since they are sets and not singular data elements. Try this:

 CREATE TABLE Customers
 (customer_id INTEGER NOT NULL PRIMARY KEY,   cust-name CHAR(30) NOT NULL);

 CREATE TABLE Products
 (product_id INTEGER NOT NULL PRIMARY KEY,   description CHAR(50)NOT NULL);

 CREATE TABLE Orders
 (order_id INTEGER NOT NULL PRIMARY KEY,   customer_id INTEGER NOT NULL

              REFERENCES Customers(customer_id)
              ON DELETE CASCADE
              ON UPDATE CASCADE,

  order_date DATE NOT NULL,
  product_id INTEGER NOT NULL
             REFERENCES Products(product_id)
              ON DELETE CASCADE
              ON UPDATE CASCADE,
  quantity INTEGER NOT NULL
            CHECK (quantity > 0));

>> In this [origianl version] database, if I change the ID number of a
customer or a product, I may have to cascade the changes through many rows of the Order table. So in some sense the database fails to "store each fact in only one place". But if you conclude from this that the database isn't normalized, then you must be using a concept of normalization that I'm unfamiliar with. <<

Exactly! That is why you need the constraints; Look up Domain-Key Normal Form, NIAM and ORM methods. You had a database where I could take an order from a fictional customer for a non-existent product (insertion anomaly and update anomaly, but no deletion anomaly). The keys for customers and products were being violated. You could have a quantity less than zero -- probably not real easy to ship -- which is a value that is not in the domain of that attribute.

>> Getting back to the employee/boss database... Consider the standard
formulation, where the Employee table has a Boss attribute [adjacency list model]... I don't understand why you say it's not normalized. <<

There is nothing wrong, per se, with a primary and foreign key in the same table. In fact, the FIPS-127 SQL Test suite had a test for this. That is not the problem. When I delete Chuck, I destroy the fact that the big boss is still the superior of all of Chuck's subordinates (deletion anomaly -- I destroyed a fact). The nested sets model preserves that fact. If I want to create a new position in the company, I cannot do it until I have a person to fill it (insertion anomaly).

>> I suspect you really want the lft and rgt attributes to have a
continuous datatype such as float or fraction, rather than integer. If you use integers, then adding a new node to the tree may require a complex and expensive series of cascaded changes to the rest of the tree. This would be a serious violation of the intuitive idea of "normalization" that you seem to be working with. But with fractions you can probably avoid this difficulty. <<

Nope, I want integers, altho any numeric datatype will work. Integers are small and fast to work with. A lot of the good tricks with the nested sets model depend on the fact that the number of nodes subordinate to node x can be computed by (rgt - lft +1)/2.

Changing the structure of the tree requires ONE insert or delete statement to add or remove a subtree, followed by ONE update statement to recompute the lft and rgt values. They are a little messy, so I would put them into a stored procedure.

Remember that I am working with SETS and do not have to do row at a time processing. You are still thinking in terms of navigation of the graph in the adjacency list model.

--CELKO-- Sent via Deja.com http://www.deja.com/
Before you buy. Received on Thu Jan 13 2000 - 00:00:00 CET

Original text of this message