Re: Quote of the Week

From: --CELKO-- <jcelko212_at_earthlink.net>
Date: 19 May 2004 09:20:07 -0700
Message-ID: <18c7b3c2.0405190820.26a7d1e5_at_posting.google.com>


>> It's a very ingenious method, but the thing that worries me a bit
is that you're basically creating a list and specifiying an arbitrary order ..You have two different relations representing *identical* real-world situations. <<

Not always. The order of the siblings is often very important. In an organization with a union seniority system, people live and die for such placement :)

>> How would you compare two of these organization structure relations
for equality? <<

Pick a canonical form, such as alphabetizing the siblings and enforce it. You can insert new siblings at any position easily. If you need to bring the whole tree into sorted order, convert it to an adjacency list in one query:

SELECT B.emp AS boss, E.emp
  FROM OrgChart AS E

       LEFT OUTER JOIN
       OrgChart AS B
       ON B.lft
          = (SELECT MAX(lft)
               FROM OrgChart AS S
              WHERE E.lft > S.lft
                AND E.lft < S.rgt);

Then re-build the tree with a push-down stack algorithm.

Another advantage is that I can take the (lft,rgt) pairs from one tree or subtree, put them in a temp table, and renumber them with (lft-MIN(lft),rgt-MIN(lft) and do the same with a second tree or subtree. If the two tables are identical, then the **structure** is the same, but not necessarily the content of the nodes. Basically, all units in an Army have the same job titles, ranks and number of troops, but different people fill the positions.

Try doing that with an adjacency list; since siblings are not ordered, it gets to be a really ugly order of complexity.

>> It seems to me that the adjacency list model is the elegant
solution to
storing the data because it doesn't require adding extra artificial items of data. <<

I disagree. You need some really elaborate constraints to prevent cycles and maintain connections. Aggregations and other traversals are procedural, not relational. The adjacency list model is the only solution I have found for a general graph, however.

>> But *any* query requires a procedural solution at the lowest level.
The point is that a DBMS can hide the procedural code internally so the user has only to use declarative code. <<

No, it doesn't. Classic procedural code is sequential in nature; set-oriented processing is parallel in nature.

Let's imagine a machine that follows Pournelle's Law and has one processor for each row in a table -- really massively parallel. Given a particular employee in a nested sets model org chart, I can apply a simple characteristic function to all the nodes at once to find his superiors, no matter how deep the tree.

SELECT O2.*
   FROM OrgChart AS O1, OrgChart AS O2
  WHERE O1.lft BETWEEN O2.lft AND O2.rgt     AND O1.emp = :myemployee;

Likewise, if I have a Teradata style hashing machine, I can eliminate entire buckets (subsets) of rows with a predicate, while narrowing my search to other buckets, all in parallel. It takes at most two probes to locate any row in this hardware architecture.

>> Maybe there would be other advantages in "hiding" procedural code
in a transitive closure operator. There might be an algebra of "relational + tclose" that could usefully simplify expressions for optimization. <<

That is what Oracle did in their proprietary CONNECT BY extension. The nested sets model out performs this extension in Oracle. I am not sure that procedural code is the way to better optimization. By its very nature, it is sequential execution, while non-procedural code is parallel; ain't nothing faster than simultaneous. Received on Wed May 19 2004 - 18:20:07 CEST

Original text of this message