Re: How to 'normalise' this scenario

From: Frank Millman <frank_at_chagford.com>
Date: Sun, 15 May 2011 23:50:52 -0700 (PDT)
Message-ID: <67c14d64-9441-44e0-8f97-7021bf01336f_at_z13g2000yqg.googlegroups.com>


On May 15, 3:02 pm, -CELKO- <jcelko..._at_earthlink.net> wrote:
> Since SQL is a set oriented language, this is a better model than the
> usual adjacency list approach you see in most text books. Let us
> define a simple OrgChart table like this.
>
>    CREATE TABLE OrgChart
>   (emp_name CHAR(10) NOT NULL PRIMARY KEY,
>   lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
>   rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
>   CONSTRAINT order_okay CHECK (lft < rgt));
>
>   OrgChart
>   emp_name lft rgt
>   ======================
>   'Albert' 1 12
>   'Bert' 2 3
>   'Chuck' 4 11
>   'Donna' 5 6
>   'Eddie' 7 8
>   'Fred' 9 10
>
>   The (lft, rgt) pairs are like tags in a mark-up language, or parens
> in algebra, BEGIN-END blocks in Algol-family programming languages,
> etc. -- they bracket a sub-set. This is a set-oriented approach to
> trees in a set-oriented language. Technically, there ought to be a
> separate personnel and organization table, but bear with me,
>
>  The organizational chart would look like this as a directed graph:
>
>   Albert (1, 12)
>    /             \
>   /               \
>   Bert (2, 3)  Chuck (4, 11)
>            /                 \
>           /            |       \
>          /             |        \
>         /              |          \
>
>   Donna (5, 6) Eddie (7, 8) Fred (9, 10)
>
>   The adjacency list table is denormalized in several ways. We are
> modeling both the Personnel and the Organizational chart in one table.
> But for the sake of saving space, pretend that the names are job
> titles and that we have another table which describes the Personnel
> that hold those positions.
>
>   Another problem with the adjacency list model is that the
> boss_emp_name and employee columns are the same kind of thing (i.e.
> identifiers of personnel), and therefore should be shown in only one
> column in a normalized table. To prove that this is not normalized,
> assume that "Chuck" changes his name to "Charles"; you have to change
> his name in both columns and several places. The defining
> characteristic of a normalized table is that you have one fact, one
> place, one time.
>
>   The final problem is that the adjacency list model does not model
> subordination. Authority flows downhill in a hierarchy, but If I fire
> Chuck, I disconnect all of his subordinates from Albert. There are
> situations (i.e. water pipes) where this is true, but that is not the
> expected situation in this case.
>
>   To show a tree as nested sets, replace the nodes with ovals, and
> then nest subordinate ovals inside each other. The root will be the
> largest oval and will contain every other node. The leaf nodes will be
> the innermost ovals with nothing else inside them and the nesting will
> show the hierarchical relationship. The (lft, rgt) columns (I cannot
> use the reserved words LEFT and RIGHT in SQL) are what show the
> nesting. This is like XML, HTML or parentheses.
>
>   At this point, the boss_emp_name column is both redundant and
> denormalized, so it can be dropped. Also, note that the tree structure
> can be kept in one table and all the information about a node can be
> put in a second table and they can be joined on employee number for
> queries.
>
>   To convert the graph into a nested sets model think of a little worm
> crawling along the tree. The worm starts at the top, the root, makes a
> complete trip around the tree. When he comes to a node, he puts a
> number in the cell on the side that he is visiting and increments his
> counter. Each node will get two numbers, one of the right side and one
> for the left. Computer Science majors will recognize this as a
> modified preorder tree traversal algorithm. Finally, drop the unneeded
> OrgChart.boss_emp_name column which used to represent the edges of a
> graph.
>
>   This has some predictable results that we can use for building
> queries. The root is always (left = 1, right = 2 * (SELECT COUNT(*)
> FROM TreeTable)); leaf nodes always have (left + 1 = right); subtrees
> are defined by the BETWEEN predicate; etc. Here are two common queries
> which can be used to build others:
>
>  1. An employee and all their Supervisors, no matter how deep the
> tree.
>
>   SELECT O2.*
>   FROM OrgChart AS O1, OrgChart AS O2
>   WHERE O1.lft BETWEEN O2.lft AND O2.rgt
>   AND O1.emp_name = :myemployee;
>
>    2. The employee and all their subordinates. There is a nice
> symmetry here.
>
>   SELECT O1.*
>   FROM OrgChart AS O1, OrgChart AS O2
>   WHERE O1.lft BETWEEN O2.lft AND O2.rgt
>   AND O2.emp_name = :myemployee;
>
>   3. Add a GROUP BY and aggregate functions to these basic queries and
> you have hierarchical reports. For example, the total salaries which
> each employee controls:
>
>    SELECT O2.emp_name, SUM(S1.salary_amt)
>   FROM OrgChart AS O1, OrgChart AS O2,
>   Salaries AS S1
>   WHERE O1.lft BETWEEN O2.lft AND O2.rgt
>   AND O1.emp_name = S1.emp_name
>   GROUP BY O2.emp_name;
>
>    4. To find the level of each emp_name, so you can print the tree as
> an indented listing.
>
>   SELECT T1.node,
>   SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END
>   + CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END) AS lvl
>   FROM Tree AS T1, Tree AS T2
>   WHERE T2.lft <= T1.lft
>   GROUP BY T1.node;
>
>   An untested version of this using OLAP functions might be better
> able to use the ordering.
>
>   SELECT T1.node,
>   SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END
>   + CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END)
>   OVER (ORDER BY T1.lft
>   RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS lvl
>   FROM Tree AS T1, Tree AS T2
>   WHERE T2.lft <= T1.lft;
>
>   5. The nested set model has an implied ordering of siblings which
> the adjacency list model does not. To insert a new node, G1, under
> part G. We can insert one node at a time like this:
>
>    BEGIN ATOMIC
>   DECLARE rightmost_spread INTEGER;
>   SET rightmost_spread
>   = (SELECT rgt
>   FROM Frammis
>   WHERE part = 'G');
>   UPDATE Frammis
>   SET lft = CASE WHEN lft > rightmost_spread
>   THEN lft + 2
>   ELSE lft END,
>   rgt = CASE WHEN rgt >= rightmost_spread
>   THEN rgt + 2
>   ELSE rgt END
>   WHERE rgt >= rightmost_spread;
>
>   INSERT INTO Frammis (part, lft, rgt)
>   VALUES ('G1', rightmost_spread, (rightmost_spread + 1));
>
>   COMMIT WORK;
>   END;
>
>   The idea is to spread the (lft, rgt) numbers after the youngest
> child of the parent, G in this case, over by two to make room for the
> new addition, G1. This procedure will add the new node to the
> rightmost child position, which helps to preserve the idea of an age
> order among the siblings.
>
>   6. To convert a nested sets model into an adjacency list model:
>
>   SELECT B.emp_name AS boss_emp_name, E.emp_name
>   FROM OrgChart AS E
>   LEFT OUTER JOIN
>   OrgChart AS B
>   ON B.lft
>   = (SELECT MAX(lft)
>   FROM OrgChart AS S
>   WHERE E.lft > S.lft
>   AND E.lft < S.rgt);
>
>   7. To find the immediate parent of a node:
>
>   SELECT MAX(P2.lft), MIN(P2.rgt)
>   FROM Personnel AS P1, Personnel AS P2
>   WHERE P1.lft BETWEEN P2.lft AND P2.rgt
>   AND P1.emp_name = :my_emp_name;
>
>   I have a book on TREES & HIERARCHIES IN SQL which you can get at
> Amazon.com right now.

Thanks for the reply, Joe.

Unless I am missing something, you have simply reproduced the 'nested set' theory that I have read in a number of other places on the web. You do not seem to have addressed the particular issue that I raised in my original post.

While on the subject, here is a question that has always puzzled me. In your example of inserting a new node, you have the following -

  rgt = CASE WHEN rgt >= rightmost_spread     THEN rgt + 2
    ELSE rgt END
  WHERE rgt >= rightmost_spread;

As the WHERE clause restricts the selection, why do you need the CASE statement at all. In my experiments I have changed this to

  rgt = rgt + 2

and it seems to work ok. Is there a problem with this?

Thanks

Frank Received on Mon May 16 2011 - 08:50:52 CEST

Original text of this message