Re: How to 'normalise' this scenario

From: Frank Millman <frank_at_chagford.com>
Date: Sun, 15 May 2011 23:50:52 -0700 (PDT)
Message-ID: <67c14d64-9441-44e0-8f97-7021bf01336f_at_z13g2000yqg.googlegroups.com>

On May 15, 3:02�pm, -CELKO- <jcelko..._at_earthlink.net> wrote:
> Since SQL is a set oriented language, this is a better model than the
> usual adjacency list approach you see in most text books. Let us
> define a simple OrgChart table like this.
>
> � �CREATE TABLE OrgChart
> � (emp_name CHAR(10) NOT NULL PRIMARY KEY,
> � lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
> � rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
> � CONSTRAINT order_okay CHECK (lft < rgt));
>
> � OrgChart
> � emp_name lft rgt
> � ======================
> � 'Albert' 1 12
> � 'Bert' 2 3
> � 'Chuck' 4 11
> � 'Donna' 5 6
> � 'Eddie' 7 8
> � 'Fred' 9 10
>
> � The (lft, rgt) pairs are like tags in a mark-up language, or parens
> in algebra, BEGIN-END blocks in Algol-family programming languages,
> etc. -- they bracket a sub-set. This is a set-oriented approach to
> trees in a set-oriented language. Technically, there ought to be a
> separate personnel and organization table, but bear with me,
>
> �The organizational chart would look like this as a directed graph:
>
> � Albert (1, 12)
> � �/ � � � � � � \
> � / � � � � � � � \
> � Bert (2, 3) �Chuck (4, 11)
> � � � � � �/ � � � � � � � � \
> � � � � � / � � � � � �| � � � \
> � � � � �/ � � � � � � | � � � �\
> � � � � / � � � � � � �| � � � � �\
>
> � Donna (5, 6) Eddie (7, 8) Fred (9, 10)
>
> � The adjacency list table is denormalized in several ways. We are
> modeling both the Personnel and the Organizational chart in one table.
> But for the sake of saving space, pretend that the names are job
> titles and that we have another table which describes the Personnel
> that hold those positions.
>
> � Another problem with the adjacency list model is that the
> boss_emp_name and employee columns are the same kind of thing (i.e.
> identifiers of personnel), and therefore should be shown in only one
> column in a normalized table. To prove that this is not normalized,
> assume that "Chuck" changes his name to "Charles"; you have to change
> his name in both columns and several places. The defining
> characteristic of a normalized table is that you have one fact, one
> place, one time.
>
> � The final problem is that the adjacency list model does not model
> subordination. Authority flows downhill in a hierarchy, but If I fire
> Chuck, I disconnect all of his subordinates from Albert. There are
> situations (i.e. water pipes) where this is true, but that is not the
> expected situation in this case.
>
> � To show a tree as nested sets, replace the nodes with ovals, and
> then nest subordinate ovals inside each other. The root will be the
> largest oval and will contain every other node. The leaf nodes will be
> the innermost ovals with nothing else inside them and the nesting will
> show the hierarchical relationship. The (lft, rgt) columns (I cannot
> use the reserved words LEFT and RIGHT in SQL) are what show the
> nesting. This is like XML, HTML or parentheses.
>
> � At this point, the boss_emp_name column is both redundant and
> denormalized, so it can be dropped. Also, note that the tree structure
> can be kept in one table and all the information about a node can be
> put in a second table and they can be joined on employee number for
> queries.
>
> � To convert the graph into a nested sets model think of a little worm
> crawling along the tree. The worm starts at the top, the root, makes a
> complete trip around the tree. When he comes to a node, he puts a
> number in the cell on the side that he is visiting and increments his
> counter. Each node will get two numbers, one of the right side and one
> for the left. Computer Science majors will recognize this as a
> modified preorder tree traversal algorithm. Finally, drop the unneeded
> OrgChart.boss_emp_name column which used to represent the edges of a
> graph.
>
> � This has some predictable results that we can use for building
> queries. The root is always (left = 1, right = 2 * (SELECT COUNT(*)
> FROM TreeTable)); leaf nodes always have (left + 1 = right); subtrees
> are defined by the BETWEEN predicate; etc. Here are two common queries
> which can be used to build others:
>
> �1. An employee and all their Supervisors, no matter how deep the
> tree.
>
> � SELECT O2.*
> � FROM OrgChart AS O1, OrgChart AS O2
> � WHERE O1.lft BETWEEN O2.lft AND O2.rgt
> � AND O1.emp_name = :myemployee;
>
> � �2. The employee and all their subordinates. There is a nice
> symmetry here.
>
> � SELECT O1.*
> � FROM OrgChart AS O1, OrgChart AS O2
> � WHERE O1.lft BETWEEN O2.lft AND O2.rgt
> � AND O2.emp_name = :myemployee;
>
> � 3. Add a GROUP BY and aggregate functions to these basic queries and
> you have hierarchical reports. For example, the total salaries which
> each employee controls:
>
> � �SELECT O2.emp_name, SUM(S1.salary_amt)
> � FROM OrgChart AS O1, OrgChart AS O2,
> � Salaries AS S1
> � WHERE O1.lft BETWEEN O2.lft AND O2.rgt
> � AND O1.emp_name = S1.emp_name
> � GROUP BY O2.emp_name;
>
> � �4. To find the level of each emp_name, so you can print the tree as
> an indented listing.
>
> � SELECT T1.node,
> � SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END
> � + CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END) AS lvl
> � FROM Tree AS T1, Tree AS T2
> � WHERE T2.lft <= T1.lft
> � GROUP BY T1.node;
>
> � An untested version of this using OLAP functions might be better
> able to use the ordering.
>
> � SELECT T1.node,
> � SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END
> � + CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END)
> � OVER (ORDER BY T1.lft
> � RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS lvl
> � FROM Tree AS T1, Tree AS T2
> � WHERE T2.lft <= T1.lft;
>
> � 5. The nested set model has an implied ordering of siblings which
> the adjacency list model does not. To insert a new node, G1, under
> part G. We can insert one node at a time like this:
>
> � �BEGIN ATOMIC
> � DECLARE rightmost_spread INTEGER;
> � SET rightmost_spread
> � = (SELECT rgt
> � FROM Frammis
> � WHERE part = 'G');
> � UPDATE Frammis
> � SET lft = CASE WHEN lft > rightmost_spread
> � THEN lft + 2
> � ELSE lft END,
> � rgt = CASE WHEN rgt >= rightmost_spread
> � THEN rgt + 2
> � ELSE rgt END
> � WHERE rgt >= rightmost_spread;
>
> � INSERT INTO Frammis (part, lft, rgt)
> � VALUES ('G1', rightmost_spread, (rightmost_spread + 1));
>
> � COMMIT WORK;
> � END;
>
> � The idea is to spread the (lft, rgt) numbers after the youngest
> child of the parent, G in this case, over by two to make room for the
> new addition, G1. This procedure will add the new node to the
> rightmost child position, which helps to preserve the idea of an age
> order among the siblings.
>
> � 6. To convert a nested sets model into an adjacency list model:
>
> � SELECT B.emp_name AS boss_emp_name, E.emp_name
> � FROM OrgChart AS E
> � LEFT OUTER JOIN
> � OrgChart AS B
> � ON B.lft
> � = (SELECT MAX(lft)
> � FROM OrgChart AS S
> � WHERE E.lft > S.lft
> � AND E.lft < S.rgt);
>
> � 7. To find the immediate parent of a node:
>
> � SELECT MAX(P2.lft), MIN(P2.rgt)
> � FROM Personnel AS P1, Personnel AS P2
> � WHERE P1.lft BETWEEN P2.lft AND P2.rgt
> � AND P1.emp_name = :my_emp_name;
>
> � I have a book on TREES & HIERARCHIES IN SQL which you can get at
> Amazon.com right now.

Thanks for the reply, Joe.

Unless I am missing something, you have simply reproduced the 'nested set' theory that I have read in a number of other places on the web. You do not seem to have addressed the particular issue that I raised in my original post.

While on the subject, here is a question that has always puzzled me. In your example of inserting a new node, you have the following -

  rgt = CASE WHEN rgt >= rightmost_spread     THEN rgt + 2
    ELSE rgt END
  WHERE rgt >= rightmost_spread;

As the WHERE clause restricts the selection, why do you need the CASE statement at all. In my experiments I have changed this to

rgt = rgt + 2

and it seems to work ok. Is there a problem with this?

Thanks

Frank Received on Mon May 16 2011 - 08:50:52 CEST

This message: [ Message body ]
Next message: Erwin: "Re: How to 'normalise' this scenario"
Previous message: Frank Millman: "Re: How to 'normalise' this scenario"
Maybe in reply to: Frank Millman: "How to 'normalise' this scenario"
In reply to -CELKO-: "Re: How to 'normalise' this scenario"
Next in thread: Erwin: "Re: How to 'normalise' this scenario"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message