Re: How to 'normalise' this scenario
Date: Sun, 15 May 2011 23:50:52 -0700 (PDT)
Message-ID: <67c14d64-9441-44e0-8f97-7021bf01336f_at_z13g2000yqg.googlegroups.com>
On May 15, 3:02 pm, -CELKO- <jcelko..._at_earthlink.net> wrote:
> Since SQL is a set oriented language, this is a better model than the
> usual adjacency list approach you see in most text books. Let us
> define a simple OrgChart table like this.
>
> CREATE TABLE OrgChart
> (emp_name CHAR(10) NOT NULL PRIMARY KEY,
> lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
> rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
> CONSTRAINT order_okay CHECK (lft < rgt));
>
> OrgChart
> emp_name lft rgt
> ======================
> 'Albert' 1 12
> 'Bert' 2 3
> 'Chuck' 4 11
> 'Donna' 5 6
> 'Eddie' 7 8
> 'Fred' 9 10
>
> The (lft, rgt) pairs are like tags in a mark-up language, or parens
> in algebra, BEGIN-END blocks in Algol-family programming languages,
> etc. -- they bracket a sub-set. This is a set-oriented approach to
> trees in a set-oriented language. Technically, there ought to be a
> separate personnel and organization table, but bear with me,
>
> The organizational chart would look like this as a directed graph:
>
> Albert (1, 12)
> / \
> / \
> Bert (2, 3) Chuck (4, 11)
> / \
> / | \
> / | \
> / | \
>
> Donna (5, 6) Eddie (7, 8) Fred (9, 10)
>
> The adjacency list table is denormalized in several ways. We are
> modeling both the Personnel and the Organizational chart in one table.
> But for the sake of saving space, pretend that the names are job
> titles and that we have another table which describes the Personnel
> that hold those positions.
>
> Another problem with the adjacency list model is that the
> boss_emp_name and employee columns are the same kind of thing (i.e.
> identifiers of personnel), and therefore should be shown in only one
> column in a normalized table. To prove that this is not normalized,
> assume that "Chuck" changes his name to "Charles"; you have to change
> his name in both columns and several places. The defining
> characteristic of a normalized table is that you have one fact, one
> place, one time.
>
> The final problem is that the adjacency list model does not model
> subordination. Authority flows downhill in a hierarchy, but If I fire
> Chuck, I disconnect all of his subordinates from Albert. There are
> situations (i.e. water pipes) where this is true, but that is not the
> expected situation in this case.
>
> To show a tree as nested sets, replace the nodes with ovals, and
> then nest subordinate ovals inside each other. The root will be the
> largest oval and will contain every other node. The leaf nodes will be
> the innermost ovals with nothing else inside them and the nesting will
> show the hierarchical relationship. The (lft, rgt) columns (I cannot
> use the reserved words LEFT and RIGHT in SQL) are what show the
> nesting. This is like XML, HTML or parentheses.
>
> At this point, the boss_emp_name column is both redundant and
> denormalized, so it can be dropped. Also, note that the tree structure
> can be kept in one table and all the information about a node can be
> put in a second table and they can be joined on employee number for
> queries.
>
> To convert the graph into a nested sets model think of a little worm
> crawling along the tree. The worm starts at the top, the root, makes a
> complete trip around the tree. When he comes to a node, he puts a
> number in the cell on the side that he is visiting and increments his
> counter. Each node will get two numbers, one of the right side and one
> for the left. Computer Science majors will recognize this as a
> modified preorder tree traversal algorithm. Finally, drop the unneeded
> OrgChart.boss_emp_name column which used to represent the edges of a
> graph.
>
> This has some predictable results that we can use for building
> queries. The root is always (left = 1, right = 2 * (SELECT COUNT(*)
> FROM TreeTable)); leaf nodes always have (left + 1 = right); subtrees
> are defined by the BETWEEN predicate; etc. Here are two common queries
> which can be used to build others:
>
> 1. An employee and all their Supervisors, no matter how deep the
> tree.
>
> SELECT O2.*
> FROM OrgChart AS O1, OrgChart AS O2
> WHERE O1.lft BETWEEN O2.lft AND O2.rgt
> AND O1.emp_name = :myemployee;
>
> 2. The employee and all their subordinates. There is a nice
> symmetry here.
>
> SELECT O1.*
> FROM OrgChart AS O1, OrgChart AS O2
> WHERE O1.lft BETWEEN O2.lft AND O2.rgt
> AND O2.emp_name = :myemployee;
>
> 3. Add a GROUP BY and aggregate functions to these basic queries and
> you have hierarchical reports. For example, the total salaries which
> each employee controls:
>
> SELECT O2.emp_name, SUM(S1.salary_amt)
> FROM OrgChart AS O1, OrgChart AS O2,
> Salaries AS S1
> WHERE O1.lft BETWEEN O2.lft AND O2.rgt
> AND O1.emp_name = S1.emp_name
> GROUP BY O2.emp_name;
>
> 4. To find the level of each emp_name, so you can print the tree as
> an indented listing.
>
> SELECT T1.node,
> SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END
> + CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END) AS lvl
> FROM Tree AS T1, Tree AS T2
> WHERE T2.lft <= T1.lft
> GROUP BY T1.node;
>
> An untested version of this using OLAP functions might be better
> able to use the ordering.
>
> SELECT T1.node,
> SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END
> + CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END)
> OVER (ORDER BY T1.lft
> RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS lvl
> FROM Tree AS T1, Tree AS T2
> WHERE T2.lft <= T1.lft;
>
> 5. The nested set model has an implied ordering of siblings which
> the adjacency list model does not. To insert a new node, G1, under
> part G. We can insert one node at a time like this:
>
> BEGIN ATOMIC
> DECLARE rightmost_spread INTEGER;
> SET rightmost_spread
> = (SELECT rgt
> FROM Frammis
> WHERE part = 'G');
> UPDATE Frammis
> SET lft = CASE WHEN lft > rightmost_spread
> THEN lft + 2
> ELSE lft END,
> rgt = CASE WHEN rgt >= rightmost_spread
> THEN rgt + 2
> ELSE rgt END
> WHERE rgt >= rightmost_spread;
>
> INSERT INTO Frammis (part, lft, rgt)
> VALUES ('G1', rightmost_spread, (rightmost_spread + 1));
>
> COMMIT WORK;
> END;
>
> The idea is to spread the (lft, rgt) numbers after the youngest
> child of the parent, G in this case, over by two to make room for the
> new addition, G1. This procedure will add the new node to the
> rightmost child position, which helps to preserve the idea of an age
> order among the siblings.
>
> 6. To convert a nested sets model into an adjacency list model:
>
> SELECT B.emp_name AS boss_emp_name, E.emp_name
> FROM OrgChart AS E
> LEFT OUTER JOIN
> OrgChart AS B
> ON B.lft
> = (SELECT MAX(lft)
> FROM OrgChart AS S
> WHERE E.lft > S.lft
> AND E.lft < S.rgt);
>
> 7. To find the immediate parent of a node:
>
> SELECT MAX(P2.lft), MIN(P2.rgt)
> FROM Personnel AS P1, Personnel AS P2
> WHERE P1.lft BETWEEN P2.lft AND P2.rgt
> AND P1.emp_name = :my_emp_name;
>
> I have a book on TREES & HIERARCHIES IN SQL which you can get at
> Amazon.com right now.
Unless I am missing something, you have simply reproduced the 'nested set' theory that I have read in a number of other places on the web. You do not seem to have addressed the particular issue that I raised in my original post.
While on the subject, here is a question that has always puzzled me. In your example of inserting a new node, you have the following -
rgt = CASE WHEN rgt >= rightmost_spread
THEN rgt + 2
ELSE rgt END
WHERE rgt >= rightmost_spread;
As the WHERE clause restricts the selection, why do you need the CASE statement at all. In my experiments I have changed this to
rgt = rgt + 2
and it seems to work ok. Is there a problem with this?
Thanks
Frank Received on Mon May 16 2011 - 08:50:52 CEST