Re: pointers on representing tree in db?

From: David Cressey <david_at_dcressey.com>
Date: Wed, 18 Apr 2001 13:41:48 GMT
Message-ID: <wQgD6.37556$2X4.170311_at_petpeeve.ziplink.net>

Philip,

I've seen a tiny addition to Joe Celko's "nested set" method. This involves adding a "level number" column to the hierarchy table. The level number is 1 for the root, 2 for the direct children of the root, and so on. So the direct children of level N nodes would all have N+1 in the level column.

In this model, the hierarchy has exactly four columns:

Seq_no,
Last_Seq_no,
Level_no,
Occupant

Seq_no is the primary key, assigned by traversing the tree, somehow.

Last_Seq_No is an upper limit on the subtree of which this node is the root, an plays the same role as one of the columns in the Celko method. Level_no is explained above.
Occupant is a foreign key that identifies the current person or thing assigned to this slot in the hierarchy.

If it's a hierarchy of roles in the enterprise, for example, Occupant could be the Employee Id of the person holding the slot.

--
Regards,
    David Cressey
    www.dcressey.com
"Philip Lijnzaad" <lijnzaad_at_ebi.ac.uk> wrote in message
news:u7hezmficu.fsf_at_sol6.ebi.ac.uk...


>

> Lennart> Quite neet feature. Oracle seems to have a lot of things that DB2


 doesnt. I


> Lennart> am however more interested in "classical" work using only


 standard SQL. The


> Lennart> solution I came up with is to use two tables


>

> Lennart> Table Data (id integer not null, parent_id references...) id is

 p.k


>

> what is Data.parent_id for ?


>

> Lennart> Table Ancestor (id, ancestor_id) both id and ancestor_id

 references id in


> Lennart> Data

>

> Lennart> Both a subtree and a path for a particular node can be easily


> Lennart> retrieved from the Ancestor table.

>

> no they can't; it needs support in the form of SQL extensions such as

> Oracle's CONNECT BY PRIOR.

>

> A neat solution that is frequently posted by Joe Celko (and also described

 in


> detail in his excellent "SQL for Smarties"), is the 'nested set'


> solution. The advantage is that it represents trees really as sets of

 nodes,


> whereas the (id,ancestor_id) approach ('adjacency list') really only gives


> one node or one set of direct children of a node.  It is completely

 standard


> SQL, and is exceedingly fast.


>

> The disadvantage is that in Celko's approach, it's not easy to select the

> direct parent or direct children of a  node.

>

> In practice, I think it would make sense to mix the two approaches.

>

> I am reposting it here. Cheers,

>

>

 Philip


>

>

>   From: joe_celko_at_my-deja.com

>   Subject: Re: tree structure in database


>   Date: Fri, 13 Aug 1999 01:00:59 GMT

>

>

>

>   Another way of representing trees is to show them as nested sets.

>   Since SQL is a set oriented language, this is a better model than the

>   usual adjacency list approach you see in most text books.  Let us

>   define a simple Personnel table like this, ignoring the left (lft) and

>   right (rgt) columns for now.  This problem is always given with a

>   column for the employee and one for his boss in the textbooks:

>

>    CREATE TABLE Personnel

>    (emp CHAR(10) PRIMARY KEY,

>     boss CHAR(10),       -- unneeded & denormalizes the table

>     salary DECIMAL(6,2) NOT NULL,

>     lft INTEGER NOT NULL,

>     rgt INTEGER NOT NULL);

>

>    Personnel

>    emp      boss      salary  lft  rgt

>    ===================================

>    Albert   NULL     1000.00   1   12

>    Bert     Albert    900.00   2    3

>    Chuck    Albert    900.00   4   11

>    Donna    Chuck     800.00   5    6

>    Eddie    Chuck     700.00   7    8

>    Fred     Chuck     600.00   9   10

>

>    which would look like this as a directed graph:

>

>               Albert (1,12)

>               /        \

>             /            \

>       Bert (2,3)    Chuck (4,11)

>                      /    |   \

>                    /      |     \

>                  /        |       \

>                /          |         \

>           Donna (5,6)  Eddie (7,8)  Fred (9,10)

>

>   This (without the lft and rgt columns) is called the adjacency list

>   model, after the graph theory technique of the same name; the pairs of

>   nodes are adjacent to each other.  The problem with the adjacency list

>   model is that the boss and employee columns are the same kind of thing

>   (i.e. names of personnel), and therefore should be shown in only one

>   column in a normalized table.  To prove that this is not normalized,

>   assume that "Chuck" changes his name to "Charles"; you have to change

>   his name in both columns and several places.  The defining

>   characteristic of a normalized table is that you have one fact, one

>   place, one time.

>

>   To show a tree as nested sets, replace the nodes with ovals, then nest

>   subordinate ovals inside each other.  The root will be the largest oval

>   and will contain every other node.  The leaf nodes will be the

>   innermost ovals with nothing else inside them and the nesting will show

>   the hierarchical relationship.  The rgt and lft columns (I cannot use

>   the reserved words LEFT and RIGHT in SQL) are what shows the nesting.

>

>   If that mental model does not work, then imagine a little worm crawling

>   anti-clockwise along the tree.  Every time he gets to the left or right

>   side of a node, he numbers it.  The worm stops when he gets all the way

>   around the tree and back to the top.

>

>   This is a natural way to model a parts explosion, since a final

>   assembly is made of physically nested assemblies that final break down

>   into separate parts.

>

>   At this point, the boss column is both redundant and denormalized, so

>   it can be dropped.  Also, note that the tree structure can be kept in

>   one table and all the information about a node can be put in a second

>   table and they can be joined on employee number for queries.

>

>   To convert the graph into a nested sets model think of a little worm

>   crawling along the tree.  The worm starts at the top, the root, makes a

>   complete trip around the tree.  When he comes to a node, he puts a

>   number in the cell on the side that he is visiting and increments his

>   counter.  Each node will get two numbers, one of the right side and one

>   for the left.  Computer Science majors will recognize this as a odified

>   preorder tree traversal algorithm.  Finally, drop the unneeded

>   ersonnel.boss column which used to represent the edges of a graph.

>

>   This has some predictable results that we can use for building

>   queries.  The root is always (left = 1, right = 2 * (SELECT COUNT(*)

>   FROM TreeTable)); leaf nodes always have (left + 1 = right); subtrees

>   are defined by the BETWEEN predicate; etc.  Here are two common queries

>   which can be used to build others:

>

>   1. An employee and all their Supervisors, no matter how deep the tree.

>

>    SELECT P2.*

>      FROM Personnel AS P1, Personnel AS P2

>     WHERE P1.lft BETWEEN P2.lft AND P2.rgt

>       AND P1.emp = :myemployee;

>

>   2. The employee and all subordinates. There is a nice symmetry here.

>

>    SELECT P2.*

>      FROM Personnel AS P1, Personnel AS P2

>     WHERE P1.lft BETWEEN P2.lft AND P2.rgt

>       AND P2.emp = :myemployee;

>

>   3. Add a GROUP BY and aggregate functions to these basic queries and

>   you have hierarchical reports.  For example, the total salaries which

>   each employee controls:

>

>    SELECT P2.emp, SUM(P1.salary)

>      FROM Personnel AS P1, Personnel AS P2

>     WHERE P1.lft BETWEEN P2.lft AND P2.rgt

>     GROUP BY P2.emp;

>

>   This will be two to three orders of magnitude faster than the adjacency

>   list model.

>

>   For details, see the chapter in my book JOE CELKO'S SQL FOR SMARTIES by

>   Joe Celko, Morgan-Kaufmann, 1995, ISBN 1-55860-323-9 or my columns in

>   DBMS magazine in 1996 March, April, May and June issues.  I went into

>   details as to how to manipulate this model.

>

>   --CELKO--

>

> --

> Why don't Alice and Bob simply get married?

> --------------------------------------------------------------------------

 ---


> Philip Lijnzaad, lijnzaad_at_ebi.ac.uk \ European Bioinformatics Institute,rm


 A2-08


> +44 (0)1223 49 4639                 / Wellcome Trust Genome Campus,


 Hinxton


> +44 (0)1223 49 4468 (fax)           \ Cambridgeshire CB10 1SD,  GREAT


 BRITAIN


> PGP fingerprint: E1 03 BF 80 94 61 B6 FC  50 3D 1F 64 40 75 FB 53

Received on Wed Apr 18 2001 - 15:41:48 CEST

This message: [ Message body ]
Next message: Shayne G. Wright: "Re: Functional Dependencies"
Maybe in reply to: Lennart Jonsson: "pointers on representing tree in db?"
In reply to Philip Lijnzaad: "Re: pointers on representing tree in db?"
Next in thread: Carlos Bromanski: "Re: GROUP BY question"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message