Re: pointers on representing tree in db?

From: David Cressey <david_at_dcressey.com>
Date: Wed, 18 Apr 2001 13:41:48 GMT
Message-ID: <wQgD6.37556$2X4.170311_at_petpeeve.ziplink.net>


Philip,

I've seen a tiny addition to Joe Celko's "nested set" method. This involves adding a "level number" column to the hierarchy table. The level number is 1 for the root, 2 for the direct children of the root, and so on. So the direct children of level N nodes would all have N+1 in the level column.

In this model, the hierarchy has exactly four columns:

Seq_no,
Last_Seq_no,
Level_no,
Occupant

Seq_no is the primary key, assigned by traversing the tree, somehow.

Last_Seq_No is an upper limit on the subtree of which this node is the root, an plays the same role as one of the columns in the Celko method. Level_no is explained above.
Occupant is a foreign key that identifies the current person or thing assigned to this slot in the hierarchy.

If it's a hierarchy of roles in the enterprise, for example, Occupant could be the Employee Id of the person holding the slot.

--
Regards,
    David Cressey
    www.dcressey.com
"Philip Lijnzaad" <lijnzaad_at_ebi.ac.uk> wrote in message
news:u7hezmficu.fsf_at_sol6.ebi.ac.uk...

>
> Lennart> Quite neet feature. Oracle seems to have a lot of things that DB2
doesnt. I
> Lennart> am however more interested in "classical" work using only
standard SQL. The
> Lennart> solution I came up with is to use two tables
>
> Lennart> Table Data (id integer not null, parent_id references...) id is
p.k
>
> what is Data.parent_id for ?
>
> Lennart> Table Ancestor (id, ancestor_id) both id and ancestor_id
references id in
> Lennart> Data
>
> Lennart> Both a subtree and a path for a particular node can be easily
> Lennart> retrieved from the Ancestor table.
>
> no they can't; it needs support in the form of SQL extensions such as
> Oracle's CONNECT BY PRIOR.
>
> A neat solution that is frequently posted by Joe Celko (and also described
in
> detail in his excellent "SQL for Smarties"), is the 'nested set'
> solution. The advantage is that it represents trees really as sets of
nodes,
> whereas the (id,ancestor_id) approach ('adjacency list') really only gives
> one node or one set of direct children of a node. It is completely
standard
> SQL, and is exceedingly fast.
>
> The disadvantage is that in Celko's approach, it's not easy to select the
> direct parent or direct children of a node.
>
> In practice, I think it would make sense to mix the two approaches.
>
> I am reposting it here. Cheers,
>
>
Philip
>
>
> From: joe_celko_at_my-deja.com
> Subject: Re: tree structure in database
> Date: Fri, 13 Aug 1999 01:00:59 GMT
>
>
>
> Another way of representing trees is to show them as nested sets.
> Since SQL is a set oriented language, this is a better model than the
> usual adjacency list approach you see in most text books. Let us
> define a simple Personnel table like this, ignoring the left (lft) and
> right (rgt) columns for now. This problem is always given with a
> column for the employee and one for his boss in the textbooks:
>
> CREATE TABLE Personnel
> (emp CHAR(10) PRIMARY KEY,
> boss CHAR(10), -- unneeded & denormalizes the table
> salary DECIMAL(6,2) NOT NULL,
> lft INTEGER NOT NULL,
> rgt INTEGER NOT NULL);
>
> Personnel
> emp boss salary lft rgt
> ===================================
> Albert NULL 1000.00 1 12
> Bert Albert 900.00 2 3
> Chuck Albert 900.00 4 11
> Donna Chuck 800.00 5 6
> Eddie Chuck 700.00 7 8
> Fred Chuck 600.00 9 10
>
> which would look like this as a directed graph:
>
> Albert (1,12)
> / \
> / \
> Bert (2,3) Chuck (4,11)
> / | \
> / | \
> / | \
> / | \
> Donna (5,6) Eddie (7,8) Fred (9,10)
>
> This (without the lft and rgt columns) is called the adjacency list
> model, after the graph theory technique of the same name; the pairs of
> nodes are adjacent to each other. The problem with the adjacency list
> model is that the boss and employee columns are the same kind of thing
> (i.e. names of personnel), and therefore should be shown in only one
> column in a normalized table. To prove that this is not normalized,
> assume that "Chuck" changes his name to "Charles"; you have to change
> his name in both columns and several places. The defining
> characteristic of a normalized table is that you have one fact, one
> place, one time.
>
> To show a tree as nested sets, replace the nodes with ovals, then nest
> subordinate ovals inside each other. The root will be the largest oval
> and will contain every other node. The leaf nodes will be the
> innermost ovals with nothing else inside them and the nesting will show
> the hierarchical relationship. The rgt and lft columns (I cannot use
> the reserved words LEFT and RIGHT in SQL) are what shows the nesting.
>
> If that mental model does not work, then imagine a little worm crawling
> anti-clockwise along the tree. Every time he gets to the left or right
> side of a node, he numbers it. The worm stops when he gets all the way
> around the tree and back to the top.
>
> This is a natural way to model a parts explosion, since a final
> assembly is made of physically nested assemblies that final break down
> into separate parts.
>
> At this point, the boss column is both redundant and denormalized, so
> it can be dropped. Also, note that the tree structure can be kept in
> one table and all the information about a node can be put in a second
> table and they can be joined on employee number for queries.
>
> To convert the graph into a nested sets model think of a little worm
> crawling along the tree. The worm starts at the top, the root, makes a
> complete trip around the tree. When he comes to a node, he puts a
> number in the cell on the side that he is visiting and increments his
> counter. Each node will get two numbers, one of the right side and one
> for the left. Computer Science majors will recognize this as a odified
> preorder tree traversal algorithm. Finally, drop the unneeded
> ersonnel.boss column which used to represent the edges of a graph.
>
> This has some predictable results that we can use for building
> queries. The root is always (left = 1, right = 2 * (SELECT COUNT(*)
> FROM TreeTable)); leaf nodes always have (left + 1 = right); subtrees
> are defined by the BETWEEN predicate; etc. Here are two common queries
> which can be used to build others:
>
> 1. An employee and all their Supervisors, no matter how deep the tree.
>
> SELECT P2.*
> FROM Personnel AS P1, Personnel AS P2
> WHERE P1.lft BETWEEN P2.lft AND P2.rgt
> AND P1.emp = :myemployee;
>
> 2. The employee and all subordinates. There is a nice symmetry here.
>
> SELECT P2.*
> FROM Personnel AS P1, Personnel AS P2
> WHERE P1.lft BETWEEN P2.lft AND P2.rgt
> AND P2.emp = :myemployee;
>
> 3. Add a GROUP BY and aggregate functions to these basic queries and
> you have hierarchical reports. For example, the total salaries which
> each employee controls:
>
> SELECT P2.emp, SUM(P1.salary)
> FROM Personnel AS P1, Personnel AS P2
> WHERE P1.lft BETWEEN P2.lft AND P2.rgt
> GROUP BY P2.emp;
>
> This will be two to three orders of magnitude faster than the adjacency
> list model.
>
> For details, see the chapter in my book JOE CELKO'S SQL FOR SMARTIES by
> Joe Celko, Morgan-Kaufmann, 1995, ISBN 1-55860-323-9 or my columns in
> DBMS magazine in 1996 March, April, May and June issues. I went into
> details as to how to manipulate this model.
>
> --CELKO--
>
> --
> Why don't Alice and Bob simply get married?
> --------------------------------------------------------------------------
---
> Philip Lijnzaad, lijnzaad_at_ebi.ac.uk \ European Bioinformatics Institute,rm
A2-08
> +44 (0)1223 49 4639 / Wellcome Trust Genome Campus,
Hinxton
> +44 (0)1223 49 4468 (fax) \ Cambridgeshire CB10 1SD, GREAT
BRITAIN
> PGP fingerprint: E1 03 BF 80 94 61 B6 FC 50 3D 1F 64 40 75 FB 53
Received on Wed Apr 18 2001 - 15:41:48 CEST

Original text of this message