From: "David Cressey" <david@dcressey.com>
Newsgroups: comp.databases.theory
References: <3ad9cbbf$1@news.kommunicera.umea.se> <9bfr66$mha$2@geraldo.cc.utexas.edu> <3adc922a$1@news.kommunicera.umea.se> <u7hezmficu.fsf@sol6.ebi.ac.uk>
Subject: Re: pointers on representing tree in db?
Lines: 209
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 5.00.3018.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.3018.1300
Message-ID: <wQgD6.37556$2X4.170311@petpeeve.ziplink.net>
Date: Wed, 18 Apr 2001 13:41:48 GMT
NNTP-Posting-Host: 206.15.152.141
X-Trace: petpeeve.ziplink.net 987601308 206.15.152.141 (Wed, 18 Apr 2001 09:41:48 EDT)
NNTP-Posting-Date: Wed, 18 Apr 2001 09:41:48 EDT


Philip,

I've seen a tiny addition to Joe Celko's "nested set" method.  This involves
adding a "level number" column to the hierarchy table.  The level number is
1 for the root, 2 for the direct children of the root, and so on.  So the
direct children of level N nodes would all have N+1 in the level column.

In this model,  the hierarchy has exactly four columns:

Seq_no,
Last_Seq_no,
Level_no,
Occupant

Seq_no is the primary key,  assigned by traversing the tree, somehow.

Last_Seq_No is an upper limit on the subtree of which this node is the root,
an plays the same role as one of the columns in the Celko method.  Level_no
is explained above.
Occupant is a foreign key that identifies the current person or thing
assigned to this slot in the hierarchy.

If it's a hierarchy of roles in the enterprise, for example, Occupant could
be the Employee Id of the person holding the slot.

--
Regards,
    David Cressey
    www.dcressey.com
"Philip Lijnzaad" <lijnzaad@ebi.ac.uk> wrote in message
news:u7hezmficu.fsf@sol6.ebi.ac.uk...
>
> Lennart> Quite neet feature. Oracle seems to have a lot of things that DB2
 doesnt. I
> Lennart> am however more interested in "classical" work using only
 standard SQL. The
> Lennart> solution I came up with is to use two tables
>
> Lennart> Table Data (id integer not null, parent_id references...) id is
 p.k
>
> what is Data.parent_id for ?
>
> Lennart> Table Ancestor (id, ancestor_id) both id and ancestor_id
 references id in
> Lennart> Data
>
> Lennart> Both a subtree and a path for a particular node can be easily
> Lennart> retrieved from the Ancestor table.
>
> no they can't; it needs support in the form of SQL extensions such as
> Oracle's CONNECT BY PRIOR.
>
> A neat solution that is frequently posted by Joe Celko (and also described
 in
> detail in his excellent "SQL for Smarties"), is the 'nested set'
> solution. The advantage is that it represents trees really as sets of
 nodes,
> whereas the (id,ancestor_id) approach ('adjacency list') really only gives
> one node or one set of direct children of a node.  It is completely
 standard
> SQL, and is exceedingly fast.
>
> The disadvantage is that in Celko's approach, it's not easy to select the
> direct parent or direct children of a  node.
>
> In practice, I think it would make sense to mix the two approaches.
>
> I am reposting it here. Cheers,
>
>
 Philip
>
>
>   From: joe_celko@my-deja.com
>   Subject: Re: tree structure in database
>   Date: Fri, 13 Aug 1999 01:00:59 GMT
>
>
>
>   Another way of representing trees is to show them as nested sets.
>   Since SQL is a set oriented language, this is a better model than the
>   usual adjacency list approach you see in most text books.  Let us
>   define a simple Personnel table like this, ignoring the left (lft) and
>   right (rgt) columns for now.  This problem is always given with a
>   column for the employee and one for his boss in the textbooks:
>
>    CREATE TABLE Personnel
>    (emp CHAR(10) PRIMARY KEY,
>     boss CHAR(10),       -- unneeded & denormalizes the table
>     salary DECIMAL(6,2) NOT NULL,
>     lft INTEGER NOT NULL,
>     rgt INTEGER NOT NULL);
>
>    Personnel
>    emp      boss      salary  lft  rgt
>    ===================================
>    Albert   NULL     1000.00   1   12
>    Bert     Albert    900.00   2    3
>    Chuck    Albert    900.00   4   11
>    Donna    Chuck     800.00   5    6
>    Eddie    Chuck     700.00   7    8
>    Fred     Chuck     600.00   9   10
>
>    which would look like this as a directed graph:
>
>               Albert (1,12)
>               /        \
>             /            \
>       Bert (2,3)    Chuck (4,11)
>                      /    |   \
>                    /      |     \
>                  /        |       \
>                /          |         \
>           Donna (5,6)  Eddie (7,8)  Fred (9,10)
>
>   This (without the lft and rgt columns) is called the adjacency list
>   model, after the graph theory technique of the same name; the pairs of
>   nodes are adjacent to each other.  The problem with the adjacency list
>   model is that the boss and employee columns are the same kind of thing
>   (i.e. names of personnel), and therefore should be shown in only one
>   column in a normalized table.  To prove that this is not normalized,
>   assume that "Chuck" changes his name to "Charles"; you have to change
>   his name in both columns and several places.  The defining
>   characteristic of a normalized table is that you have one fact, one
>   place, one time.
>
>   To show a tree as nested sets, replace the nodes with ovals, then nest
>   subordinate ovals inside each other.  The root will be the largest oval
>   and will contain every other node.  The leaf nodes will be the
>   innermost ovals with nothing else inside them and the nesting will show
>   the hierarchical relationship.  The rgt and lft columns (I cannot use
>   the reserved words LEFT and RIGHT in SQL) are what shows the nesting.
>
>   If that mental model does not work, then imagine a little worm crawling
>   anti-clockwise along the tree.  Every time he gets to the left or right
>   side of a node, he numbers it.  The worm stops when he gets all the way
>   around the tree and back to the top.
>
>   This is a natural way to model a parts explosion, since a final
>   assembly is made of physically nested assemblies that final break down
>   into separate parts.
>
>   At this point, the boss column is both redundant and denormalized, so
>   it can be dropped.  Also, note that the tree structure can be kept in
>   one table and all the information about a node can be put in a second
>   table and they can be joined on employee number for queries.
>
>   To convert the graph into a nested sets model think of a little worm
>   crawling along the tree.  The worm starts at the top, the root, makes a
>   complete trip around the tree.  When he comes to a node, he puts a
>   number in the cell on the side that he is visiting and increments his
>   counter.  Each node will get two numbers, one of the right side and one
>   for the left.  Computer Science majors will recognize this as a odified
>   preorder tree traversal algorithm.  Finally, drop the unneeded
>   ersonnel.boss column which used to represent the edges of a graph.
>
>   This has some predictable results that we can use for building
>   queries.  The root is always (left = 1, right = 2 * (SELECT COUNT(*)
>   FROM TreeTable)); leaf nodes always have (left + 1 = right); subtrees
>   are defined by the BETWEEN predicate; etc.  Here are two common queries
>   which can be used to build others:
>
>   1. An employee and all their Supervisors, no matter how deep the tree.
>
>    SELECT P2.*
>      FROM Personnel AS P1, Personnel AS P2
>     WHERE P1.lft BETWEEN P2.lft AND P2.rgt
>       AND P1.emp = :myemployee;
>
>   2. The employee and all subordinates. There is a nice symmetry here.
>
>    SELECT P2.*
>      FROM Personnel AS P1, Personnel AS P2
>     WHERE P1.lft BETWEEN P2.lft AND P2.rgt
>       AND P2.emp = :myemployee;
>
>   3. Add a GROUP BY and aggregate functions to these basic queries and
>   you have hierarchical reports.  For example, the total salaries which
>   each employee controls:
>
>    SELECT P2.emp, SUM(P1.salary)
>      FROM Personnel AS P1, Personnel AS P2
>     WHERE P1.lft BETWEEN P2.lft AND P2.rgt
>     GROUP BY P2.emp;
>
>   This will be two to three orders of magnitude faster than the adjacency
>   list model.
>
>   For details, see the chapter in my book JOE CELKO'S SQL FOR SMARTIES by
>   Joe Celko, Morgan-Kaufmann, 1995, ISBN 1-55860-323-9 or my columns in
>   DBMS magazine in 1996 March, April, May and June issues.  I went into
>   details as to how to manipulate this model.
>
>   --CELKO--
>
> --
> Why don't Alice and Bob simply get married?
> --------------------------------------------------------------------------
 ---
> Philip Lijnzaad, lijnzaad@ebi.ac.uk \ European Bioinformatics Institute,rm
 A2-08
> +44 (0)1223 49 4639                 / Wellcome Trust Genome Campus,
 Hinxton
> +44 (0)1223 49 4468 (fax)           \ Cambridgeshire CB10 1SD,  GREAT
 BRITAIN
> PGP fingerprint: E1 03 BF 80 94 61 B6 FC  50 3D 1F 64 40 75 FB 53



