Re: pointers on representing tree in db?

From: Jan Hidders <hidders_at_REMOVE.THIS.win.tue.nl>
Date: 15 Apr 2001 19:38:15 GMT
Message-ID: <9bctb7$9hc$1_at_news.tue.nl>


Lennart Jonsson wrote:
>
> Hi, does any one know of any pointers on how to represent trees in a
> relational db? I'm sure there must be standard ways of doing this
> (representing organizations for example). I have implemented a solution
> which works but is far from perfect, thus eager to see other sides of the
> same coin :-)

There is the quite clever 'nested sets' method by Joe Celko. See

  http://www.dbmsmag.com/9603d06.html

You can summarize the essence (*) of it with the following three rules: 1. Every node is given a unique number by numbering them inorder (**). 2. Every node is represented by a record that contains:

  1. the number that identifies the node, (a candadate key)
  2. the highest number of all its descendents including itself, and
  3. the data associated with the node

So suppose you have the following tree:

                    A
            +-------+------+
            B              C
        +---+---+      +---+---+
        D       E      F       G
             +--+--+
             H     I

Numbering it inorder gives you:

                    A(1)
            +-------+------+
            B(2)           C(7)
        +---+---+      +---+---+
        D(3)    E(4)   F(8)    G(9)
             +--+--+
             H(5)  I(6)

That gives you the following table:
(N = node id, M = max. desc. id, D = data)

T: N M D



   1 9 A
   2 6 B
   3 3 D
   4 6 E
   5 5 H
   6 6 I
   7 9 C
   8 8 F
   9 9 G

With this representation it is easy to ask queries such as 'what are all the descendents of E':

SELECT t2.D
FROM T AS t1, T AS t2
WHERE t1.D = 'E' AND t2.N > t1.N AND t2.N <= t1.D;

or 'give me all the leaf nodes':

SELECT *
FROM T
WHERE N=M; The down-side is that you have to maintain this structure when nodes are removed and added. This may mean that you have to renumber almost the entire tree.

(*) My summary here is different from Joe Celko's version because it

    uses a slightly different (IMHO somewhat simpeler) numbering     scheme. But the basic idea is the same and it has the same pro's     and con's.

(**) Numbering inorder means that you start with the root node and then

     recursively do it in the following order:
     - first the node itself
     - then the left tree
     - and finally the right tree.
   
-- 
  Jan Hidders
Received on Sun Apr 15 2001 - 21:38:15 CEST

Original text of this message