Re: Other tree representations in SQL

From: Heinz Huber <hhuber_at_racon-linz.at>
Date: Wed, 04 Dec 2002 11:09:32 +0100
Message-ID: <3dedd45d$0$26088$91cee783_at_newsreader01.highway.telekom.at>


--CELKO-- wrote:
> Vadim Tropashko suggested that we could represent a tree structure in
> SQL by storing a traversal sequence number and the level of each node
> to represent a hierarchy. This would be an alternative to my nested
> sets model
>
> Tree Traversals are usually associated with binary trees in the
> literature because the three types of traversals can be written in
> very neat, clean recursive procedures. But they can be generalized to
> any tree by moving from the leftmost to the rightmost subtree. There
> are three ways to traverse a tree -- preorder, inorder and postorder.
>
> A
> / \
> B C
> / \ / \
> D E F G
>
[snip]
>
> Here is a table based of a non-binary tree, numbered with the level
> and the preorder traversal sequence number of each node.
>
> INSERT INTO Tree VALUES ('a', 1, 1);
> INSERT INTO Tree VALUES ('b', 2, 2);
> INSERT INTO Tree VALUES ('c', 3, 2);
> INSERT INTO Tree VALUES ('d', 4, 3);
> INSERT INTO Tree VALUES ('e', 5, 3);
> INSERT INTO Tree VALUES ('f', 6, 3);
> INSERT INTO Tree VALUES ('g', 7, 2);
> INSERT INTO Tree VALUES ('h', 8, 3);
> INSERT INTO Tree VALUES ('i', 9, 3);
>
> This does give us a unique co-ordinate pair to identify a node, and I
> played with it a little bit.
>
> 1) Queries which deal with levels in the hierarchy are quite easy,
> but those which deal with aggregations up and down the hierarchy can
> be complex. To establish the parent of a given child, you have to
> find the greatest sequence number that exists one level higher than
> the child node. The parent's sequence number will always be less than
> the child's sequence number.
>
> SELECT *, :my_child
> FROM Tree
> WHERE seq
> = (SELECT MAX(T1.seq) AS parent
> FROM Tree AS T1,Tree AS T2
> WHERE T1.seq < T2.seq
> AND T1.level = T2.level -1
> AND T1.node = :my_child);

Little typo here: It should be T2.node = :my_child.

>
> 2) Finding a subtree rooted at a particular node is more complex.
> The leftmost child of a node is easy to find; it is the sequence
> number of the parent plus one. The problem is establishing a
> rightmost child in a subtree.
>
> You know that all the children will have a greater sequence number
> than the root and a greater level number. But at the same time, you
> do not want to get the nodes in other subtrees that are siblings to
> the right of the root node.
>
> The way you know that a sequence number belongs to another subtree is
> that while the sequence number increased, the level number decreased.
> For example, if we want to see the subtree rooted at ('c', 3, 2) in
> the sample table, we know that the subtree must have greater sequence
> number, thus:
>
> ('d', 4, 3)
> ('e', 5, 3)
> ('f', 6, 3)
> ('g', 7, 2) <==
> ('h', 8, 3)
> ('i', 9, 3)
>
> Since 'c' is at level 2, we only need to look at nodes with a greater
> level number, thus
>
> ('d', 4, 3)
> ('e', 5, 3)
> ('f', 6, 3)
> ('h', 8, 3)
> ('i', 9, 3)
>
> But this leaves 'h' and 'i', which are in another subtree under 'g'
> and clearly wrong. What we want to find is the rightmost sequence
> number in the target subtree. This number also happens to be the
> first sequence number that jumps back up to the same or a higher level
> as the root. Look at how 'g' is back on level 2.
>
> But there is another problem. When you look at the rightmost subtree,
> there is no node to the right, so we have to use the MAX(seq) in the
> whole tree. This leads us to this query.
>
> SELECT :my_root, T1.*
> FROM Tree AS T1, Tree AS T2
> WHERE T2.node = :my_root
> AND T2.seq < T1.seq
> AND T1.level > T2.level
> AND T1.seq
> <= COALESCE((SELECT MIN(T3.seq)
> FROM Tree AS T3
> WHERE T3.level = T2.level
> AND T3.seq > T2.seq),
> (SELECT MAX(seq) FROM Tree));
>
[snip]

Wouldn't the following query have a better chance on giving a good performance:
SELECT :my_root, T1.*
  FROM Tree AS T1 JOIN Tree AS T2 ON T1.seq > T2.seq

                                 AND T1.level > T2.level
  WHERE T2.node = :my_root
    AND NOT EXISTS (SELECT T3.seq
                     FROM Tree AS T3
                     WHERE T3.seq > T2.seq AND T3.seq < T1.seq
                       AND T3.level <= T2.level);

This way, you don't need neither a coalesce nor an aggregation function and an intelligent optimizer might use a negative index scan(?).

The queries for postorder traversal are simply the queries for preorder turned around:

INSERT INTO Tree VALUES ('b', 1, 2);
INSERT INTO Tree VALUES ('d', 2, 3);
INSERT INTO Tree VALUES ('e', 3, 3);
INSERT INTO Tree VALUES ('f', 4, 3);
INSERT INTO Tree VALUES ('c', 5, 2);
INSERT INTO Tree VALUES ('h', 6, 3);
INSERT INTO Tree VALUES ('i', 7, 3);
INSERT INTO Tree VALUES ('g', 8, 2);
INSERT INTO Tree VALUES ('a', 9, 1);

SELECT *, :my_child
  FROM Tree
  WHERE seq

  • (SELECT MIN(T1.seq) AS parent FROM Tree AS T1,Tree AS T2 WHERE T1.seq > T2.seq AND T1.level = T2.level -1 AND T2.node = :my_child);

SELECT :my_root, T1.*
  FROM Tree AS T1 JOIN Tree AS T2 ON T1.seq < T2.seq

                                 AND T1.level > T2.level
  WHERE T2.node = :my_root
    AND NOT EXISTS (SELECT T3.seq
                     FROM Tree AS T3
                     WHERE T3.seq < T2.seq AND T3.seq > T1.seq
                       AND T3.level <= T2.level);


Inorder traversal is a bit more of a problem. The first question here is, how to define inorder traversal for a non-binary tree. Which branches do you process before the node and which ones afterwards?

One advantage of this model compared to the nested set is that you can get any precessor when you know which level you want (e.g. grandparent = level - 2; root = level 1; direct under root = level 2). And you can also enumerate all descendants of a certain level (e.g. root = grandparent; children = level 1; grandchildren = level 2).

To overcome the problem of finding all children (at which the nested set model is very good), one could introduce some redundancy and add a right_most_child column.

Another alternative would be using the nested set model and a computed column level (SELECT COUNT(lft) FROM NestedSet WHERE lft <= :lft AND rgt  >= :rgt).

Regards,
Heinz Received on Wed Dec 04 2002 - 11:09:32 CET

Original text of this message