Re: Nested sets performance ?

From: --CELKO-- <71062.1056_at_compuserve.com>
Date: 7 Feb 2002 09:48:23 -0800
Message-ID: <c0d87ec0.0202070948.753fdd6f_at_posting.google.com>


>> I have been using the nested sets example (thanks --CELKO--) on
relatively small amounts of data. <<

I need to get my boss to print up a white paper on the Nested Sets model, so I can just pass it out ...

>> As I understand, every insertion into the tree involves updating
every
other node to the right [of the insertion point]. <<

Correct. But I recommend that you insert the new subtree as the rightmost sibling of the parent. Here is the code for one node; to extend the idea to a larger subtree by replacing 2 with twice the number of nodes in the new subtree.

BEGIN
DECLARE right_most_sibling INTEGER;

--insertion point
SET right_most_sibling

  • (SELECT rgt FROM Personnel WHERE emp = :your_boss); --make a gap UPDATE Personnel SET lft = CASE WHEN lft > right_most_sibling THEN lft + 2 ELSE lft END, rgt = CASE WHEN rgt >= right_most_sibling THEN rgt + 2 ELSE rgt END WHERE rgt >= right_most_sibling;

--insert new guy
INSERT INTO Personnel (emp, lft, rgt)
VALUES ('New Guy', right_most_sibling, (right_most_sibling + 1)) END; This leads to less than half of the nodes changing, depending on the shape and skew of the original tree. For example, newsgroup message trees tend to grow from the right, not the middle.

>> I fear that a simple row insertion on a record set for large
volumes of data will impair performance. <<

Not as bad as people think, if you did the data model right. One fo the first ten rules is that a table is a set of entities or a relationship, but not both. Ergo, the tree STRUCTURE (node, lft, rgt) is one table and the tree NODES are in a separate table. It is the difference between an organizational chart and personnel data.

This has another advantage; you can put more than one heirarchy on the same set of nodes. I suggested this years ago to a shoe company that was doing a data warehouse project. The manufacturing reports were based on a heirarchy that stressed the physical construction of the shoes. Marketing reports were based on a heirarchy that stressed styles and sales.

For example, back then steel toed work boots were one kind of product to manufacturing, but two kinds of the product to marketing based on shoe size. Steel toed work boots sold to either construciton workers (big feet) or to teenaged girls in punk rock (small feet); there was virtually no market for average sized steel toed work boots.

Now look at the size of a row in the tree structure table -- two integers and a node key; be generous and call it 8 bytes per row. Given a cache with 16 Kbyte data pages, you can get a lot of tree into main storage. Better yet, make the table all key columns and you get a covering index, so the table disappears into the index completely.

> Using the adjacency list model will only require adding a row with parent relationship. <<

The adjacency list model will built a tree one node at a time much faster. But DELETE can result in problems, since the tree becomes a disjointed forest. And aggregation queries are a nightmare of cursors or self-joins.

>> I am wondering whether I can sub-divide my tree by some branch id.
So in essence have several trees acting as one big tree. That way adding a node will only result in updates to the relevant sub-tree. <<

I am not sure what you mean, but you can store more than one tree in a table:

 CREATE TABLE Forest
 (tree_id INTEGER NOT NULL,
  node CHAR(15) NOT NULL,
  lft INTEGER NOT NULL,
  rgt INTEGER NOT NULL,
  CHECK (lft < rgt),
  PRIMARY KEY (tree_id, lft, rgt));

>> Any advice on high-performance binary trees would be greatly
appreciated.<<

Might want to look at using a heap representation in a table. This is an old array trick. Given an array dimensioned A[1:n] the left child of A[i] is at A[2*i] and the right child of A[i] is at A[2*i + 1]. For example,

              Albert 
            /        \
           /          \
        Bert        Chuck 
                    /   | 
                   /    | 
                Donna  Eddie 
                      / | 
                     /  | 
                   nil Fred

 becomes:
CREATE TABLE BinaryTree
(node CHAR(10) NOT NULL,
 heap_nbr INTEGER NOT NULL);

  Albert 1
  Bert 2
  Chuck 3
  Donna 4
  Eddie 5
  Fred 11

I showed the missing (nil) left child for Fred. The MAX(heap_nbr) is the upper limit on thenumber of nodes in a binary tree of that depth. Received on Thu Feb 07 2002 - 18:48:23 CET

Original text of this message