Re: Making a tree with "millions and millions" of dynamic nodes

From: --CELKO-- <>
Date: 10 Dec 2003 13:19:00 -0800
Message-ID: <>

>> The depth should not exceed 6 or 7 levels. The initial import will
have about 6 million leaves, and 3 million branches. I would expect the leaves to grow significantly, in number easily tripling. <<

That is not too bad -- depth would worry me more than breadth

>> 1. path generation speed <<

That is pretty easy in a nested set model:

 SELECT T1.node, (T1.rgt-T1.lft) AS depth    FROM Tree AS T1, Tree AS T2
  WHERE T2.lft BETWEEN T1.lft AND T1.rgt     AND T2.node = :my_guy
 ORDER BY depth;

The greater (T1.rgt-T1.lft) is, the closer the node is to the root

You can also convert depth into a sequential number by using.

SELECT T2.node, COUNT (T1.node)AS depth

   FROM Tree AS T1, Tree AS T2
  WHERE T2.lft BETWEEN T1.lft AND T1.rgt   GROUP BY T2.part;

>> 2. immediate sibling speed <<

SELECT B.node AS boss, P.node
  FROM Tree AS P

       Tree AS B
       ON B.lft 
          = (SELECT MAX(lft)
               FROM Tree AS S
              WHERE P.lft > S.lft
                ND P.lft <S.rgt);

>> 3. children count speed <<

 SELECT :my_guy, (rgt - lft +1)/2 AS subtree_size    FROM Tree
  WHERE node = :my_guy;

>> I have implemented a first run using Joe Celko's Nested Sets (w/ a
mod to store tree level for speed). The update propigation issue is the achilles heel of this approach for me. <<

It is not as bad as people first think. The tree structure is in one table and the nodes are in another, so you can manipulating of two integers and one key (perhaps another integer). Since new siblings are added to the right side of the existing line of siblings under the parent, you average a bit less than half of the nodes being scanned in practice.

There are some other tricks, like leaving a large gap between (lft) and (rgt) numbers so you have room to insert new nodes.

>> So it seems Materialized Path is my only option, however I am
concerned about LIKE performance for the right hand side of the tree, where the path is 8 digits x 6 levels = 48 chars. Should I be concerned? <<

It does not sound too bad, assuming an ordered index. I wonder if you could improve performance of the LIKE predicates by reversing the path string ...

>> ... some reassuring words about real-time performance of a 20
million row tree, i'd love to hear <<

Sorry, the most I have played with is a few hundred thousand rows with the nested set model. Received on Wed Dec 10 2003 - 22:19:00 CET

Original text of this message