Making a tree with "millions and millions" of dynamic nodes

From: Christian Fowler <google_at_NOSPAM.gravesweeper.com>
Date: Tue, 02 Dec 2003 17:40:53 -0600
Message-ID: <6b-dnQJnDI6YvlCiXTWc-g_at_speakeasy.net>


I have a VERY LARGE pile of geographic data that I am importing into a database (db of choice is postgres, though may hop to oracle if necessary). The data is strictly hierarchical - each node has one, and only one parent. The depth should not exceed 6 or 7 levels. The initial import will have about 6 million leaves, and 3 million branches. I would expect the leaves to grow significantly, in number easily tripling. However, the branches will likely stay very constant in number, but I expect there locations to shift around somewhat (affecting up to thousand of children).

For selection, it is crucial for me to get:

  1. path generation speed
  2. immediate sibling speed
  3. children count speed

I have implemented a first run using Joe Celko's Nested Sets (w/ a mod to store tree level for speed). The update propigation issue is the achilles heel of this approach for me. I have read Vadim Tropashko Nested Interval concept ( http://www.dbazine.com/tropashko4.html ) , and my brain is painfully stretched enough to get the general idea. I have a feeling i will hit the arithmetic issues others of reported.

So it seems Materialized Path is my only option, however I am concerned about LIKE performance for the right hand side of the tree, where the path is 8digits x 6 levels = 48 chars. Should I be concerned? I need split-second real-time performance, and can't imagine it will be as fast the Nested Set arithmatic approach. I can flatten out the import to insure the upper tree has the smallest numbers, however, it will save at most 8 chars on the path.

I've been googling through USENET archives watching the big debates of years gone by. I've grazed much knowledge, but now am curious if anyone has any thoughts/advice/war stories about a data set this large.

(and if any fellow postgres fans have some reassuring words about real-time performance of a 20 million row tree, i'd love to hear ;-)

-- 
[ \ /

[ >X< Christian Fowler | http://www.steelsun.com/
[ / \
Received on Wed Dec 03 2003 - 00:40:53 CET

Original text of this message