Best SQL hierarchical model for large/heavily updated graphs?

From: marchaos <marchaos_at_gmail.com>
Date: 6 Feb 2006 20:30:39 -0800
Message-ID: <1139286639.188763.80840_at_z14g2000cwz.googlegroups.com>



I'm looking for a concept for modeling a web hierarchy, which has the potential to grow into the 10s of millions of nodes. The hierarchy is technically a graph as nodes can have multiple parents. An example of a node with multiple parents would be a web page linked under several parent web pages. The technique cannot use any propriety syntaxes (ie. CONNECT BY or CTEs) as the implementation requires support for a number or databases.

The most common tasks performed on the hierarchy are:

  • get parents of a node
  • get children of a node
  • joining the hierarchy for filtering purposes
  • insert/deleting nodes
  • moving/deleting branches
  • linking children/branches into several places in the hierarchy

Currently we are using the materialized path technique, mapping the path of nodes using a series of ascii characters. One of the fundamental floors of this technique is that if you want to link a particular branch into another part of the tree, you have to replicate the whole branch's materialized paths to the new position in the tree, rather than creating one link for the parent of that branch. If you require a branch to be linked in several places the table and indexes grow quickly.

I have reviewed several techniques including Celko's nested sets, Tropashko's nested intervals, adjacency lists etc, and I believe that these techniques are fit for specific scenarios, this being not one of those. Is there a model that someone can suggest that best fits this scenario? Or is this wishful thinking? Received on Tue Feb 07 2006 - 05:30:39 CET

Original text of this message