Tree structures in an RDBMS

From: Denis Jedig <dj_at_syneticon.de>
Date: Wed, 28 Nov 2001 20:07:40 +0100
Message-ID: <3C0535FC.9030807_at_syneticon.de>



Hello Group,

The following is my current challenge:

I have a tree-ordered set of data (not necessarily balanced) and need to store it in a relational database. The requirements for the relational representation are:
- ability for branch queries forward as well as backward: i.e. given

   a node X, I need to get all nodes and leaves that are below X, given    a node Y, I need all nodes in the path from the root down to Y.
- extensibility: there must not be a hard limit on the number of nodes

   or the number of tree levels, the tree must accept addition of new    nodes at any position
- reorderability: from time to time branches will need to get reordered

   within the tree e.g. move a specific branch from level 4 below a node Y    to level 12 below node Z.

I had a solution in mind by myself and read several news articles, that google had spit out. The often quoted Joe Celko's left-right indexed tree example disqualifies for my purposes, since the tree itself is far from being static - it is steady growing and even reordering from time to time (as stated in the requirements)
So currently I am thinking about storing the tree structure in a table as a 2-tuple of references to the node-table just like in a linked list:

CREATE TABLE Nodes (NodeNr integer not null, NodeName varchar); CREATE TABLE NodeTree (ParentNodeNr int, ChildNodeNr int);

That way my structure retains extendable - it has no inherent limits for tree depth or number of nodes - and reordering of entire branches just takes a change of a single value.
My problem is "unfolding" the tree from its relational representation in either direction, thus retreiving all nodes below or above a node X. The procedure must happen entirely in the backend.

A general approach would be a slight variation of an n-depth recursive SELECT ChildNodeNr FROM NodeTree WHERE ParentNodeNr IN (SELECT ChildNodeNr FROM NodeTree WHERE ParentNodeNr IN (...)); where I also would preserve the results of my subqueries and UNION them all in the end, but I fear this solution to be far too costy in evaluation on the one hand and a real limiting factor on the other- most DBMS would limit the number of tables opened for a single query to something around 15 - so I would end up with a practical query depth limit of 15 levels for my tree (even worse, that value would depend on the DBMS used and on some DBMS this limit might be exhausted earlier due to use of temporary tables in the query process).
May be that I understood something wrong or missed some points in my considerations about that approach - would be glad to hear something on that. BTW, I read in articles dating 1999 and prior, that SQL3 would support recursive querying as well as some extensions of commercial SQL server products (Oracle) do. Is there a recommended "further reading" on that and something like a list of RDBMS that allow such a query?

Another way to query that came in my mind would be to CREATE TABLE temp_table (NodeNr int);
INSERT INTO temp_table (NodeNr) VALUES (X);

                                         ^
                                this X being a starting point
                                from where to "unfold" the tree
and then sequentially to
INSERT INTO temp_table (NodeNr) SELECT NT.ChildNodeNr FROM NodeTree NT, temp_table TT WHERE TT.NodeNr=NT.ParentNodeNr AND NOT EXISTS (SELECT * FROM temp_table TT_ex WHERE TT_ex.NodeNr=NT.ChildNodeNr) until the number of rows affected is 0.
I expect this solution to perform even worse than the prior one, but at least it does not limit my query depth.

I can't think of any better solution yet, so I am posting here today...

Some values estimated for the tree structure, its size and the main query character and the environment:

The tree is supposed to have about 15 levels with a mean of 3 child nodes arising from each parent. This lets me end up at the calculation of Sum(k=1;15;3^k)=21*10^6 nodes to store in my representation. Most queries will be directed "forward", i.e. from a specific node X down to the leaves, and will typically start 3 levels above the mean leaf level for that branch.
The database is expected to be written for PostgreSQL.

Any comments on that kind of implementation as well as any hints about how to address my problem are greatly appreciated. I suppose, the "object guys" sometimes need to resolve similar problems whenever they need to store their object tree in a relational DB, but I have not found a relevant implementation yet. Maybe the left-right indexed model is working fine for them, who knows.

BTW: I suppose I am somewhat off-topic here, but there is no comp.databases.relational for some reason.

TIA Denis Received on Wed Nov 28 2001 - 20:07:40 CET

Original text of this message