Re: Modelling large trees and hierarchies

From: Lennart Jonsson <lennart_at_kommunicera.umea.se>
Date: 17 Jun 2004 00:15:32 -0700
Message-ID: <6dae7e65.0406162315.364a550_at_posting.google.com>


paula_at_pivetal.com (Paul Arnold) wrote in message news:<b6ee5aa3.0406161236.6b2e1205_at_posting.google.com>...

[...]

>
> This infomation would allow us to combine many of the models Joe Celko
> discusses, including:
>
> 1) The Nested Sets/Interval model - allows us to quickly find
> subtrees.
> 2) The Adjacency List model - allows us to quickly find immediate
> child nodes.
> 3) The Materialized Path model - allows us to quickly find the
> ancestors of the node.
> 4) The Depth Model - allows us to quickly find nodes related to their
> levels/depth.
>

Not sure what you mean by Materialized Path, but here is a model that I have used:

create table tree (
  nodeid int not null primary key,
  parentid int not null references tree

     on delete restrict, -- parentid == nodeid means root   ...
)

create table ancestors (
  nodeid int not null,
  ancestorid int not null,
  constraint PKAncestors primary key(nodeid, ancestorid) )

the ancestor relation is maintained via triggers. Example for adding a node in the tree.

create trigger add_node
after insert on tree
referencing new as newrow
for each row mode db2sql
-- prevent insertion of root in closure
when (newrow.nodeid <> newrow.parentid)
begin atomic
  insert into ancestors (nodeid, ancestorid)     values (newrow.actorid, newrow.parentid)     union all
    select newrow.nodeid, ancestorid from ancestors where nodeid = newrow.parentid

You will need triggers for deleting and moving nodes as well. I have a few notes on the subject at:

http://fungus.teststation.com/~jon/treehandling/TreeHandling.htm

>
> It is also quite likely that we will want to change the structure of
> the tree fairly frequently. As an indication, I can see update rates
> of every 15 minutes or less.
>
> To us, disk space and hardware are cheap, so we are not concerned with
> storage overhead. We just need optimal query performance.
>
> Types of queries we need are for example:
>
> Given a node...
> 1) Get it's immediate children.

select * from tree where parentid = ?

> 2) Get it's ancestors.

select ancestorid from ancestors where nodeid = ?

> 3) Get all of it's descendants.

select nodeid from ancestors where ancestorid = ?

> 4) Get it's descendants to a certain depth.

select nodeid from ancestors a
where ancestorid = ?
  and ? >= (select count(1) from ancestors where nodeid = a.nodeid)

> 5) Get it's siblings

select nodeid from tree
where parentid = (select parentid from tree where nodeid = ?)   and nodeid <> ?

> etc. etc.
>
> Any suggestions/foresight/tips that may help us in the database
> modelling would be most appreciated?

HTH
/Lennart Received on Thu Jun 17 2004 - 09:15:32 CEST

Original text of this message