Storing hierarchical information?

From: Dave Kimmel <criscokid_at_v-wave.com.nospam>
Date: Mon, 07 May 2001 17:45:59 GMT
Message-ID: <slrn9fdnor.iqd.criscokid_at_deepcore.gsm>


I currently have a database which stores a hierarchical tree of documents which make up a policy manual for work. The hierarchy is handled really simply - each document has a parent attribute, if the parent attribute is null then that document is a "root" document. This works rather well in some ways, but sucks incredibly in other ways. Documents are identified by a docid, startdate, and enddate, which allows me to keep track of changes very easily. Instead of making a change by updating, I make a change by updating the current revision and setting enddate = now(), then adding a new document with startdate = now(). The interface is entirely web based, the database is MySQL.

For example, one of the ways it sucks is that each document must have a unique document ID. Ideally, the IDs should be descriptive, but we have a lot of repeating between sections. So the DE and DA sections of the manual might both have a policy on Documentation, and they are nearly identical, but they need unique IDs. This isn't too bad, but it might get hairy since we plan to add a ton of documents in the near future.

Also, we plan on having multiple roots in the near future and we'd like to have them seperate. So when you're in rootA, you can't see documents belonging to rootB. Right now, this is easy to do for everything except things that apply across multiple documents, such as searching or getting a list of recent changes. Basically, I will have to find the root document by recursively reading the parent attribute until I find a null, then I will have to generate a list of all documents that eventually lead back to this parent document, then I will have to restrict the search to apply to just those documents. This is nasty and very slow, but yes, it does work.

So, what I want to ask is this: Is there any "proper" way to store hierarchical data in a database, much like operating systems store files? Is what I'm doing fine, or is there a way that could be faster?

I'd rather not just throw more hardware at this.

Here are some of the solutions I've thought of so far:

  1. Adding a "root" attribute to the documents table and making it part of the PK. This will allow me to share docids between roots, but will require many, many changes to the scripts that render the web pages. I can't think of a single piece of code in this that won't need to be touched. This would let me have multiple document hierarchies, and could be considered The Right Thing To Do.
  2. Adding a "root" attribute to the documents table and NOT making it part of the PK. This doesn't let me share docids, instead it would act more like a cache so that I don't have to recurse up to find the root document and generate lists of children to restrict certain operations. This also makes it harder to update, but not much harder. I'd only have to change one function to get it to work and then modify various other ones to take advantage of it. The big advantage is that using it would be optional. However, it doesn't strike me as The Right Way To Do Things as far as relational databases go.
  3. Adding a cache table, which will cache various bits of information that are normally generated with recursive functions. This could work, and could work very well, but again it doesn't strike me as The Right Way To Do Things.
  4. Something completely different. Switching to a hierarchical database, like whatever Zope uses, comes to mind. The problem with this is that I know absolutely nothing about hierarchical databases. This is probably closer to The Right Way To Do Things than anything else.
  5. Scrap the database, store the documents in the filesystem. I did this before, it was very, very slow, not to mention relatively unusable.

Thanks in advance for your input!
-- Dave Kimmel

   criscokid_at_v-wave.com
   ICQ: 5615049 Received on Mon May 07 2001 - 19:45:59 CEST

Original text of this message