Re: Hierachical structures - an overview

From: Dawn M. Wolthuis <dwolt_at_tincat-group.com>
Date: Sun, 4 Jan 2004 07:36:37 -0600
Message-ID: <bt94ta$r1r$1_at_news.netins.net>


"Mike MacSween" <mike.macsween.nospam_at_btinternet.com> wrote in message news:3ff7e63a$0$52881$5a6aecb4_at_news.aaisp.net.uk...
> I have an app I need a hierachical structure for. There seem to be 3 ways
of
> implementing this, as far as I can see:
>
> 1. Adjacency list.
> Pros - intuitive and relatively simple
> Cons - not easily accesible via standard SQL, needs recusive queries to
> crawl up or down the structure. I'll be doing this in Jet or perhaps MS
SQL
> Server, so Oracle's Connect by is out.
>
> 2. Nested Sets a la Joe Celko's BOM.
> Pros - easy to access the structure via standard SQL
> Cons - expensive/complex when the structure changes frequently.
>
> 3. Materialised Paths.
> Pros - easy to access the structure via standSQL, the hierachy is obvious
> and stored in a single field.
> Cons - None? The value in the 'path' field doesn't _appear_ to be atomic,
> that might well be a debatable point.
>
> 1a. Adjacency list + as per:
> http://fungus.teststation.com/~jon/treehandling/TreeHandling.htm
> Not sure about this. It claims to give the path upwards. It looks like it
> merely gives the ancestors, which isn't the same thing.
>
> Any other techniques?
>
> Any other pros and cons?
>
> Yours, Mike MacSween
>

OK, I'm sucked into the discussion with this topic heading, having opted out of the discussion when you were trolling only for SQL-based solutions. If you are open to non-relational database solutions, the databases that I work most closely with these days (U2 from IBM) are sometimes called nested-relational. They are not in 1NF.

While I have a write-up of how a person would do data modeling in this arena, I'll try to give the Reader's Digest version in response to your question.

Data modeling for a specific (not "in general") hierarchical structure: 1. Separate People, Places, and Thngs into separate categories and Events or Actions into a 4th category. People, Places & Things would be your noun categories to be pointed to by your Events/Actions (with the former termed Master files in old lingo and the latter Transaction files).

2. The hierarchy you are looking at is related to your Events, so I'll focus on how to model the Transactions. Look at the highest grouping down to the most granular. In your case, it is a Tour down to a Performance. The issue of work orders for one or another person would be in a separate transactions (Work Orders) category from the hierarchy of the Tours themselves. For additional input to your design (not becase it dictates the design), think about how a person would store and retrieve the Tour hierarchy information in a paper-based system. It is likely that there would be a file for each Tour and within that a file folder for each country, with the remaining files by date of event (there are other possibilities, so I'm just trying to think like a paper-worker to come up with a good approach).

3. Look at what data needs to be collected about all hierarchy points between the top and the most granular. Is most relevant data held at the level of the Tour or the specific event? Could you flatten this hierarchy by thinking of the middle nodes as being attributes of the granular data? What would be lost?

In other words, think of file folders for the Tour and those for the actual Performance or Rehearsal and then on the most granular information, you log one or more attributes (that can contain multiple values) for designations of Country, City, etc. It is likely this would be sufficient, but it does not capture the full tree structure.

So, alternatively, put all Tours and sub-Tours in the "Named Tours" category and have a ContainsTour attribute within the Tour category. Then a particular Tour contains these City tours.

Conclusion: I could write way too much on this, so I'll cut it off here and suggest that I might go about this by having three primary Transaction/Events "files" (like tables, but attributes can have multiple values as they could in an XML document). These would correspond to: Tours (for every non-granular level in the tree), Events, and WorkOrders. The Tour hierarchy becomes like the Person & Places master files -- it is information to which an event "points".

I know this might sound as if you are not capturing like data in like ways (various nodes in your tree are in different files) and that is true. Instead it is capturing data the way that people think. The result of using this model is that it will be faster to write the software system, easier to ask it the questions that people will have, and easier to maintain it over time.

But if an esoteric 1NF structure trumps these advantages or if you are stuck with a legacy RDBMS to use, then I suspect you will choose one of the other approaches to encoding a hierarchy.

Best wishes! --dawn Received on Sun Jan 04 2004 - 14:36:37 CET

Original text of this message