Re: parent/child relationship in the same table.

From: Kendall <kendallwillets_at_yaaaaahhhhoooooo.com>
Date: Mon, 19 Nov 2001 21:58:27 -0800
Message-ID: <pan.2001.11.19.21.58.26.837.15513_at_yaaaaahhhhoooooo.com>


In article <6768b874.0111190852.72cf03e8_at_posting.google.com>, "Jason Diamond" <jason_at_injektilo.org> wrote:

> I had exactly this same thought after asking my original question. It
> seems to suffer from two problems, though. The first is that I'd still
> like to maintain the original "numeric" ordering and to do that I would
> have to prefix each step in the path with zeroes which would make the
> paths pretty verbose which leads to my next problem: given enough
> descendants, I can eventually exceed the length limit on the path
> column.
>
>
>
>
 

I see you're using sql server, so let's cut to the chase: use a varbinary for the path and cast your integer id's to BINARY(4), concatenating them in the same way. I forget what the length limit is on varbinary, but I think it's 32k. It sounds klunky, but performance is quite good, particularly if the count of other columns is low - do it all in a temp table if necessary.

You're correct that this approach is vulnerable to a pathological case or two (very high numbers of levels), but every approach has some sort of flaw - there's no way to maintain the ordering without either variable numbers of bits, or renumbering everything during inserts and deletes.

Keep an eye on stats like max depth, average depth, etc.

>> Paths are fairly easy to build. One way is to start with the root node
>> and iteratively set each encountered node's path to its parent's path
>> concatenated with the node id, eg
>>
>> update table set path = '0.' where node_id = 0
>>
>> update c set path = p.path + c.node_id from table p, table c where
>> c.parent_id = p.node_id
>> and p.path is not null.
>> and c.path is null
>>
>> Iterate the second expression until no more rows are updated (most db's
>> let you check the "rows updated" after each statement). The number of
>> iterations required is the number of levels in the hierarchy (i.e., not
>> too many).
>
> This is really interesting. I was thinking that I would just build the
> path when I do the first insert. Of course, I'd have to know the ID
> before that so couldn't use an IDENTITY column.

For the identity column, you can make the path end at the parent id and make your select do:

order by path, node_id

That means the insert trigger sets

path = p.path + p.node_id

and the new node's id is not needed.

You could also conceivably use:

order by path, parent_node_id, node_id

and set

path = p.path + p.parent_node_id

on insert.

In either case, insert the root carefully :-).

> I'm using SQL Server but want to make sure that it's portable.
>
>
>
Binary columns don't seem to be too portable :-< I can't even represent them as hex - no hex conversion function that I'm aware of.

Kendall Received on Tue Nov 20 2001 - 06:58:27 CET

Original text of this message