Re: query that Bob is descendant of Alice

From: fervvac <>
Date: Mon, 23 Apr 2001 10:30:59 +0800
Message-ID: <9c03s3$61k$>

Dear Robert,

An approach base on mapping comes to my mind as follows:

+) For each node of the tree, attach two attributes, say start and end. Start and end denotes the "region" this node covers.

+) Then traverse the tree in the depth first order. Maintain a counter that increases every time a new node is visited. Mark the start attribute of the visiting node as the value of the counter if the node is being visited first time. Otherwise (it means we are backtracking to this node), set the value of the counter to the end attribute.

+) In this way, A is a parent of B iff A's (start,end) covers B's (start,end). If we have B+-tree built for start and end attribute, I guess such query can be efficiently evaluated.

+) If you have a DAG other than a tree, it seems still possible to apply the similar scheme, but there are some thing to consider what value is there in the counter.

+) If you want to support insertion/update, the value in the start or end attribute might have to be a real value instead of an integer.

Another approach, if you need high performance of query processing, is to materialize all the parents of a leaf node. That is a typical solution which trade space for time. However, it seems such strategies are widely used nowadays, given that storage is no long a problem.


"Robert Vazan" <> wrote in message
> Subject says it all, but there is another example:
> I have 30MB of code. I construct function call tree (line
> from caller to callie) (which is not tree but oriented
> acyclic network). Now I want to know whether function
> bad_hacker_trojan_horse() is directly or indirectly called
> by function draw_nice_icon(). How do I do it? I was thinking
> databases are supposed to do this, but after playing with
> PostgreSQL, I think SQL can express only static depth of
> links. If databases cannot, I would like to see algorithm or
> something else, that helps.
> To this point I found some unsatisfactory solutions:
> 1. Partial solutions. For example tree (which occupies
> another thread) A(B,C(D,E),F) can be represented this way
> (use fixed-font for a while):
> We can store <line,first_column,last_column> for every node
> and all queries can be done through numerical comparisons.
> Another partial solution is total ordering, which can be
> mapped to integers.
> 2. Transitive closure (ie, if there is a->b->c chain, then
> we add also a->c). This is extremely expensive for >1MB
> database.
> 3. Traversing network on every query. This consumes no
> memory, but eats a lot of time (and I want to do thousands
> of queries). If done in groups, time is better, but I think
> there are types of group queries, that cannot be linear this
> way (I havent an example, but I think there is one). It
> complicates serial access to data or makes it impossible
> (and 30MB database wont fit into memory). Also this requires
> construction of special algorithm for every query (as far as
> I know).
> Quiz
> 1. Can this be done? (practically)
> 2. With ~ O(n) memory?
> 3. With ~ O(n) time?
> 4. With existing software?
> 5. With databases?
> 6. Is there something I am missing?
> )-: General motivation behind all this is to analyse various
> free-form databases around me. I could have real arguments
> instead of wishful thinking and also I could avoid monotone
> rewriting work. But I quickly realized, that these databases
> contain more lists than algebraic structures and they are
> full of partial orderings and half-satisfied rules and other
> dangers. Well, this is not going to be easy game. :-(
> /Robert
Received on Mon Apr 23 2001 - 04:30:59 CEST

Original text of this message