Re: query that Bob is descendant of Alice

From: Mats Kindahl <matkin_at_iar.se>
Date: 23 Apr 2001 11:36:26 +0200
Message-ID: <um3db0c605.fsf_at_pjakkur.iar.se>


robova_at_nextra.sk (Robert Vazan) writes:

> Subject says it all, but there is another example:
>
> I have 30MB of code. I construct function call tree (line
> from caller to callie) (which is not tree but oriented
> acyclic network). Now I want to know whether function
> bad_hacker_trojan_horse() is directly or indirectly called
> by function draw_nice_icon(). How do I do it? I was thinking
> databases are supposed to do this, but after playing with
> PostgreSQL, I think SQL can express only static depth of
> links. If databases cannot, I would like to see algorithm or
> something else, that helps.

You can probably do it using SQL, but I believe that you are better off using some off-line processing and then store some information in the database.

[snip]

> Quiz
> 1. Can this be done? (practically)

I believe so.

> 2. With ~ O(n) memory?
> 3. With ~ O(n) time?
I believe that it can be done in O(1) time and O(n) memory after doing O(n) time preprocessing.

> 4. With existing software?

Eh, no I do not believe there is such a thing. However, the algorithm below is easy to implement.

> 5. With databases?
> 6. Is there something I am missing?
>
> )-: General motivation behind all this is to analyse various
> free-form databases around me. I could have real arguments
> instead of wishful thinking and also I could avoid monotone
> rewriting work. But I quickly realized, that these databases
> contain more lists than algebraic structures and they are
> full of partial orderings and half-satisfied rules and other
> dangers. Well, this is not going to be easy game. :-(

The call tree you have form a DAG, but I believe the following can be easily applied to DAGs and to "forests" as well [1].

There is an algorithm by Harel and Tarjan for finding the "least common ancestor" (LCA) in a tree [2,3]. The algorithm can find the LCA of two nodes A and B in constant time (i.e., O(1) time) after doing linear-time preprocessing of the tree. For a tree of size N, the algorithm requires you to store O(N*log N) bits of data (e.g., in you case it will be enough to store two 32-bit integers for each node in the tree, if your nodes are 480 bytes or less, two 16-bit integers will be enough). The algorithm performs some simple bitwise arithmetics on the two numbers to find the LCA.

Using the algorithm, you can find the LCA of, e.g., 'draw icon' and 'Trojan'. If LCA('draw icon', 'Trojan') = 'Trojan', then 'draw icon' calls 'Trojan' indirectly, otherwise it does not.

Pictorially, you have the two cases

        +                            +
        .                            .
        .                            .
        .                            .
       DRAW                        OTHER
        !                            !
     +--+--+                      +--+--+
     .     .                      .     .
     .     .                      .     .
     .     .                      .     .
     .   TROJAN                 DRAW   TROJAN


LCA(DRAW,TROJAN) = DRAW       LCA(DRAW,TROJAN) = OTHER

(Actually, you have a third case, namely

        +
        .
        . 
        .
      TROJAN
        !
     +--+--+         LCA(DRAW,TROJAN) = TROJAN
     .     .
     .     .
     .     .
     .    DRAW

which can be considered as a special case of the above such that OTHER = TROJAN.) Best wishes,
Mats Kindahl

[1] You can add an ultimate root node to a forest of trees and connect

    it to the root of each tree in the forest.

[2] D. Harel and R. E. Tarjan.

    Fast algorithms for finding nearest common ancestors.     SIAM J. Comput., 13:338-355, 1984.

[3] B. Schieber and U. Vishkin.

    On finding lowest common ancestors: simplification and     parallelization. SIAM J. Comput., 17(6):1253-1262,     December 1988.

-- 
Mats Kindahl, IAR Systems, Sweden

Any opinions expressed are my own, not my company's.
Received on Mon Apr 23 2001 - 11:36:26 CEST

Original text of this message