Re: how to suppress carefully a recursive tree

From: David BL <davidbl_at_iinet.net.au>
Date: Wed, 23 Jan 2008 16:43:32 -0800 (PST)
Message-ID: <ab2e0ad4-7423-43af-b6dc-d1ad9f3ae440_at_e4g2000hsg.googlegroups.com>


On Jan 24, 1:24 am, fj <francois.j..._at_irsn.fr> wrote:
> On 23 jan, 16:35, fj <francois.j..._at_irsn.fr> wrote:
>
>
>
>
>
> > On 23 jan, 14:32, David BL <davi..._at_iinet.net.au> wrote:
>
> > > On Jan 23, 4:10 pm, fj <francois.j..._at_irsn.fr> wrote:
>
> > > > On 23 jan, 01:12, David BL <davi..._at_iinet.net.au> wrote:
>
> > > > > On Jan 23, 1:10 am, fj <francois.j..._at_irsn.fr> wrote:
>
> > > > > > On 22 jan, 15:52, Jan Hidders <hidd..._at_gmail.com> wrote:
>
> > > > > > > On 22 jan, 12:04, fj <francois.j..._at_irsn.fr> wrote:
>
> > > > > > > > I know how to suppress a normal tree but I meet the following kind of
> > > > > > > > situation :
>
> > > > > > > I'm guessing that when you say "suppress" you mean "represent in a
> > > > > > > database". Correct?
>
> > > > > > No : I want to destroy, remove, kill ... a part of the data (a
> > > > > > complete tree or just a branch), but without destroying data shared by
> > > > > > other trees or branches.
>
> > > > > Would the mark and sweep algorithm suit you purposes?
>
> > > > No
>
> > > > the mark and sweep algorithm needs to know all the tree roots in order
> > > > to mark all the used objects.
>
> > > > In my example I indicates that the node b3 belongs to the root r2
> > > > just for information to explain the storage count. But the deletion
> > > > routine I want to write just receives "r1" as argument, nothing else.
> > > > It does not know that r2 exists too ... It is just able to detect the
> > > > existence of other objects via the storage count of b3 which is a
> > > > little bit too high for being just referenced by r1 !
>
> > > For a directed edge from node n1 towards node n2, let n2 be referred
> > > to as the "destination" node.
>
> > > n1 ---->---- n2
> > > source destination
>
> > > Maybe the following approach would work:
>
> > > It is assumed that node r1 is the root of the "tree" to be deleted.
>
> > > Step 1: Perform a trace (in the manner of a mark phase) treating r1
> > > as the one and only root and decrementing the storage counter on the
> > > destination node for each edge that is traversed. ie find the set X
> > > of nodes that are reachable from r1. Nodes that have already been
> > > visited are flagged to ensure each reachable node is visited exactly
> > > once. For each node during the recurse, retrieve the set of outgoing
> > > edges yielding a set of destination nodes. Decrement each destination
> > > node's storage counter (this must be done for all the destination
> > > nodes - ie including the nodes that have already been visited). Then
> > > recurse into the destination nodes that haven't been visited before.
>
> > > Step 2: Find the subset R of X corresponding to nodes whose storage
> > > counter hasn't fallen to zero.
>
> > > Step 3: Run a mark and sweep using roots R and sweeping over X. For
> > > each edge traversed in the mark phase, increment the storage counter
> > > on the destination node.
>
> > > In your example I think step 1 will find X = {r1,b1,b2,b3,b4} and only
> > > b3 will end up with a nonzero storage counter, so step 2 yields R =
> > > {b3}. Then in step 3, a trace from roots {b3} will bump the storage
> > > counter of b4 back up to 1 and only {r1,b1,b2} will be deleted.
>
> > > Am I making sense?
>
> > Yes.
>
> > I was arrived to a very similar solution as I explained in my second
> > message. Nevertheless, several steps are expensive, especially if I
> > suppose forbidden to add new pieces of information in the nodes
> > themselves : only the creation a new local tables of linked lists is
> > authorized. For instance :
>
> > - in the step 1, flagging a visited node to avoid recursion might be
> > expensive. At the moment, I create a linked list of nodes in
> > traversing the sub-tree. So when I visit a node then I start by
> > looking for it in the linked list. This simple research is expensive
> > (cost in O(n2) if n is the number of nodes in the subtree). If the
> > node is not found, then I add a new term in my linked list which
> > contains the node index and its storage count minus 1 (at the end,
> > this count will be the external count). If I find it then I just
> > decrement the external count in the found term.
>
> > - the step 2 is OK (cost in O(n))
>
> > - the step 3 may be expensive too (all depends on the number of nodes
> > with an external count greater that 0).
>
> > In my solution, the step 3 was replaced by traversing again the tree
> > from r1 and going down a node only if its external count is equal to
> > zero. During this step, the true storage count of nodes is decremented
> > and nodes with a count equal to zero are destroyed. But this solution
> > has again a drawback : looking for the current visited node in the
> > linked list in order to get its external count. A possible improvement
> > was to create a new field in each node used to store the external
> > count there rather than in a parallel linked list.
>
> > Another solution might be to create an array which size is the total
> > number of nodes. This array will replace the linked list. The
> > advantage is clear : looking for the external count in this array is
> > immediate, the node index corresponding to the array index. The
> > drawback is the size of the array, the total number of nodes being
> > potentially huge ( > 1 million) whereas the number of nodes in a
> > deleted branch is generally several orders of magnitude lower (but
> > often greater than 1000).
>
> Oups ! I think that the main drawback is the fact that this solution
> does not work correctly !
> So the step 3 of David is much better !
>
> After looking for GC on internet, I found the Python GC uses a
> technique similar to the David's solution.

Interesting. I had thought Python only used basic reference counting.

> > An intermediate solution consists in using an extensible hash
> > table ...
>
> > In any case, I would like a solution in O(n) else such algorithm is
> > not tractable.

The solution I suggested is O(n) where n = |X| if all of the following are O(1)

  • Retrieving the set of outgoing edges for a given node
  • Retrieving the destination node for a given edge
  • Reading/writing the visited flag for a given node
  • Reading/writing the storage count for a given node

I would avoid linked lists. Assuming hash lookup is O(1), I think my suggested solution can be O(n) despite constraints on what information may be stored in a Node. Received on Thu Jan 24 2008 - 01:43:32 CET

Original text of this message