Re: how to suppress carefully a recursive tree

From: fj <francois.jacq_at_irsn.fr>
Date: Wed, 23 Jan 2008 08:24:56 -0800 (PST)
Message-ID: <a8856f36-4e96-4534-956a-5d910106324f_at_v29g2000hsf.googlegroups.com>


On 23 jan, 16:35, fj <francois.j..._at_irsn.fr> wrote:
> On 23 jan, 14:32, David BL <davi..._at_iinet.net.au> wrote:
>
>
>
> > On Jan 23, 4:10 pm, fj <francois.j..._at_irsn.fr> wrote:
>
> > > On 23 jan, 01:12, David BL <davi..._at_iinet.net.au> wrote:
>
> > > > On Jan 23, 1:10 am, fj <francois.j..._at_irsn.fr> wrote:
>
> > > > > On 22 jan, 15:52, Jan Hidders <hidd..._at_gmail.com> wrote:
>
> > > > > > On 22 jan, 12:04, fj <francois.j..._at_irsn.fr> wrote:
>
> > > > > > > I know how to suppress a normal tree but I meet the following kind of
> > > > > > > situation :
>
> > > > > > I'm guessing that when you say "suppress" you mean "represent in a
> > > > > > database". Correct?
>
> > > > > No : I want to destroy, remove, kill ... a part of the data (a
> > > > > complete tree or just a branch), but without destroying data shared by
> > > > > other trees or branches.
>
> > > > Would the mark and sweep algorithm suit you purposes?
>
> > > No
>
> > > the mark and sweep algorithm needs to know all the tree roots in order
> > > to mark all the used objects.
>
> > > In my example I indicates that the node b3 belongs to the root r2
> > > just for information to explain the storage count. But the deletion
> > > routine I want to write just receives "r1" as argument, nothing else.
> > > It does not know that r2 exists too ... It is just able to detect the
> > > existence of other objects via the storage count of b3 which is a
> > > little bit too high for being just referenced by r1 !
>
> > For a directed edge from node n1 towards node n2, let n2 be referred
> > to as the "destination" node.
>
> > n1 ---->---- n2
> > source destination
>
> > Maybe the following approach would work:
>
> > It is assumed that node r1 is the root of the "tree" to be deleted.
>
> > Step 1: Perform a trace (in the manner of a mark phase) treating r1
> > as the one and only root and decrementing the storage counter on the
> > destination node for each edge that is traversed. ie find the set X
> > of nodes that are reachable from r1. Nodes that have already been
> > visited are flagged to ensure each reachable node is visited exactly
> > once. For each node during the recurse, retrieve the set of outgoing
> > edges yielding a set of destination nodes. Decrement each destination
> > node's storage counter (this must be done for all the destination
> > nodes - ie including the nodes that have already been visited). Then
> > recurse into the destination nodes that haven't been visited before.
>
> > Step 2: Find the subset R of X corresponding to nodes whose storage
> > counter hasn't fallen to zero.
>
> > Step 3: Run a mark and sweep using roots R and sweeping over X. For
> > each edge traversed in the mark phase, increment the storage counter
> > on the destination node.
>
> > In your example I think step 1 will find X = {r1,b1,b2,b3,b4} and only
> > b3 will end up with a nonzero storage counter, so step 2 yields R =
> > {b3}. Then in step 3, a trace from roots {b3} will bump the storage
> > counter of b4 back up to 1 and only {r1,b1,b2} will be deleted.
>
> > Am I making sense?
>
> Yes.
>
> I was arrived to a very similar solution as I explained in my second
> message. Nevertheless, several steps are expensive, especially if I
> suppose forbidden to add new pieces of information in the nodes
> themselves : only the creation a new local tables of linked lists is
> authorized. For instance :
>
> - in the step 1, flagging a visited node to avoid recursion might be
> expensive. At the moment, I create a linked list of nodes in
> traversing the sub-tree. So when I visit a node then I start by
> looking for it in the linked list. This simple research is expensive
> (cost in O(n2) if n is the number of nodes in the subtree). If the
> node is not found, then I add a new term in my linked list which
> contains the node index and its storage count minus 1 (at the end,
> this count will be the external count). If I find it then I just
> decrement the external count in the found term.
>
> - the step 2 is OK (cost in O(n))
>
> - the step 3 may be expensive too (all depends on the number of nodes
> with an external count greater that 0).
>
> In my solution, the step 3 was replaced by traversing again the tree
> from r1 and going down a node only if its external count is equal to
> zero. During this step, the true storage count of nodes is decremented
> and nodes with a count equal to zero are destroyed. But this solution
> has again a drawback : looking for the current visited node in the
> linked list in order to get its external count. A possible improvement
> was to create a new field in each node used to store the external
> count there rather than in a parallel linked list.
>
> Another solution might be to create an array which size is the total
> number of nodes. This array will replace the linked list. The
> advantage is clear : looking for the external count in this array is
> immediate, the node index corresponding to the array index. The
> drawback is the size of the array, the total number of nodes being
> potentially huge ( > 1 million) whereas the number of nodes in a
> deleted branch is generally several orders of magnitude lower (but
> often greater than 1000).

Oups ! I think that the main drawback is the fact that this solution does not work correctly !
So the step 3 of David is much better !

After looking for GC on internet, I found the Python GC uses a technique similar to the David's solution

>
> An intermediate solution consists in using an extensible hash
> table ...
>
> In any case, I would like a solution in O(n) else such algorithm is
> not tractable.
Received on Wed Jan 23 2008 - 17:24:56 CET

Original text of this message