Re: how to suppress carefully a recursive tree

From: fj <francois.jacq_at_irsn.fr>
Date: Wed, 23 Jan 2008 07:35:35 -0800 (PST)
Message-ID: <f63f27a6-444c-49b8-9778-c699a5295fb5_at_e10g2000prf.googlegroups.com>


On 23 jan, 14:32, David BL <davi..._at_iinet.net.au> wrote:
> On Jan 23, 4:10 pm, fj <francois.j..._at_irsn.fr> wrote:
>
>
>
> > On 23 jan, 01:12, David BL <davi..._at_iinet.net.au> wrote:
>
> > > On Jan 23, 1:10 am, fj <francois.j..._at_irsn.fr> wrote:
>
> > > > On 22 jan, 15:52, Jan Hidders <hidd..._at_gmail.com> wrote:
>
> > > > > On 22 jan, 12:04, fj <francois.j..._at_irsn.fr> wrote:
>
> > > > > > I know how to suppress a normal tree but I meet the following kind of
> > > > > > situation :
>
> > > > > I'm guessing that when you say "suppress" you mean "represent in a
> > > > > database". Correct?
>
> > > > No : I want to destroy, remove, kill ... a part of the data (a
> > > > complete tree or just a branch), but without destroying data shared by
> > > > other trees or branches.
>
> > > Would the mark and sweep algorithm suit you purposes?
>
> > No
>
> > the mark and sweep algorithm needs to know all the tree roots in order
> > to mark all the used objects.
>
> > In my example I indicates that the node b3 belongs to the root r2
> > just for information to explain the storage count. But the deletion
> > routine I want to write just receives "r1" as argument, nothing else.
> > It does not know that r2 exists too ... It is just able to detect the
> > existence of other objects via the storage count of b3 which is a
> > little bit too high for being just referenced by r1 !
>
> For a directed edge from node n1 towards node n2, let n2 be referred
> to as the "destination" node.
>
> n1 ---->---- n2
> source destination
>
> Maybe the following approach would work:
>
> It is assumed that node r1 is the root of the "tree" to be deleted.
>
> Step 1: Perform a trace (in the manner of a mark phase) treating r1
> as the one and only root and decrementing the storage counter on the
> destination node for each edge that is traversed. ie find the set X
> of nodes that are reachable from r1. Nodes that have already been
> visited are flagged to ensure each reachable node is visited exactly
> once. For each node during the recurse, retrieve the set of outgoing
> edges yielding a set of destination nodes. Decrement each destination
> node's storage counter (this must be done for all the destination
> nodes - ie including the nodes that have already been visited). Then
> recurse into the destination nodes that haven't been visited before.
>
> Step 2: Find the subset R of X corresponding to nodes whose storage
> counter hasn't fallen to zero.
>
> Step 3: Run a mark and sweep using roots R and sweeping over X. For
> each edge traversed in the mark phase, increment the storage counter
> on the destination node.
>
> In your example I think step 1 will find X = {r1,b1,b2,b3,b4} and only
> b3 will end up with a nonzero storage counter, so step 2 yields R =
> {b3}. Then in step 3, a trace from roots {b3} will bump the storage
> counter of b4 back up to 1 and only {r1,b1,b2} will be deleted.
>
> Am I making sense?

Yes.

 I was arrived to a very similar solution as I explained in my second message. Nevertheless, several steps are expensive, especially if I suppose forbidden to add new pieces of information in the nodes themselves : only the creation a new local tables of linked lists is authorized. For instance :

  • in the step 1, flagging a visited node to avoid recursion might be expensive. At the moment, I create a linked list of nodes in traversing the sub-tree. So when I visit a node then I start by looking for it in the linked list. This simple research is expensive (cost in O(n2) if n is the number of nodes in the subtree). If the node is not found, then I add a new term in my linked list which contains the node index and its storage count minus 1 (at the end, this count will be the external count). If I find it then I just decrement the external count in the found term.
  • the step 2 is OK (cost in O(n))
  • the step 3 may be expensive too (all depends on the number of nodes with an external count greater that 0).

In my solution, the step 3 was replaced by traversing again the tree from r1 and going down a node only if its external count is equal to zero. During this step, the true storage count of nodes is decremented and nodes with a count equal to zero are destroyed. But this solution has again a drawback : looking for the current visited node in the linked list in order to get its external count. A possible improvement was to create a new field in each node used to store the external count there rather than in a parallel linked list.

Another solution might be to create an array which size is the total number of nodes. This array will replace the linked list. The advantage is clear : looking for the external count in this array is immediate, the node index corresponding to the array index. The drawback is the size of the array, the total number of nodes being potentially huge ( > 1 million) whereas the number of nodes in a deleted branch is generally several orders of magnitude lower (but often greater than 1000).

An intermediate solution consists in using an extensible hash table ...

In any case, I would like a solution in O(n) else such algorithm is not tractable. Received on Wed Jan 23 2008 - 16:35:35 CET

Original text of this message