# Re: how to suppress carefully a recursive tree

Date: Wed, 23 Jan 2008 06:34:19 -0800 (PST)

Message-ID: <8111bab7-76cc-40dd-8bce-257dbc02fae7_at_c4g2000hsg.googlegroups.com>

On 23 jan, 13:40, Jan Hidders <hidd..._at_gmail.com> wrote:

> On 22 jan, 17:10, fj <francois.j..._at_irsn.fr> wrote:

*>
**> > On 22 jan, 15:52, Jan Hidders <hidd..._at_gmail.com> wrote:
**>
**> > > On 22 jan, 12:04, fj <francois.j..._at_irsn.fr> wrote:
**>
**> > > > I know how to suppress a normal tree but I meet the following kind of
**> > > > situation :
**>
**> > > I'm guessing that when you say "suppress" you mean "represent in a
**> > > database". Correct?
**>
**> > No : I want to destroy, remove, kill ...
**>
**> Try "delete", that's a better translation of "supprimer". To suppress
**> is more like "reprimer", "bannir" or "inhiber".
*

Thank you for the English lesson :-)

*>
*

> > a part of the data (a

*> > complete tree or just a branch), but without destroying data shared by
**> > other trees or branches.
**>
**> Ok, understood.
**>
**>
**>
**> > > > r1
**> > > > -> b1
**> > > > -> b2 -> ...
**> > > > -> b3 -> ...
**> > > > -> b2
**> > > > -> b1 -> ...
**> > > > -> b1 -> ...
**> > > > -> b3
**> > > > -> b4
**>
**> > > > r2
**> > > > -> b3 -> ...
**>
**> > > So, a directed graph with multiple roots. Correct? So basically you
**> > > want to store arbitrary directed graphs.
**>
**> > Yes and no. The two "roots" r1 r2 could perhaps belong to a bigger
**> > tree.
**>
**> > > > A same node can be referenced at several places. Each node is
**> > > > associated to a storage count :
**>
**> > > > r1(0) b1(3) b2(2) b3(3) b4(1) r2(0)
**>
**> > > Which would correspond to the number of incoming edges, yes?
**>
**> > Yes
**>
**> > > > In a normal tree without recursion (in the example above, recursion
**> > > > occurs because b1 contains b2 and vice versa), a node is destroyed
**> > > > when its count storage is equal to zero else its count storage is
**> > > > simply decremented.
**>
**> > > > What algorithm should be applied ? I want for instance to cleanup r1
**> > > > but, of course, r2 must remain valid (=> b3 and b4 are not destroyed
**> > > > during the process and their storage count must be b3(1) b4(1)).
**>
**> > > > Notice that the deletion of of a tree must be possible even if the
**> > > > count storage of the root is not equal to zero :
**> > > > r1 -> b1 -> b2 -> r1 -> ...
**>
**> > > It all depends a bit on how large your typical graphs are, how long on
**> > > average the simple paths, what type of operations and queries you want
**> > > to do on it and how often. My first guess for the representation
**> > > would a simple straightforward adjacency list representation, (a
**> > > binary relation that contains all the edges) and if it's not too big
**> > > and your paths are often long it might be interesting to maintain an
**> > > extra table with the transitive closure of the graph.
**>
**> > The graph may be quite large. Number of vertices (nodes) : usually
**> > 100000, sometimes much more (a very big computation may lead to about
**> > 100 millions). This corresponds to a 3D meshing, each mesh (a
**> > particular node) containing information about chemical composition (a
**> > sub-node), temperature, fluid characteristics (another sub-node) ...
**>
**> > Let us precise that simple paths are always short when one excludes
**> > recursive points (a maximum of 10 nodes).
**>
**> > > If you go for the adjacency list approach, make sure that you do as
**> > > much as possible in one SQL statement when you start following the
**> > > edges. So look up all nodes that are reachable in one step in one
**> > > statement, update those, and store the ones that have to be deleted in
**> > > a temporary table. Then again with one statement look up those that
**> > > can be reached from in one step from the nodes in the temporary table.
**> > > Et cetera.
**>
**> > I don't use SQL but it does not matter :
**>
**> It might. Outside the context of an SQL database my advice might
**> actually be counterproductive.
**>
**> > I can build up easily the
**> > list of nodes related to a particular starting point (the node "env"
**> > in my example). I am also able to compute, for each node, its
**> > "external count" (0 for all the nodes except b3 which has the value
**> > 1).
**>
**> > After that I can start to destroy really : I only go down the nodes
**> > which have an external count equal to zero (this will protect b4 in
**> > the example). The problem is that the algorithm is not very efficient
**> > when the list of nodes is high, simply because I need, for each node
**> > having a storage count greater than 0, to look for that node in the
**> > list of all the nodes and to check the external count (CPU cost
**> > proportional to O(n2) if n is the number of concerned nodes).
**>
**> You might consider defining an index over the set of edges that allows
**> a quicker look-up on the basis of the begin node. Of course there is a
**> price to pay for that because updates get more expensive.
*

I don't see very well the goal of that index. Could you give me additional information please ?

*>
**> -- Jan Hidders
*

Received on Wed Jan 23 2008 - 15:34:19 CET