Re: how to suppress carefully a recursive tree

From: Jan Hidders <hidders_at_gmail.com>
Date: Wed, 23 Jan 2008 04:40:00 -0800 (PST)
Message-ID: <59a4fe38-f0f4-4859-a29e-d49bbfdf1798_at_i12g2000prf.googlegroups.com>


On 22 jan, 17:10, fj <francois.j..._at_irsn.fr> wrote:
> On 22 jan, 15:52, Jan Hidders <hidd..._at_gmail.com> wrote:
>
> > On 22 jan, 12:04, fj <francois.j..._at_irsn.fr> wrote:
>
> > > I know how to suppress a normal tree but I meet the following kind of
> > > situation :
>
> > I'm guessing that when you say "suppress" you mean "represent in a
> > database". Correct?
>
> No : I want to destroy, remove, kill ...

Try "delete", that's a better translation of "supprimer". To suppress is more like "reprimer", "bannir" or "inhiber".

> a part of the data (a
> complete tree or just a branch), but without destroying data shared by
> other trees or branches.

Ok, understood.

> > > r1
> > >     -> b1
> > >         -> b2 -> ...
> > >         -> b3 -> ...
> > >     -> b2
> > >         -> b1 -> ...
> > >         -> b1 -> ...
> > >     -> b3
> > >         -> b4
>
> > > r2
> > >     -> b3 -> ...
>
> > So, a directed graph with multiple roots. Correct? So basically you
> > want to store arbitrary directed graphs.
>
> Yes and no. The two "roots" r1 r2  could perhaps belong to a bigger
> tree.
>
>
>
> > > A same node can be referenced at several places. Each node is
> > > associated to a storage count :
>
> > > r1(0)  b1(3)  b2(2)  b3(3)  b4(1)  r2(0)
>
> > Which would correspond to the number of incoming edges, yes?
>
> Yes
>
>
>
>
>
> > > In a normal tree without recursion (in the example above, recursion
> > > occurs because b1 contains b2 and vice versa), a node is destroyed
> > > when its count storage is equal to zero else its count storage is
> > > simply decremented.
>
> > > What algorithm should be applied ? I want for instance to cleanup r1
> > > but, of course, r2 must remain valid (=> b3 and b4 are not destroyed
> > > during the process and their storage count must be b3(1) b4(1)).
>
> > > Notice that the deletion of of a tree must be possible even if the
> > > count storage of the root is not equal to zero :
> > >    r1 -> b1 -> b2 -> r1 -> ...
>
> > It all depends a bit on how large your typical graphs are, how long on
> > average the simple paths, what type of operations and queries you want
> > to do on it and how often.  My first guess for the representation
> > would a simple straightforward adjacency list representation, (a
> > binary relation that contains all the edges) and if it's not too big
> > and your paths are often long it might be interesting to maintain an
> > extra table with the transitive closure of the graph.
>
> The graph may be quite large. Number of vertices (nodes) : usually
> 100000, sometimes much more (a very big computation may lead to about
> 100 millions). This corresponds to a 3D meshing, each mesh (a
> particular node) containing information about chemical composition (a
> sub-node), temperature, fluid characteristics (another sub-node) ...
>
> Let us precise that simple paths are always short when one excludes
> recursive points (a maximum of 10 nodes).
>
> > If you go for the adjacency list approach, make sure that you do as
> > much as possible in one SQL statement when you start following the
> > edges. So look up all nodes that are reachable in one step in one
> > statement, update those, and store the ones that have to be deleted in
> > a temporary table. Then again with one statement look up those that
> > can be reached from in one step from the nodes in the temporary table.
> > Et cetera.
>
> I don't use SQL but it does not matter :

It might. Outside the context of an SQL database my advice might actually be counterproductive.

> I can build up easily the
> list of nodes related to a particular starting point (the node "env"
> in my example). I am also able to compute, for each node, its
> "external count" (0 for all the nodes except b3 which has the value
> 1).
>
> After that I can start to destroy really : I only go down the nodes
> which have an external count equal to zero (this will protect b4 in
> the example). The problem is that the algorithm is not very efficient
> when the list of nodes is high, simply because I need, for each node
> having a storage count greater than 0, to look for that node in the
> list of all the nodes and to check the external count (CPU cost
> proportional to O(n2) if n is the number of concerned nodes).

You might consider defining an index over the set of edges that allows a quicker look-up on the basis of the begin node. Of course there is a price to pay for that because updates get more expensive.

  • Jan Hidders
Received on Wed Jan 23 2008 - 13:40:00 CET

Original text of this message