# Re: Mixing OO and DB

Date: Wed, 12 Mar 2008 06:10:36 -0700 (PDT)

Message-ID: <d4967e43-4235-4864-91c4-2209333b2602_at_u10g2000prn.googlegroups.com>

On Mar 12, 2:01 am, Bob Badour <bbad..._at_pei.sympatico.ca> wrote:

> >>That's fine. Consider, for example, an algorithm that finds the

*> >>minimum spanning distance of a graph. (e.g. cheapest network route, or
**> >>cheapest travel itinerary, etc). The node and edges of the graph are
**> >>stored in database tables.
**>
**> >>Shall we execute that algorithm by doing thousands of tiny queries as
**> >>we walk from node to node through the edges? Or shall we query all the
**> >>nodes and edges in one gulp, arrange them into a graph of objects, and
**> >>walk through them that way?
**>
**> It's interesting that the self-aggrandizing ignorant should mention
**> minimum spanning trees. Creating a generic procedure for calculating
**> minimum spanning trees in SqlServer is on my to-do list as I write.
**>
**> If one studies the algorithms for minimum spanning trees, one quickly
**> sees the task involves no traversals whatsoever. In fact, one generally
**> creates the MST as a precursor to some sort of traversal, and the
**> algorithms themselves are specified in terms of sets, which makes them
**> ideal for implementing relationally.
**>
**> At the moment, I lean toward Kruskal's algorithm. Mostly, I just don't
**> understand Chazelle's algorithm well enough, and I don't have the
**> patience to hunt down the remaining details. Plus, it is difficult to
**> tease out how Chazelle's approximations might interact with dynamic cost
**> functions.
**>
**> The algorithm relies on a disjoint-set structure, which is just a tuple
**> really. One would start by initializing a set of these structures with
**> an element for each vertex. Hmmmm... a set of tuples... hmmmm... I
**> wonder what sort of variable I could use to hold a set of tuples... I
**> think I will use a relvar.
**>
**> For a general solution we need to associate a cost with each edge.
**> Hmmmm... an edge and a cost... hmmmm... I think I will use a tuple for
**> each of those and a relvar for the entire set once again.
**>
**> For the result, we need a set of edges. I think I will use a relvar for
**> that too.
**>
**> Once we initialize these relvars, the algorithm is quite simple:
**>
**> While our edge-cost relation has at least one tuple:
**> Choose one of the edges with the minimum cost.
**> Perform a disjoint-set union on the two vertexes joined by the edge.
**> Insert the edge to the MST result relvar.
**> Delete all edges from the edge-cost relvar
**> where the disjoint-set find places them in the same set.
**> Done.
*

Presumably the "disjoint-set" relvar represents a partition of the set
of all vertices into disjoint sets and therefore corresponds to an
equivalence relation. I guess it can be represented using a relation
with two attributes - a partition identifier and a vertex
identifier. The partition identifier could in practise be any
representative vertex id from its equivalence class.

When I studied Kruskal's algorithm, there was a test for whether the edge with minimum cost connects distinct sets in the "disjoint-set" rather than explicitly deleting edges as per Bob. Presumably the test only requires a self join on the "disjoint-set" relvar to test whether two given vertices belong to the same equivalence class. This seems simpler and I would suggest is more likely to be faster in practise unless there are many redundant edges that need to be deleted or the performance is affected by client-server network latency etc.

Kruskal's algorithm is O(E log E) where E = number of edges in the graph. I am suspicious whether a compiler will be clever enough to avoid quadratic complexity with Bob's algorithm. Of course it will depend on the data structures and choice of implementation but it looks to me like the deletion code cannot easily be made super linear (ie better than linear) on the number of edges remaining in the edgecost relvar.

> The general solution on my to-do list is a little more complicated

*> because the cost function can be dynamic. There has to be another step
**> in the loop to re-evaluate the costs for any edges where the cost might
**> change.
*

Huh? Kruskal's algorithm is a greedy algorithm which depends on fixed edge costs while the algorithm proceeds in order to calculate an MST. Received on Wed Mar 12 2008 - 14:10:36 CET