Re: Mixing OO and DB
From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Wed, 14 May 2008 23:51:45 -0300
Message-ID: <482ba547$0$4046$9a566e8b_at_news.aliant.net>
>
>
> It's interesting that the self-aggrandizing ignorant should mention
> minimum spanning trees. Creating a generic procedure for calculating
> minimum spanning trees in SqlServer is on my to-do list as I write.
>
> If one studies the algorithms for minimum spanning trees, one quickly
> sees the task involves no traversals whatsoever. In fact, one generally
> creates the MST as a precursor to some sort of traversal, and the
> algorithms themselves are specified in terms of sets, which makes them
> ideal for implementing relationally.
>
> At the moment, I lean toward Kruskal's algorithm. Mostly, I just don't
> understand Chazelle's algorithm well enough, and I don't have the
> patience to hunt down the remaining details. Plus, it is difficult to
> tease out how Chazelle's approximations might interact with dynamic cost
> functions.
>
> The algorithm relies on a disjoint-set structure, which is just a tuple
> really. One would start by initializing a set of these structures with
> an element for each vertex. Hmmmm... a set of tuples... hmmmm... I
> wonder what sort of variable I could use to hold a set of tuples... I
> think I will use a relvar.
>
> For a general solution we need to associate a cost with each edge.
> Hmmmm... an edge and a cost... hmmmm... I think I will use a tuple for
> each of those and a relvar for the entire set once again.
>
> For the result, we need a set of edges. I think I will use a relvar for
> that too.
>
> Once we initialize these relvars, the algorithm is quite simple:
>
> While our edge-cost relation has at least one tuple:
> Choose one of the edges with the minimum cost.
> Perform a disjoint-set union on the two vertexes joined by the edge.
> Insert the edge to the MST result relvar.
> Delete all edges from the edge-cost relvar
> where the disjoint-set find places them in the same set.
> Done.
>
> The general solution on my to-do list is a little more complicated
> because the cost function can be dynamic. There has to be another step
> in the loop to re-evaluate the costs for any edges where the cost might
> change.
>
> [topmind's somewhat less informed response snipped]
Date: Wed, 14 May 2008 23:51:45 -0300
Message-ID: <482ba547$0$4046$9a566e8b_at_news.aliant.net>
Bob Badour wrote:
> topmind wrote:
>
>> Robert Martin wrote: >> >>> On 2008-03-09 01:02:47 -0600, Marshall <marshall.spight_at_gmail.com> said: >>> >>> >>>> On Mar 8, 6:07 pm, Robert Martin <uncle..._at_objectmentor.com> wrote: >>>> >>>>> On 2008-03-06 15:37:56 -0600, topmind <topm..._at_technologist.com> said: >>>>> >>>>> >>>>>>> Each small group of classes becomes a little roll-your-own data >>>>>>> access >>>>>>> and manipulation scheme that is perfectly tuned for it's very >>>>>>> specific >>>>>>> purpose. >>>>> >>>>> >>>>>> Which is over-kill for the task-level. >>>>> >>>>> >>>>> Do you have proof that it's overkill? Do you have any objective >>>>> measurements that it's overkill? Or it is just your own opinion. I >>>>> mean, if it works for you that's great, but don't force your own >>>>> opinions on everyone else <grin> >>>> >>>> >>>> This is a fallacious argument. You're proposing extra effort without >>>> justification. The idea that, in the absence of evidence either way, >>>> topmind's proposal of not putting in that effort is on equal footing >>>> with yours doesn't hold. Extra effort requires justification. What >>>> you are saying is, "hey, we don't know if this work has any value >>>> or not, so doing it is just as justified as not doing it." >>> >>> >>> Go back to the root of the argument. You'll see that the initial >>> premise is that the programmer organizes the data into a form that is >>> more convenient for him to get his computational job done. So there >>> *is* justification. >> >> >> Depending on how "convenient" is measured. >> >> Note that the effort to wrap SQL in methods is only one of the issues >> against it. >> >> >>>>> It is very common for programmers to manipulate data into forms that >>>>> are particularly convenient for the application they are writing. >>>>> Databases are seldom in that form since (for one thing) they must >>>>> usually serve many different and competing applications. >>>> >>>> >>>> (I'm going to just label the above as bogus without justification. >>>> It's late and I'm lazy.) >>> >>> >>> That's fine. Consider, for example, an algorithm that finds the >>> minimum spanning distance of a graph. (e.g. cheapest network route, or >>> cheapest travel itinerary, etc). The node and edges of the graph are >>> stored in database tables. >>> >>> Shall we execute that algorithm by doing thousands of tiny queries as >>> we walk from node to node through the edges? Or shall we query all the >>> nodes and edges in one gulp, arrange them into a graph of objects, and >>> walk through them that way?
>
>
> It's interesting that the self-aggrandizing ignorant should mention
> minimum spanning trees. Creating a generic procedure for calculating
> minimum spanning trees in SqlServer is on my to-do list as I write.
>
> If one studies the algorithms for minimum spanning trees, one quickly
> sees the task involves no traversals whatsoever. In fact, one generally
> creates the MST as a precursor to some sort of traversal, and the
> algorithms themselves are specified in terms of sets, which makes them
> ideal for implementing relationally.
>
> At the moment, I lean toward Kruskal's algorithm. Mostly, I just don't
> understand Chazelle's algorithm well enough, and I don't have the
> patience to hunt down the remaining details. Plus, it is difficult to
> tease out how Chazelle's approximations might interact with dynamic cost
> functions.
>
> The algorithm relies on a disjoint-set structure, which is just a tuple
> really. One would start by initializing a set of these structures with
> an element for each vertex. Hmmmm... a set of tuples... hmmmm... I
> wonder what sort of variable I could use to hold a set of tuples... I
> think I will use a relvar.
>
> For a general solution we need to associate a cost with each edge.
> Hmmmm... an edge and a cost... hmmmm... I think I will use a tuple for
> each of those and a relvar for the entire set once again.
>
> For the result, we need a set of edges. I think I will use a relvar for
> that too.
>
> Once we initialize these relvars, the algorithm is quite simple:
>
> While our edge-cost relation has at least one tuple:
> Choose one of the edges with the minimum cost.
> Perform a disjoint-set union on the two vertexes joined by the edge.
> Insert the edge to the MST result relvar.
> Delete all edges from the edge-cost relvar
> where the disjoint-set find places them in the same set.
> Done.
>
> The general solution on my to-do list is a little more complicated
> because the cost function can be dynamic. There has to be another step
> in the loop to re-evaluate the costs for any edges where the cost might
> change.
>
> [topmind's somewhat less informed response snipped]
The Delaunay Triangulation code was more complex (proper domain support, which SqlServer lacks, would have helped), but the Voronoi Tessellation code could be as easy as a view. Lloyd's algorithm for k-means clustering was rather straightforward in t-sql once I had the delaunay and voronoi code in place.
My to-do list is now void of computational geometry stuff for the time being. Sigh. Received on Thu May 15 2008 - 04:51:45 CEST