Re: Can this be done in SQL? Find the transitive relation

From: Vadim Tropashko <vadimtro_invalid_at_yahoo.com>
Date: Fri, 10 Aug 2007 15:18:32 -0700
Message-ID: <1186784312.853371.292850@e9g2000prf.googlegroups.com>

On Aug 4, 12:06 pm, "da. Ram" <dba2..._at_yahoo.com> wrote:
> Dear Group,
>
> I am strugling get the following done in SQL. Could anyone help me to
> figure out a way?
>
> Oracle version 8i .
>
> create table t_grp (id1 char(1), id2 char(2)) ;
>
> insert into t_grp values ('A','C'); --> A:C are related
> insert into t_grp values ('B','D'); --> B:D are related
> insert into t_grp values ('F','H');
> insert into t_grp values ('G','H');
> insert into t_grp values ('B','G');
> insert into t_grp values ('X','Y');
> insert into t_grp values ('W','Y');
>
> I want to group the values based on the following rules,
>
> 1, if any of the value from id1 or id2 can be linked to any other
> values,
> they all will be treated as one group
>
> Ex, in the above case
> A has relationship to C, neither of them has any other relationship to
> any
> other values in the table
> B has relationship to D, B has relationship to G, G has relationship
> to H, H
> has relationship to F
> F has relationship to H, H has relationship to G ...
>
> A:C - Group 1 (A,C) - new elements
> B:D -> B:G -> G:H -> H:F - Group 2 (B, D, G, H, F)
> F:H -> H:G -> G:B -> B:D - Group 2 (F, H, G, B, D) - same elements as
> above
> G:H -> G:B -> H:F -> B:D - Group 2 (G, H, B, F, D) - do -
> G:B -> G:H -> H:F -> B:D - Group 2 (G, B, H, F, D) - do -
> X:Y -> Y:W - Group 3 (X, Y, W) - new set of elements
>
> My objective is to get the final result in the below form.
>
> grp id1 id2
> ----- ------ ------
> 1 A C
> 2 B D
> 2 F H
> 2 G H
> 2 B G
> 3 X Y
> 3 W Y
>
> The following output is also fine [This would be the final output, but
> I can covert from the above to this one]
>
> grp id
> ---- -------
> 1 A
> 1 C
> 2 B
> 2 D
> 2 F
> 2 H
> 2 G
> ...
> 3 X
> 3 Y
> 3 W
> 3 Y
>
> Thank you.

http://www.bookpool.com/sm/0977671542, section on transitive closure

Transitive Closure

There are several key graph concepts that would guide your intuition when writing queries on graphs.
1. Reflexive closure of a graph is build by adding missing loops - edges with the same endpoints. Reflexive closure is expressed easily in SQL:

select head, tail from Edges -- original relation union
select head, head from Edges -- extra loops union
select tail, tail from Edges -- more loops

2. Symmetric closure of a (directed) graph is build by adding inversely oriented edge for each edge in the original graph. In SQL:

select head, tail from Edges -- original relation union
select tail, head from Edges -- inverse arrows

3. Transitive closure of a (directed) graph is generated by connecting edges into paths and creating new edge with the tail being the beginning of the path, and the head being the end. Unlike the previous two cases, transitive closure can't be expressed with bare SQL essentials - the select, project, and join relational algebra operators.

For now, let's assume that we know how to query transitive closure, and demonstrate how armed with these definitions we can approach some problems. Here is a puzzle from the comp.databases.oracle.misc forum: Suppose I have a table that contains Account# & RelatedAccount# (among
other things).
How could I use CONNECT BY & START WITH in a query to count relationships or families.

For example, in
ACCT REL_ACCT

Bob     Mary
Bob     Jane

Jane Bob
Larry Moe
Curly Larry
there are 2 relationship sets (Bob,Mary,Jane & Larry,Moe,Curly). If I
added
Curly Jane

then there'd be only 1 larger family. Can I use CONNECT BY & START with to detect such relationships and count them? In my case I'd be willing to go any number of levels deep in the recursion. Let's ignore oracle specific SQL keywords as inessential to the problem at hand. With proper graph terminology the question can be formulated in just one line:
"Find a number of connected components in a graph." Connected component of a graph is a set of nodes reachable from each other. A node is reachable from another node if there is an undirected path between them.

Figure 6.4: A graph with two connected components. Reachability is an equivalence relation: it's reflective, symmetric, and transitive. Given a graph, we formally obtain reachability relation by closing the Edges relation to become reflective, symmetric and, transitive (fig. 6.5).

Figure 6.5: Reachability as an equivalence relation: graph from fig. 6.4 symmetrically and transitively closed. Returning back to the problem of finding the number of connected components, let's assume that we already calculated the reachability relation EquivalentNodess somehow. Then, we just select a smallest node from each component. Informally,

Select node(s) such that there is no node with smaller label reachable from it. Count them.

Formally:
select count(distinct tail) from EquivalentNodes e where not exists (

select * from EquivalentNodes ee
where ee.head<e.tail and e.tail=ee.tail );

soap box ---------------------------- Equivalence Relation and Group By Why group by operator is so powerful? It is not among the fundamental relational algebra operators, and yet every other SQL query requires it. Partial answer is that group by embodies an equivalence relation. Indeed, it partitions rows into equivalence classes, and calculates aggregate values in each equivalence class. Unlike our last problem, however, group by leverages the existing equality operator on a domain.

Being able to query the number of connected components earned us an unexpected bonus: we can redefine a connected graph as a graph that has a single connected component. Next, a connected graph with N nodes and N-1 edges must be a tree. Thus, counting nodes and edges together with transitive closure is another opportunity to enforce tree constraint.

Now that we established some important graph closure properties, we can move on to transitive closure implementations. Unfortunately, our story has to branch here, since database vendors approached hierarchical query differently. Received on Fri Aug 10 2007 - 17:18:32 CDT