Re: Using Materialized Path to create a paths table

From: <vadimtro_at_gmail.com>
Date: 25 May 2006 09:57:34 -0700
Message-ID: <1148576254.539979.296410_at_38g2000cwa.googlegroups.com>


Matthew Brealey wrote:
> I wonder if anyone has any thoughts on using the materialized path as
> an intermediate step to create a kind of paths table to represent a
> tree hierarchy

Transitive closure relation can be incremantally maintained for any graph, not just a tree. Here is an exerpt of ch.6 (no word from publisher when the book would be out, sorry).


We have already seen the power of incremental evaluation idea in the database implementation world. Indexes and materialized views are the most familiar incremental evaluation structures. Dong et al proved that transitive closure can be efficiently maintained via incremental evaluation system. In this section we explore Dong's approach, although with some deviation that would simplify the matter.

Let's revisit the transitive closure expression in terms of adjacency matrix

T = A + A^2 + A^3 + ....

>From mathematical perspective this is quite a handsome series. It could
be made even prettier if we consider the identity matrix - the matrix with ones on the main diagonal and zeros elsewhere. A common notation for the identity matrix is symbol 1. Perhaps the most important equality involving the identity matrix is

1 = A^0

There we see that the identity matrix literally begs to be added in front of the series for transitive closure

T = 1 + A + A^2 + A^3 + ....

Speaking graph language, what we did was including the loops at each node. Formally, T is now both reflexive and transitive closure.

Let see if can get any immediate benefits. Let's multiply T by 1 - A

T (1-A) = (1 + A + A^2 + A^3 + ....) - ( A + A^2 + A^3 + ....)

where all the (nonzero) powers of A at the right side cancel out

T (1-A) = 1

Multiplying both sides by the inverse of 1 - A gives an explicit formula for transitive closure

T = (1-A)^(-1)

Therefore, we might be able to calculate transitive closure (of directed acyclic graphs, at least), if we know how to invert matrices in SQL! Unfortunately, inverting matrices in SQL is difficult. The matrix approach, however, still shows its practical merit in the scope of incremental evaluation system.

In the incremental evaluation method we store the original graph together with another graph -- its transitive closure. Every time the original graph is updated (that is new edge is inserted or deleted) the transitive closure graph has to be changed as well, so that the information in both structures remains consistent. The problem of transitive closure graph maintenance essentially is finding an efficient SQL expression for the insertions and deletions in the transitive closure graph.

.........

Figure 6.10: Transitive closure graph maintenance problem: Inserting the edge (2,3) into the graph at the top left, is expected to update the transitive closure structure at the top right accordingly.

Let's continue pursuing adjacency matrix approach. Inserting an edge (x,y) into the original graph produces a new graph with the adjacency matrix

A+S

that is the original matrix A incremented by S, the latter being a matrix with a single nonzero entry in the column x, row y. How would this increment of adjacency matrix affect transitive closure matrix? Well, let's calculate (footnote: be careful about matrix multiplication being noncommutative). The new transitive closure matrix is

1+A+S+(A+S)^2+(A+S)^3+...

Expanding powers into polynomials we get

1+A+S+(A^2+AS+SA+S^2)+(A^3+A^2 S + ... ) ...

Let's rearrange terms more suggestively

1+A+A^2+A^3 + (S+AS+SA+A^2 S+...) + (S^2+A S^2+ ... )

The first series is the familiar transitive closure matrix T of the original (not incremented) adjacency matrix A. The second series could be factored into

(1+A+A^2 +...) S (1+A+A^2 +...)

which reduces to T S T. The last series vanishes, if we limit the scope to directed acyclic graphs. Indeed, each term in that series is multiplied by S at least twice. In other words, each term in the series corresponds to a path in the graph that goes through the edge (x,y) twice, which implies a cycle.

Summarizing, the new transitive closure matrix reduces to

T + T S T

  • soap box --------------------- Incremental Maintenance of Graph Matrices When original adjacency matrix A is incremented by S, the transitive closure matrix T is incremented by T S T.

This is a very succinct result. Let's double check by comparing it with the query computing the transitive closure increment in Dong's paper:
SELECT *
FROM (
      SELECT VALUES (a, b)
    UNION

      SELECT Start = TC.Start, End = b
      FROM TC
      WHERE TC.End = a
    UNION
      SELECT Start = a, End = TC.End
      FROM TC
      WHERE b = TC.Start
    UNION
      SELECT Start = TC1.Start, End = TC2.End
      FROM TC AS TC1, TC AS TC2
      WHERE TC1.End = a AND TC2.Start = b
) AS T;

SELECT *
FROM TC­NEW AS T
WHERE NOT EXISTS (SELECT *

                  FROM TC
                  WHERE TC.Start=T.Start AND TC.End=T.End)
INTO TEMP DELTA; INSERT INTO TC
SELECT *
FROM DELTA; The first part of the query, which is a union of 4 blocks, is supposed to correspond to our transitive closure matrix increment T S T. It appears that they are very dissimilar. How this can be?

Remember, however, that we conveniently decided to operate reflexive transitive closure matrices. If go back and represent T as 1+T', where T' is not reflexive anymore, then the increment matrix has to be written as (1+T') S (1+T') which can be expanded into

.....

with 4 terms as in Dong's method.

Now that we are confident with matrix solution, let's finalize it in terms of SQL. We have already seen that operating matrices of integers that count paths in the (directed acyclic) graph offers an exceptional clarity. Therefore, let's represent transitive closure relation directly after transitive closure matrix that we have investigated early

table TRC (     -- transitive, reflexive closure
   i   integer, -- tail
   j   integer, -- head
   val integer  -- weight

)

Once again, it's the reflexive property and additional information in the val column that distinguishes our method formally from Dong's.

There is an ambiguity. What about zero matrix entries? We have an option either to store it as (i,j,0), or ignore such rows. The first option simplifies SQL that maintains table TRC. The second option provides natural means for compressing sparse matrices.

  • soap box --------------------- Sparse Matrices Sparse matrices save space. Instead of storing full matrix i j val - - --- 1 1 0 1 2 0 1 3 1 2 1 0 2 2 0 2 3 0 3 1 0 3 2 5 3 3 0 it is more economical to omit zero entries i j val - - --- 1 3 1 3 2 5 Matrix addition query requires a little more care for sparse matrices. Matrix multiplication, however, is an aggregate. It produces correct result set either way.

Now all the ground work for transitive closure maintenance in SQL is complete. Inserting an edge (x,y) into adjacency graph has to trigger a conforming change in the transitive closure table TRC. The values in the TRC.val column should be incremented accordingly by the entries of the product of three matrices T S T.

We have already discussed how to write a product of two arbitrary matrices in SQL. A product of three matrices -- A B C -- can be written as a composition of two binary product operations (A B) C. Alternatively, we sum matrix elements at once

which is easy to translate into SQL
select A.i AS i, C.j AS j, sum(A.val*B.val*C.val) AS val from A, B, C
where A.j=B.i and B.j=C.i
group by A.i, C.j
Please note how naturally matrix associativity property goes along with relational join associativity.

Actually, we are after a simpler matrix product -- T S T. There no challenge adapting general case to our needs select t1.i AS i, t2.j AS j, sum(t1.val*t2.val) AS val from TRC t1, TRC t2
where t1.j = :x and t2.i = :y
group by t1.i, t2.j

If we have chosen the option of storing zero matrix entries, then the above query fits naturally into an update statement to the TRC table update TRC
set val = (

   select val + sum(t1.val*t2.val)
   from TRC t1, TRC t2
   where t1.j = :x and t2.i = :y
   and t1.i = trc.i, t2.j = trc.j
   group by t1.i, t2.j
)

It is fascinating that the transitive closure table maintenance is solvable with a single update, but this answer is unrealistic for two reasons. First, the TRC table has to grow at some moment, and this issue is left out of scope. Second, from performance perspective updating (or trying to update) all the rows in the TRC table smells a disaster. We'd better have a good idea which rows require update, and formalize it within the where clause (which is entirely missing).

Therefore, let's materialize updates to the TRC table in a designated table TRCDelta
insert into TRCDelta
select t1.i AS i, t2.j AS j, sum(t1.val*t2.val) AS val from TRC t1, TRC t2
where t1.j = :x and t2.i = :y
group by t1.i, t2.j

As we want to keep this table small, we can't afford to store matrix entries with zero values any longer. On the other hand, by just storing a sparse matrix stored in the TRC table, we automatically have the TRCDelta calculated as a sparse matrix either.

It is the TRC table update step that has to be adjusted. There are two cases:
1. All the rows in TRCDelta that don't match TRC rows have to be inserted
insert into TRC
select * from TRCDelta
where (i,j) not in (select i,j from TRC) 2. The rows that match have to increment their values update TRC
set val = val + (select val from TRCDelta td

                 where td.i = trc.i and td.j = trc.j)
where (i,j) in (select i,j from TRCDelta)

This completes our program in the case when an edge is inserted. What about deletion? From matrix perspective the cases of inserting an edge and deleting it are symmetric. Following our earlier development we could use the same matrix S that corresponds to a single edge (x,y), except that we have to subtract it everywhere. The final result in matrix form is transitive closure matrix T decremented by T S T.

In the first naïve update solution for the TRC table all that we have to do is reversing the sign
update TRC
set val = (

   select val - sum(t1.val*t2.val)
   from TRC t1, TRC t2
   where t1.j = :x and t2.i = :y
   and t1.i = trc.i, t2.j = trc.j
   group by t1.i, t2.j
)

Please note, that this symmetry is made possible because of the extra information that we store in the TRC table. Since we know how many paths are going from node i to node j, we can just subtract all the paths that are affected by the deletion of an edge (x,y). In Dong's approach maintenance under deletion is more complicated.

Carrying over the solution to sparse matrices requires little insight. The TRCDelta table that stores the T S T matrix is calculated the same way as in the edge insertion scenario. Then, subtracting the T S T from T brings up the two possibilities:
1. An entry in the transitive closure matrix T is the same as corresponding entry in the T S T.
2. An entry in the transitive closure matrix T is bigger than corresponding entry in the T S T.
In the first case the entry in the difference matrix T - T S T is 0. Therefore, all these entries have to be deleted delete from TRC
where (i,j, val) in (select i,j, val from TRCDelta) All the other entries have to be adjusted update TRC
set val = val - (select val from TRCDelta td

                 where td.i = trc.i and td.j = trc.j)
where (i,j) in (select i,j from TRCDelta)

Now that we have several methods for transitive closure calculation/ maintenance, let's again return to applications. Perhaps the most significant problem that can be expressed in terms of transitive closure is aggregation on graphs. Received on Thu May 25 2006 - 18:57:34 CEST

Original text of this message