Re: A Topological Relational Algebra in Lisp

From: Norbert_Paul <norbertpauls_spambin_at_yahoo.com>
Date: Thu, 22 Jan 2015 21:20:09 +0100
Message-ID: <m9rm0r$144$1_at_dont-email.me>


Tegiri Nenashi wrote:
> OK, now it is clear: the objects you are focusing on are pairs of
> multivariate relation X in the first component, and binary relation R in the
> second.
>
> Now you can define operations in Norbert algebra "component-wise". For
> example you can define "multiplication" as natural join in the first
> component and composition in the second. This would be not entirely without
> merit, because multiplication would be commutative and associative operation.
> However, natural join in [multivariate] relational algebra is idempotent,
> while composition in algebra of binary relations is not. Therefore, I'm not
> certain how well defined your algebra is. Can you please describe each and
> every of the following operations: natural join (as it might be something
> other than my guess), projection, and union?
OK, but first I will go into some mathematical principles:

Given two topological spaces (X,Tx) and (Y,Ty). A function f:X->Y from a set X to a set Y is called /continuous/, from (X,Tx) to (Y,Ty) if and only if ("iff") the original image

    f^(-1)(B) = { a in X | f(a) in B }

of every element B in Ty is an element in Tx. Note that the bigger ("finer") Tx (or the smaller ("coarser") Ty) the more likely is it that f is continuous.

Given two relational representations (X,R) and (Y,S) of topologiacal spaces (X,T(R)) and (Y,T(S)) then a function

   f : X -> Y

is continuous iff for every pair (a,b) in R the image pair (f(a),f(b)) is an element of S^*, the transitive and reflexive closure of S, hence, iff

   (a,b) in R => (f(a),f(b)) in S^*

holds.
For example, f may be a foreign key such that each tuple tx in X references a tuple f(tx) in Y.
Note that here the smaller R ("finer") or the bigger S ("coarser") the more likely is it that f is continuous.

Now when you have a query operation op(Y) on a database table Y may get a result table

   X = op(Y) with a function f : X -> Y

that maps each the query result tuple tx in X to its input tuple f(tx) in Y or you get a result table

   Z = op(Y) with a function g : Y -> Z

that maps each input tuple to a result tuple. In the first case I say, the query operator is of /initial/ type, because the result is at the starting side of the function arrow. The second case is, hence, of /final/ type.

When you have queries that involve more than one tables, say Y1, Y2, Y3, you may get

   X = op(Y1,Y2,Y3) with functions f1:X -> Y1, f2:X -> Y2, f3:X->Y3

or you get

   Z = op(Y1,Y2,Y3) with functions g1:Y1 -> Z, g2:Y2 -> Z, g3:Y3 -> Z

Now if each such query input table has an attached binary relation R1, R2, R3, the relation of the topology for the query result table should contain as much topological information from the input as possible but still all functions, f1,f2,f3 or g1,g2,g3 -- depending if your query operator is initial or final (or both) -- should stay continuous.

Trivially, the f1,f2,f3 are always continuous with the empty relation {} subset X x X and the g1,g2,g3 are always continuous with the universal relation X x X. So there always exists a relation that makes all functions involved continuous.

So we are either looking for a maximal relation Rx for X that makes all fs continuous or we are looking for a minimal relation Rz that makes all gs continuous.

This can be done component-wise, for each function: Let R be the relation associated with Y. Hence op queries the spaces (Y1,R1), (Y2,R2), (Y3,R3) to get (X,Rx) or else (X,Rz).

The final case gives:

   R1 = { (f1(a), f1(b)) | a R b }
   R2 = { (f2(a), f2(b)) | a R b }
   R3 = { (f3(a), f3(b)) | a R b }

Then a result relation is Rz = R1 u R u R3 (where "u" is set union). So this case is easy and can be efficiently computed.

The initial case gives:

   R1 = { (a,b) in X x X | g1(a) R+ g1(b)) }
   R2 = { (a,b) in X x X | g2(a) R+ g2(b)) }
   R3 = { (a,b) in X x X | g3(a) R+ g3(b)) }
Then a result relation is Rz = R1 /\ R /\ R3 (where /\ is set intersection). Here R+ is the transitive closure on R. So this case is more expensive and one should take more care in efficiency. Whereas transitive closure always gives a correct relation one may reduce the result relation to get Rx' = OPEN(Rz) by removing tupels thet do not affect the topology. For example

   Ri = { (a,b) in X x X | a != b, gi(a) R+ gi(b)) } is smaller but topologically equivalent to

   Ri = { (a,b) in X x X | gi(a) R+ gi(b)) }.

Every standard query operator from Relational Algebra falls into one of these two categories. The renaming query which just renames attributes and leaves the data unaffected even falls in both categories.

natural join:

assume two relations R(a,b,c) and S(c,d,e). I will not specify the key to simplify the presentation and, hence, take (a,b,c) and (c,d,e) , which are superkeys, as keys.

Then RS(a,b,c,d,e) is the schema of the natural join Each tuple (a,b,c,d,e) in RS has a projection

   pR(a,b,c,d,e) := (a,b,c) in R

and

   pS(a,b,c,d,e) := (c,d,e) in S

So we have two functions

  pR : RS -> R and pS : RS -> S

that map each result tuple back to its "origin" within the input relations. As the result RS is at the domain side of each of these functions we have an /initial/ query. So when the relations R ans S have binary relations T(ia,ib,ic, da,db,dc) and Q(ic,id,ie, dc,dd,de) associated such that (R,T) and (S.Q) represent topological spaces, then a straight-forward result relation TQ for the topology is

    TQ(ia,ib,ic,id,ie,da,db,dc,dd,de)

  • { (ia,ib,ic,id,ie,da,db,dc,dd,de) | (ia,ib,ic) T^+ (da,db,dc) andalso (ic,id,ie) Q^+ (dc,dd,de) }.

Then (RS,TQ) represents the fiber product space of the topological spaces represented by (R,T) and (S,Q). Of course the relatton TYQ may be "trimmed" by projecting only onto the key attributes of TQ and by removing "spurious" tuples (ix,dz) in TQ if the tuples (ix,y) and (y,dz) exist in TQ. (Codd called this operator "OPEN").

projection:

Projection is simple, as it is a final operator.

Take a relation X(a,b,c) and project on a,c, say p[a,c](x,y,z) = (x,z). Again, I take all attributes as (super)-keys. Then Y = p[a,c](X) is the query result table. So p[x,y] is the function that takes a query input tuple (x,y,z) to the query result (x,z).
When R represents a topolgoy for X, it defines which pairs (t1,t2) in X x X are "topologically" close. The query result query merely the image of R under p[a,c]:

If R(ia,ib,ic,da,db,dc) encodes the binary relation (ia,ib,uc) R (da,db,dc) then the prokction pi[ia,ic,da,dc](R) gives the query result topolgoy for the space (X,R).

union:

union is also a final operator.

Let (X,R) and (Y,S) be representations of topological spaces and XuY be the union of X and Y. Then we have the two inclusion functions

   ix : X -> XuY , iy : Y -> XuY ,

which are simply defined as ix(s) = s abd iy(t) = t. The result topology is represented by RuS. Hence

    (X,R) u (Y,S) = (XuR, RuS) .

If R and S are not union compatible they can easily made so.

Note that intersection is a special case of the natural join:

    (X,R) /\ (Y,S) = (X/\R, R^+ /\ S^+) .

Here, in general, transitive closure is ncessary.

I hope this gives an idea if the operations and their topological

ackground.

> This gives me a flavor how you define selection. However, selection can be
> considered as a special case of natural join. Here I see that my earlier
> guess about natural join defined as composition in the second component is
> simply wrong.

Selection is an initial operator:

Les s[p] be the selection operator with predicate p.

   s[p](Y) on a relation Y gives a subset X = s[p](Y)

If R is a relation for Y and (Y,R) represents a topological space, then the result topology T(R)|_{s[p](Y) (subspace topology) is represented by

   sR = { (a,b) in R+ /\ X^2 } .

This is the restriction of the transitive closure on R to the selected tuples. The selection operator s[p] (with predicate p) applied to a space gives

  s[p](Y,R) = (s[p](Y), sR) .

Here again, in general, transitive closure must be possibe to always give corerct query results. Otherwise the spaces involved must have a constant upper dimension limit (say, 42, which means that queries on 43-dimensional spaces may give wrong results).

> "group by" generalizes projection. Does Quotient Space generalize Final
> Topology?

I'd say it the other way: Final Topology generalizes all final operators: projection, quotient, union.

> Since you offered definition of topological relation as ordered pair of
> multivarite relation together with binary relation, you can specify
> operations in your algebra without referring to topology. Again, I'm
> interested to see formal definitions of natural join, projection, and union.

I hope you have seen them above ;).

> Again, can you please take the (Rooms,RoomNeighbourhood) pair from your
> example, then introduce one more pair and, next, work out the natural join of
> these two pairs (is it something meaningful)?

The natural join of two topological spaces to me is extremely interseting.

Take the example space S (the "sketch"):

   ----------|--------- < this is an "ascii-art" sketch    room1 door outside

As tables

    S = id


        room1
        door
        outside

   RS = iid      bdid
        --------------
        room1    door
        outside  door

such that (S,RS) represents a topological space. Attributes prefixed by "bd" denote elements that are in the topological closure of those having the attribute prefixed with "i". Hence each pair (iid, bdid) in RS specifies that

    bdid in cl({iid})
holds.

You may specify yet another space D, I call "details", such as in ascii-art:

   interior--> |1 <--innerDoorMaterial--> 2| <--interior

   interior--> |x <--entranceDoorMaterial--> || <--exterior

where the vertical bar denotes a database object that represents the material that may be used as surface towards an interior room. The innerDoorMaterial is bounded by two different surfaces which are also in the boundary of interior.
The arrows --> denote the relation

   a-->b :<=> b in cl({a}) .
Two vertical bars || denote a surface that must be used on teh exterior side of entrance doors which is exposed to weathering.

These are the tables:

D = id             comment
     ----------------------------------------------------
     interior       an interior room
     exterior       the exterior
     surface1|      Surface towards interior
     surface2|      Opposite surface towards interior
     surface||      Surface towards exterior
     surfacex|      Exterior door's surface towards interior
     idoormat       material of interior doors
     xdoormat       material for entry doors

RD =  ii       bdid
       ------------------
      interior  surface1|
      idoormat  surface2|
      interior  surfacex|
      exterior  surface||
      xdoormat  surface||

Now a third table may define aggregations I (the "index") among details D:

I = id      comment
     --------------------------
     int     an interior space
     idoor   an interior door
     xdoor   an entry door
     ext     the exterior

with relations

RI = iid bdid


      int  idoor
      int  xdoor
      ext  xdoor

Now the example space S may reference these detail aggregates (via a foreign key from S to I):

   object detail


   room1   -> int
   door    -> xdoor
   outside -> ext .

That mapping is continuous.
The following mapping however

   object detail


   room1   -> int
   door    -> idoor
   outside -> ext

is not continuous, because the relation RD that represents the topology of D contains the pair (outside, door). The foreign key then maps this pair to (ext, idoor) which is not in the transitive and reflexive closure of RA. Hence "continuity of foreign keys" could be a helpful instrument for CAD systems. Id does not allow to put an interor at the common boundary between interior and exterior.

Now when the details as a space (D,RD) map to their aggregates the following way (again, say, via a foreign key to I):

   id detail


   interior       int
   exterior       ext
   surface|       idoor
   surface||      xdoor
   idoormat       xdoor
   xdoormat       xdoor

then the natural join (S,RS) >< (D,RD) = (SD, RSD) gives:

    ------------->|<====================>||<-------------------- (ascii-"art")
    (room1)     (door)      (door)     (door)      (outside)
     (int)      (xdoor)     (xdoor)    (xdoor)       (ext)
   (interior) (xdsurf|) (xdoormat) (surface||) (exterior)

Whis is denoted with vertically stacked tuples. In tables this looks loke:

   SD  S.id     I.id   D.id
       -----------------------------
       room1    int    interior
       door     xdoor  xdsurf|
       door     xdoor  xdoormat
       door     xdoor  surface||
       outside  ext    exterior

The topology is then generated by

   RSD S.iid     I.iid   D.iid      S.bdid   I.bdid D.bdid
       -----------------------------------------------------
       room1    int    interior     door     xdoor  xdsurf|
       door     xdoor  xdoormat     door     xdoor  xdsurf|
       door     xdoor  xdoormat     door     xdoor  surface||
       outside  ext    exterior     door     xdoor  surface||

For example the tuple (door, xdoor, xdsurf|) is in the topological closure of the set {(room1, int, interior}).

The applicaton of the details (D,RD) to the sketch (S,RS) gives a detailled version of (S,RD). So natural join could be an interesting operator to organize levels of detail (LoD) in CAD system where the same detail is repeatedly used. This may be extended to a relational generalization of CAD-macro (or BLOCK) objects.

For the topologistst: the above natural join (SD,RSD) = (S,RS)><(D,RD) its the following pullback (S,RS) x_I (D,RD)

(SD,RSD) --------p_D-------> (D, RD)

     |                           |
     |                           |
     |                           |
    p_S                         FK_D
     |                           |
     |                           |
     V                           V

  (S, RS) -------FK_S-------> (I, RI)

with projections p_D and p_S and continuous foreign keys FK_D and FK_S

This "details library" concept still has some open questions because in some situations details can get lumped together too much. This problem has yet to be solved, but I consider it only a not too difficult database design problem for an extremely interesting practical application.

Hope that clarifies.

Norbert Received on Thu Jan 22 2015 - 21:20:09 CET

Original text of this message