Re: A Topological Relational Algebra in Lisp

From: Derek Asirvadem <derek.asirvadem_at_gmail.com>
Date: Tue, 27 Jan 2015 17:50:45 -0800 (PST)
Message-ID: <f6881000-9735-4cc3-8852-566a4eac5314_at_googlegroups.com>


Norbert, Tegiri

Thanks, Tegiri, for the hint.

Ok, I went back and re-read the earlier posts.

Ok, so S is Sketch and D is Detail. Those "relations", that are /so/ different to you, (which caused my misunderstanding) are one and the same to me. Perhaps my mind is Normalised. Let me try and put it in your terms: there is a single relation variable over /all/ the "relations" you describe in your examples. When Normalising, we care only about base relations, not derived relations, in order to create, to make-concrete, the variable.

Since I realise that mathematicians cannot make the distinction between base relations and derived relations, I am not going to ask you to do that. Please, just realise, that that is an impediment in the progress of your project, from The Abstract to The Real. And to communications.

> On Sunday, 18 January 2015 10:07:12 UTC+11, Norbert_Paul wrote:
>
> ...
>
> However, when spatial entities that live in some R^n (R^2 - the plane, R^3 -
> the Euclidean space, R^4 - the space-time, ...) are modelled and
> (according to my assumption) are partitioned into a a finite set of spatial
> entities (say, faces, edges, volumes, hyper-volumes, ...) then the set of
> entities immediately fits into my model (for being finite). Interestingly the
> entities get their topology by the same operations that produce topological
> result spaces from input spaces (so-called !quotient space").

In case it is not clear, given your original request, for practitioners to respond, a relational database implementation, etc, the above paragraph is the place where we do "meet", where we start from.

It is disappointing that since databases are finite, they are uninteresting to you. Because that is meeting point is for me. I am happy to deal with your request, not being a mathematician, being a practitioner, and so on. Likewise, in view of your original request, you might have to entertain a non-mathematician.

I am not sure how you are going to implement "interesting" database schema, Relational ones, if you will not first implement "uninteresting" database schema, and ensure they are Relational. There is already a gap between what is in your mind, in abstract terms, and what is required to implement that ("there is a lot of work to be done"). And the exercise of closing that gap is proceeding at a snails pace. Because:

a. you have difficulty communicating with real-world people, in concrete terms
b. and conversely, you do not understand what they communicate to you
c. you will not accept a practitioner's knowledge of what is important, to an implementation, even though you are aware that you are not an implementer.


I understood the uncorrected versions of this, I think 95%, and accepted it (ie. it did not jar me with some intuitive error). But the corrected versions ...

> On Sunday, 18 January 2015 20:24:12 UTC+11, Norbert_Paul wrote:

> > A topology could be
> >
> > T = { {}, {bathroom}, {hallway}, X}.
> >
> This is wrong. The topology looks like
> T = { {}, {bathroom}, {hallway}, {bathroom, hallway}, X}.

Is the fourth element supposed to identify that there is a passage between {bathroom} and {hallway}, while the previous enumeration did not have that passage. And we know (from another row) that that passage is occupied by {bdoor} ?

In that case, I do not accept that {room_a, room_b} is a valid IDENTIFIER for passages. I suggest {room,passage}. Otherwise you encounter problems with the implementation of the key which is two elements from the previous level of the hierarchy {Room}, where all other keys in that table are (thus far) one element from the previous level of the hierarchy plus a designator, eg. {bdoor}.

> > This gives a topology
> >
> > { {}, {bdoor}, X}
> >
> > the so-called "dual topology".
> Also wrong. The dual topology looks like
> { {}, {bdoor}, {bdoor, bathroom}, {bdoor, hallway}, X }.

Ok, as long as, as a consequence, there are more rows in your previous enumeration R = bounded boundary.

Dual topology. If you mean the transposition, that is trivial. RT is a derived relation of R, not a base relation.

I have produced a second sketch of the data model in the distance. http://www.softwaregems.com.au/Documents/Student%20Resolutions/Norbert%20Paul/TDAS%20Note%20p4.pdf If you would be so kind as to ignore it again, and confirm the elements thus far, I can produce the next sketch, in my own plodding way. You need an Hierarchy for Articles (Architectural Spaces) in a physical location (Room).

In case you are unfamiliar with diagrams or standard-compliant data models, I have included a link to IDEF1X Notation.

If you have difficulty with diagrams, and you would prefer text, let me know, I will translate.


> On Monday, 19 January 2015 09:18:48 UTC+11, Tegiri Nenashi wrote:
>
> OK, this example illustrates what your topological objects are (modulo corrections in your followup post). The next question is how do you fit this object into database relation.

Please, let's not "fit it in", let's (a) understand it, (b) Normalise it, and then (c) implement it with some intelligence.

> OK, this is intuitive, but I would like to mention that you are still quite far away from equipping [database] relations with topology.

Agreed. That is precisely the gap that I was trying to close, coming from a completely different location to that of yours, Tegiri. Imagine that Norbert is lost in the Grand Canyon, and he started from a town on the West side. Tegiri is from a city in the West. I am starting from a town on the East side. We are both trying to find Norbert and lead him out.

So far, it appears I am a voice crying in the wilderness, and you (Norbert) have no idea that the Messias (ie. your stated goal, not me) is close at hand. That is alright, St John lost his head to a slut who couldn't bear to hear The Truth. I am in very good company.

I appreciate that you know your subject well, and you think your relations might be Relational, etc. Please note, you are quite loose with your identifiers. That hinders your project, when communicating with others, whose assistance you have requested.

> > The bathroom door example topology is derived from the metric space
> > that door and rooms live in.

Clarified yes, but the confusion about identifiers remains.

Instead of identify a passage as a connection between two rooms, ie. two keys, compounded, why don't you identify it as an item in metric space, ie. x, y, x co-ords from a nominal zero co-ord point, such as the north east corner of the floor.

Or identify the passage as belonging to (being part of) one room only.

> On Tuesday, 20 January 2015 06:58:42 UTC+11, Norbert_Paul wrote:
> > Tegiri Nenashi wrote:

> Just a the pair (X,T) is called "topological space", I call the pair (X,R)
> a topological data type, because I consider the relation R a binary relation
> R \subseteq X \times X
> on X where, by abuse of notion,

That will get you in the end. Here it interferes with the identification.

> R will only contain primary key attributes from
> X. Then R generates a topology T(R) for X this representing the tpological
> spcae
> (X,T(R)).
> I am, hence, thinking of registering such pairs in the database catalog as
> "spaces".

Yeah, sure, at the high level, with no rubber on the ground.

The issue is:

a.  Those Spaces are quite varied (room, door, passage, veneer)
b.  The way you have been identifying them is very loose (coords; names; composite names; etc)
c.  and you want to relate these Spaces to other Spaces of different types
d.  therefore you had better have an universal and meaningful identifier for "spaces"

> So I have both: a "multivariant" relation X accompanied by a binary relation
> R on X.

Tegiri
Can you please provide a one line definition of a multivariate relation. Google finds nothing in this space, only such that is related to multiple regression..

> In the ER-model X is usually called an "entity type" and R a
> "relationship type".
>
> I am only interested in topology and, hence a binary relation is sufficient.
> However I can imagine a binary relation with three attributes:
>
> create table PointSet
> planid integer not null,
> objectid integer not null,
> -- ... additional attributes ...
> primary key (planid, objectid)
> -- ... additional constraints ...

x ...
y ...
z ...


> );

What article is PointSet a point set of, ie. what are the various things that objectid will contain ? (I think you are going to answer, 50 or 60 "relations".)

> create table Topology(
> planid integer not null,
> bounded integer not null,
> boundary integer not null,
> primary key(planid, bounded, boundary),
> foreign key (planid, bounded) references point_set,
> foreign key (planid, boundary) references point_set);

I don't think it is a big deal at this point, but in my second sketch, thus far, my name for your PointSet is Article, and Topology is Space. I think Topology is a result set or derived relation, as such, there is no table for it (and if there were, it would be a massive duplication of data, which I will not participate in helping you to create).

I think the essence is, a bounded article is defined in terms of its boundaries.

If so, then:
a. how are unbounded articles defined ? Just the point set ? b. And are they included in some Topology ?

> Then the pair (PointSet, Topology) represents topological space

See what I mean by the naming. X and Y cannot represent Y. That is how they produced Mad Cow Disease.

> that is a
> so-called direct sum, a disjoint union of topological spaces each indexed by
> planid: http://en.wikipedia.org/wiki/Disjoint_union_(topology)
> The ineresting thing is, that the above construction would enforce the
> "dircet sum" property as a consistency rule.
>
> Also a ternary relation can be used to model a binary relation where each
> association has an integer attribute denoting the orientation of the boundary.
> But this is future work. I want also be able to model complexes.
> http://en.wikipedia.org/wiki/Chain_complex

Sure, and I am happy to support that, with these notes in mind

a. ternary relations (ie. for the identifier, as opposed to a third or fourth FK for some attribute) are a nightmare  (that might be a restatement of Tegiri's comment, if by "multi-variate", he means n-ary in the relational database world)
b. If you accept Codd, then n-ary relations can be reduced to binary relations.  So the solution is given but you must implement it
c. the difficulty is, again, your concept of how Spaces and PointSets are Identified.

Which brings up the next (after cleaning up the identifiers) important issue, that mathematicians (not you) are scared to death of. You need ____An Hierarchy____
with full navigation capability.

  • Room actually consists of door, window, wall
  • window consists of sill, frame, inner glaze, space, outer glaze
  • frame consists of ... etc
  • and each of those items is an Architectural Space. Which relates to other Spaces (bounded, boundary), etc.
  • and Topology is some combination of those Spaces, expressed as a /function/ of those Spaces.
  • in an hierarchy, obtaining the "direct sum" is a simple matter
  • your "consistency rule" is not an enforcement, it is a direct result of a few smaller consistency rules that are implemented in the structure. Ie. you are trying to enforce a rule on the result relation (instead of enforcing more appropriate rules on the base relations, which is the appropriate place), such that the derived relations (*all* of them) are consistent.
  • that is a classic error that mathematicians make

Which is why I am trying to get you to be clear about the relationships, so that the hierarchical ones (normal) are differentiated; what you require for boundary and bounded.

> >> The relational representaion of a topology is an old and well-established
> >> fact in mathematics. The innovation is (IMHO) to combine it with the
> >> relational model.
> >
> > Again, you are referring to the theory of binary relations
> > http://en.wikipedia.org/wiki/Relation_algebra which is different from
> > relational model used in database theory.
>
> Again, I am always talking of a /pair/ (X,R) of relations where X is arbitrary
> and R a binary relation on X (thereby, of course, omitting all non-key
> attribute values from X).
>
> If you register such space in a hypothetical catalogue of spaces, say by
> sort of
>
> create space OuterSpace(
> PointSet, -- the point set
> Topology, -- the relation
> (planid, bounded) -- the FK on the "bounded side".
> );

Ok, we are getting there.
- What is the key that you use to identify the space when you register it ? - Thus far, there is no PK (planid, bounded) anywhere for that space to be an FK reference /to/. Do you mean (planid, objectid) for a bounded row ?

See what I mean, the confusion of base relations vs derived relations causes manifold problems. That have to be resolved when an implementation is contemplated. Better to be clear about the distinction right from day one.

> I willl give an example query to illustrate the idea: When the pair
> (PointSet, Topology) is registered as a space you could query OuterSpace
> just like an ordinary relation:
>
> select *
> from OuterSpace
> where planid = 9;
>
> would give a pair (X9, R9) which would represent plan 9 from OuterSpace
> as a tpopological space (X9, T9) and which then would consist of the two
> relations
>
> X9 = select * from PointSet where planid = 9;
>
> R9 = select R.planid, R.bounded, R.boundary
> from cltopology R, X9 bded, X9 bdry
> where R.planid = bded.planid and R.bounded = bded.objectid
> and R.pplanid = bdry.pplanid and boundary = bry.objectid --.
>
> Here cltopology is the transitive closure on topology. R9 then generates the
> "subspace topology" of X9 in OuterSpace:
> http://en.wikipedia.org/wiki/Subspace_topology
> In the above example, the transitive closure is not necessary but in general,
> correct subspace topology computation requires transitive closure. Its
> efficient use requires a good query optimizer.

All commercial SQL platforms will do that quite easily.

We do not necessarily need the exact structure that you appear to be fixed upon. Any Normalised Relational structure, as long as it provides the hierarchy correctly, will do. You can whack a DISTINCT in the query, but it is better to have a constraint that prevents self-references.

> The entire Relational Algebra can be "topologized" this way. Every operator
> of Relational Algebra can be applied to such a space and will then return a
> correct result space. Some examples:
>
> group by: http://en.wikipedia.org/wiki/Quotient_space_(topology)
> cross join: http://en.wikipedia.org/wiki/Product_topology
> projection: http://en.wikipedia.org/wiki/Final_topology
> equijoin: http://en.wikipedia.org/wiki/Pullback_(category_theory)

Oh dear.

If that is all there is to it, then it is no big deal at all (I concede that it may be novel, and it may be a big deal, in the mathematicians circles.) We have been doing that in Relational Databases, for over thirty years that I am aware of. We have "geographised" spaces (think google maps plus customised objects); "geneologises" spaces (mum; dad; the kids; the grannies, for people; horses; dogs); manufacturing Bill of Materials spaces (assembly-components, construction; building materials). All with:

- full computation
- transitive closure
- instantaneous traversal
- unlimited levels
- etc, etc.  

The Unix inode file system is a straight hierarchy, with a true Relational implementation (one does not need an RDBMS to implement data, Relationally, on any platform). There are millions of Hierarchies. Unknown to mathematicians, of course.

I wrote my first hierarchical system in 1979, on a Network DBMS, and then years later, as a consultant for a vendor, advised the same team on how to convert it to Relational. I have written at least 40 hierarchical components in various databases over the years, and given full solution for others to implement in at least 50 more. I wrote one for Multiple Regression. Hierarchies are common in nature, and therefore almost every genuine Relational database will have some hierarchical capability implemented, if not the full hog, transitive closure, computation, etc.

There are Three Obstacles and One Consideration.

The first is, mathematicians are scared of The Hierarchy. And they are in a state of denial re the evidenced fact, that the Relational Model is a progression of, not a substitute of, the Hierarchical Model. Note the comments of J Hidders and J K Lowden in recent threads. Of the two Normal Forms defined in the RM (which btw remain undefined by mathematicians after 44 years), the first is, if we were to name it without psychological impediments, the ____ Hierarchical Normal Form____
- It destroys many of the problems that mathematicians, even today, are grappling with. - It deems many of the proposals of Date, Fagin, and Darwen *non-relational*, which is why they suppress the Hierarchical issue, in order that people won't identify their proposals as massive breaches of the RM.

The second obstacle is, due to the fear of the HM, and the suppression of Hierarchies, the recognition of an Hierarchy in the data (which is an ordinary, natural occurrence) is made difficult or impossible. As a result the method for both implementing an Hierarchy in a Relational Database (ie. in the /Data Space/, and for navigating it (the code, in the /Process Space/), are unknown.

This state of affairs results in many funny situations, all of them massively inefficient, with many more indices and rows and limitations (compared to the simple, direct implementation). They can't call it "hierarchy" because they deny it and fear it, so they have funny names for it: nested sets; adjacency lists. And they can only implement a fraction of it. We just call them Hierarchies. MS even has a HierarchyID datatype, which is the worst, because it fixes the hierarchy in concrete (any change to the hierarchy requires a large part of it to be re-written). In an unhindered implementation, one updates only the rows that one needs.

The need is so common, and the method is so pedestrian, that I no longer charge for the advice (concerning an implementation of an Hierarchy in a Relational Database), I give it away free of charge, and I have the tutorials with a full set of examples and code available online.

Third, the hierarchy must be implemented in the Data Space, not the Program Space. The traversal (navigation of levels in the hierarchy) is of course, implemented in the Program Space, not the Data Space. If you do not understand this /architectural principle/ (the same one that I have stated that the OO/UML types and mathematicians commonly break, when they try to control the data from the program space, and build monoliths to offer to the god of mathematics), _you will build a very poor system_, that needs a massive amount of maintenance.

Fourth, a consideration, not an obstacle. Traversal requires recursion. Which means, if you don't have a commercial SQL platform, that provides recursion /in the Program Space/, you are stuffed. Check Joe Celko's method of traversing an hierarchy, only a fraction of which is implemented, as an "adjacency list", if you want a really good laugh.

> It is a RoomsWithTopology object that is represented by a pair of relations
> just as described above. I am thinking of expanding the relational catalog to
> register topological spaces. Then a query on spaces always uses the topological
> counterparts of each relational operator involved. A mix of spatial and
> non-spatial operators combining "ordinary" database relations with "spaces" is
> also possible.

I do not see how the "space" relations are different from "ordinary" relations. They are all just relations (keeping in mind that we implement base relations as tables). If the relations contain "space" data, sure, they are space relations. If they contain "mummy and daddy" data, sure, they are geneological relations.

> ... these are the ones I already know. The list you gave me is handy. I could
> check for each property mentioned there if and how it affects the topology.
>
> Instead of restricting R, one could provide consistency rules for spaces to
> enforce topological properties of modelled spaces.

Exactly right. When you get close to realising your tables, there will be quite literally 40 or 50 constraints that you can implement on the database ... as long as you have: - correctly Normalised it first,
- and any hierarchies that do exist in the data, have been implemented directly (without the impediments detailed above). That is the mass of Integrity (all declarative) that a Relational Database is capable of, that a OO/UML program is not capable of. ANd that will "ensure" that any and all result sets (your "relations" are consistent; have transitive closure; allow full computation (including "direct sum"); etc. Attempts to enforce constraints on the result set is of course a gross error.

My last point regards relevance. Separate to the novelty of "implementing toplogical space <or whatever> relations, Relationally". Separate to overcoming the impediments and implementing an hierarchy as a Relational Hierarchy. At the outset, from your text descriptions, I had the idea that this was a substantial project; that there was a huge gap between the 40 or 50 mathematical relations, and a physical implementation. It is turning out, that it is:

- Four Relational tables. 
---- Pedestrian for a practitioner (as long as Normalisation is understood).  
- That are implemented as a Relational Hierarchy.  
---- Simple for me, difficult-to-impossible for those who are victims of the suppression of Hierarchies, as a result of Date's, Fagin's, Darwen's, Abiteboul-Hull-Vianu's "work", and as proliferated by virtually every mathematician, eg. as evidenced by J Hidders and J K Lowden.
- There are already hundreds of thousands of similar implementations in the real world.  Refer my first post in this thread.  Millions, if you count licences.

Therefore, it is a tiny project, if implemented by an experienced practitioner who is not crippled by the impediments. ---- And it is difficult, for anyone else, with the extent of difficulty directly related to the extent that they are affected by the impediments. ---- This is a case in point, of how mathematicians cripple mathematicians, all by themselves.

Therefore, what remains is, whether the relations are Relational.

- that is trivial to me, I can ensure that they are
---- Given the evidenced level of knowledge of the Relational Model amongst mathematicians, sure, that might (a) be a big deal, and (b) never complete (due to not knowing what Relational means in the physical universe)
- that might well be a big deal in the world of mathematical papers (refer my initial comments that,, for the life of me, I cannot understand why people even consider non-relational relations), and therefore, go ahead and ensure that your relations are Relational (considering my initial comments that it is not possible to determine that, until you have a model that is close to implement-able)

Therefore, I fail to see the relevance of the tiny project. ---- I concede, it may be relevant, novel, first time relations are Relational, "a lot of work to be done", etc ... for mathematicians.

Nevertheless, I am willing to execute the thankless task of assisting you in your progress to your stated goal of implementation. Ie. translate your mathematicians descriptions into implement-able form; Normalised to <all>NFs (ie. all current NFs and any NF that will be "defined" in the future); with an Hierarchy implemented directly and squarely (full "direct sum" and transitive closure); without redundancy; and produce a complete (ala complete documentation) data model for a Relational Database.

Cheers
Derek Received on Wed Jan 28 2015 - 02:50:45 CET

Original text of this message