Re: A Topological Relational Algebra in Lisp

From: Derek Asirvadem <derek.asirvadem_at_gmail.com>
Date: Tue, 20 Jan 2015 01:36:45 -0800 (PST)
Message-ID: <2b443127-499b-408f-8dfd-aa09a97c4664_at_googlegroups.com>


Norbert

Please forgive the length; my comments are terse (feel free to ask questions about anything that is not completely clear). Hopefully you will find them to be a value. Take your time responding.

> On Saturday, January 17, 2015 at 11:38:53 PM UTC+11, Norbert_Paul wrote:


  • A & B --

Relevance & Value


It is absolutely valuable. Of course, for GIS and CAD, but nowadays, for much more.

  1. Years ago I wrote a app+Rdb for the Dept of Mineral & Energy (mining). They had a huge GEAC GIS system, proprietary with an SQL database. My system was quite separate, for Registration & Correspondence, it kept all the legal docs and correspondence together, on a registration basis (read lease-holder), whereas the GIS system kept every claim; every mining shaft; every hydrant; every nook and cranny, together. Of course the GIS system was full 3D, and printed something like the architectural plan for a house. On the registry side, some of the legal docs, when they are printed, need a proper diagram: not a full 3D, but a simple 2D-looking-like-3D, or 3D simplified. The same as your "flattening" function.

So I wrote a simple transport to query the GSI system and obtain the GIS data for a claim or whatever, and to construct a simple diagram. Only reqd for some printouts. I used awk for the ETL and computation, and gnuplot for the erection of the graphics. I didn't keep any GIS data on my side, in order to avoid duplication. But there was a rather large study of doing so, before I convinced them that they shouldn't.

Point being, if I had your tables and queries, I would have used them.

2. There are thousands of GIS & CAD type systems out there. They need enhancement, extension, additional capabilities that have been contemplated but not implemented, that needs to be written.

3. More than the GIS & CAD systems, nowadays we need to erect and display 3D objects, from virtually any system. Right now, I am working on a model (I can't discuss its specifics due to confidentiality) that is:
- to get an idea of the type of drawing, the structure of the objects: imagine eg. a typical standard-complaint IDEF1X model of a Rdb, with all the objects hanging in 3D space, and each object is full 3D (shape; volume; colour; shadow; etc)

  • that has many layers, which have to be drawn and manipulated separately (separate parties, separate departments)
  • all dragged together, into one layered corporate-level diagram, so that everyone can henceforth use one diagram, and conflicts can be eliminated
  • all the issues of different people drawing objects differently have to be resolved
  • I just demand vector graphics, with the dimensions stated, and some rules observed (relate to zero axis, etc) Now I am doing this in a very good drawing tool, but not a CAD tool (on purpose). If I had something like your tables & queries, it would save me a lot of time.

There are millions of needs like this.

4. When I was in Germany last, I visited the factory of a company that builds 100% wood houses, using 50 or 60 different types of wood. You design the house on a computer; they make up all the components in a couple of weeks; then they go to the site with a few trucks and equipment and build the entire house in one week. They had it the other way around: an Rdb, which contained all the drawing elements (not the drawings) and inventory (components of a house). They then exported that data to a CAD system, where the drawing itself took place. The CAD system was not very good at topology. Ie. they had respect for the Rdb, and wouldn't dream of placing their company data on something that was not Relational. If they had your tables & queries, they could do it all from one system.

Topology in Relational Form


I find it amazing that mathematical types have a well-established topology set ... that is not Relational. What on earth is the point of defining anything, in terms that are not Relational. I mean, it is 2015, and Codd wrote his paper in 1970. Congratulations on developing a fully Relational set of topological relations.

Next Steps


I would advise you to find a suitable company that has a need, such as the examples above, and team up with them. They will most probably fund your research. But then you would have to use industrial-strength tools, platforms, languages. You would have to consider:
- their data (inventory of objects) already exists in a Relational DB, usually SQL

  • for a first cut, translate that data into your topology, then query (erect diagram)
  • for the long term, change their database such that it stores your topology in relations that can be used to query directly

So the real goal is, to produce a paper that defines your topology in terms of relations, with a fully detailed data model, such that one can understand it, and implement one's objects in a relational topology, such that one can use your queries to erect the diagrams.

Method


So the result would be a set of routines in some callable language, delivered as library, plus a set of adequate documentation. Then anyone can link it in and use it, in any app+Rdb (3rd party product or in-house).

Therefore, LISP is fine for now ("fun" is a very good reason at this stage), but you must keep your eye on the long-term goal of a packaged library, which you work up to, over time.

Most important, stay away from the notion of doing everything in one monolithic program. As a stand-alone program, the capability is irrelevant. This is a common problem amongst academics.
- SQL is the data sub-language for accessing an RDb, which is deployed in an RDBMSS

  • it is not a language (do not expect the capabilities of a language)
  • trying to do everything in SQL, or everything in (C, Java, a "d", etc) is stupid. It is 2015, we use various components and tie them together to construct a solution to a problem, not one monolithic program that "does everything", which is severely limited to "everything" that it can do.
  • eg. the notion of a "d" might be sound, for drawing in the sand, whilst contemplating some mathematical definition or other. The sound-ness ends there. Eg. only a tiny fraction of the RDBMS users can use it, because only that same tiny fraction care about writing database access (definiton as well as manipulation0 in terms of mathematical definitions of "relations". Eg. note the gross architectural limits of the monolith.

Right now, any of my product deliveries consist of:
- Sybase: Client, plus Server; plus Unix utilities

  • a complete, stand-alone RDb, which can be added to without limitation [*1]
  • a full set of Transactions (the ACID-compliant, high concurrency variety, not the imbecilic self-locking MVCC variety)
  • a full set of reports in whatever s/w the customer has, for report generation (because every report is a single SELECT command, because the Rdb is truly Relational). On top of which, they are free to construct any further reports. No massive report tool such as BusinessObjects is required (such are only required in order to provide reporting [Relational] access to Record Filing Systems)
  • all maintenance & housekeeping tasks, as shell scripts [that call awk,] that call SQL, callable from a GUI menu. This includes everything from nightly jobs that maintain the Rdb, as well as an array of back-end monitoring.
  • all ETL, both import and export, in awk, which first loads a set of awk-arrays with data from the Rdb, and then does its work (perl is also quite valid here)
  • the client-side program(s) can be built in any language: Java; C or any derivative.
  • which may call gnuplot to erect graphs, etc.
  • it is strange that many people use .NET, etc as a *development* language. I and many of my colleagues use it strictly as a delivery platform, the idea of coding in it is insane, much like coding in a 2GL, Assembler, etc. No. We code in 4GLs, such as PowerBuilder (there are many similar), and *deploy* on .NET or Java or Web, etc, by simply pressing a button. I am trying to give you an idea of the componentry, which is normal, demanded, since 1990, for a single product delivery. Note that each component can be expanded and extended, and it can be deployed according to the specific requirements of the site (security model; actual s/w package used; etc).

*1 if you had your topology delivered as a set of tables plus a library, they would fit in here. That would allow me to erect topological diagrams directly from the client-dside program. Sure, it would have its own set of topological queries, which eventually have to call SQL. In the same way that GUI client-side programs now call a library to erect 2D graphs within the GUI (a new window is just fine).

Practical Application


Personally, I can't see the point in doing *ANY* research that doesn't have a practical application. If it doesn't have a practical *use*, well, it is *useless*. Once you cross the line into the insanity, of researching useless things, there is no end to it. Sure, these days the academic world is chock-ful of research that has no purpose, it is not like the old days. Which is what, in large part, leads to the stupidity of reviewers and their criteria: they prioritise category items on a basis that is, well, insane, non-logical. "Novelty is more relevant than relevance."

Note that I did not say that all research should be commercially relevant, which is a different thing. Let's not get distracted.

Quality

One thing I must say is that your papers are much better than most papers in these subject areas, both the Relational and the BIM/CIM. They have far more definition and far more detail, they are well laid-out, and they are more understandable. The concise number of citations is great (lousy papers have hundreds of citations, 99% of which are irrelevant; they have the structure but lack substance).

Possible Improvement


As for improvements (to the already great papers), personally, I would:
- increase the number of diagrams

  • include standard-compliant data models. Better if you had one single data model as an appendix., and refer to it or parts of it.
  • use terms that are accepted in the industry
  • use a better drawing tool, so that the diagrams are cleaner, more precise. I think you know that the arrow-lines and the fill-pattern is too faint to show up in a PDF. All that has to be fixed

Which leads to ...
- use the same symbols in the diagram as it the text (you have done that, I am asking for more symbols in the diagram, and a more drafting-style diagram)

  • eg. fig 1.2 in /Topological Database for Architectural Spaces/ is too crowded: it has to standardised and expanded
  • eg. fig 1.5: draw the dots and arrowheads much smaller; assuming the perspective is from above, use three separate shades of grey and one of white.

I offer this (page 1) as a suggestion (the legend is obviously wrong, but it is not clear which line or plane in the source diagram is being referred to; if the greater-than sign is a pointer or if it is part of the function): http://www.softwaregems.com.au/Documents/Student%20Resolutions/Norbet%20Paul/TDAS%20Note.pdf

All that is minor, except for the data model.

That is the all I can think of.

Documentation


The documentation required for a package is of course, different to the papers. The papers remain as the foundation, not to be set aside. It must be aimed at the technical user, who does not understand mathematical definitions, such that they can understand the relations; the requirements for storage,; etc; and use the queries.

Who would you like to be using this:
- only people who (a) understand mathematical definitions, *and* (b) topology, expressed in that form. In that case, the doco as is, is fine, noting the improvements. The target market is tiny.

  • (c) technical people who do not have (a), and understand (b) but not in terms of mathematical definitions. Probably quite good at their CIM or BIM, but weak on the maths. In that case another layer of documentation is needed, so that they can use your tables and queries directly.

Relational or Not


  1. I didn't find that your relations *when described as relations* (eg. TDAS p95 top) were non-relational. AFAIC, it is not possible to determine relational-ness from the terse mathematical definitions of relations, and without all other relations being given.

(By virtue of the evidence (what they actually produce), there are many posters here who have mathematical definitions for Record Filing Systems only, in text format only, they do not recognise Relational when it is fed to them through a feeding tube, in the form of standard-compliant models, and they can't Normalise their "relations". But they anoint themselves with the pig fat, that allows only themselves to determine what is "relational", and nitpick about anything that they themselves did not produce. Their idea of "theory" is limited to mathematical definitions. The "not invented here" syndrome is alive and well in the so-called academic world that pertains to "relational".)

2. On the other hand, your relations *as depicted in SQL or UML* (eg. TDAS p95 bottom, p180) are definitely not Relational. And they have to be.

  1. If all (or the majority of) your tables have a surrogate (ID column; Identity; autoincrement), by definition, it is not a Relational database. It implements the Access Path Dependence prohibition in the RM. And it results in many more tables involved in joins, and many more indices, than a Relational Database.
  2. Relational database Integrity is very important, but it is not restricted to, it does not consist of, merely Referential Integrity. The RM defines Relational Keys, and you get much, much, more Integrity from than, over and above DRI. That is also, precisely where the (i) speed and (ii) power of Relational databases lie. Note that (ii) is where the ease (or not) of coding queries lies, both planned and unplanned.
  3. Tables such as Personen (as implied in TDAS p180 Addressen, but not given, or as given in TDAS p 198) are non-relation for a different reason. In addition to [a], they allow duplicate rows, which are prohibited in the RM.
  4. My understanding of the proposition might not be correct, but on the face of it, we do Mitglied and Mitglied[Anything] in a much better, much more simple way, as per Codd's RM.
  5. If you think that "SQL is broken", as propagandised by the self-described "hexperts in the Relational Model", let me assure you that that is false. If you think that "you can't implement a Relational Database in SQL", and thus you have not bothered to, let me assure you that I have close to one hundred, and they are acknowledged as such by technical auditors.

All of this is easier to explain with a diagram, so (page 2): http://www.softwaregems.com.au/Documents/Student%20Resolutions/Norbet%20Paul/TDAS%20Note.pdf

Please forgive my translation to German.

f. Please feel free to give me your SQL tables (which are currently non-relational and have *only a fraction* of Integrity), and I will give you a Relational Database in SQL, with *full Relational* Integrity. And Speed. Probably overnight.

A full data model is required, hopefully in a Standard notation. I might need ch 4 from TDAS translated, but I might not, I am willing to try it without the translation.

Re TDAS 4.5.1, I am not sure my translation of it is correct, therefore I will not comment on specifics, but let me assure you that any Rdb that I provide has:
- "fifth Normal form" which really means no duplication whatsoever; Codd's famous 3NF applied across all tables; and no Update Anomalies.

  • "sixth normal form" for the elements that require it (not per the definition, because the author has no clue about what its two quite different purposes are, despite me providing a fully documented data model and code). Just the Columnar Normalisation, to remove optional columns to a separate table.
  • "domain key normal form" as Codd intended it (not the hysterical collection of fragmented mathematical definitions, which the author himself says is impossible to achieve: he is fixated in non-relational Record Filing Systems). Ie. all data and all Integrity is controlled within the database, declaratively, by constraints of various types. Far, far beyond the fragmented definitions.

Relational (IDEF1X) vs OO (UML)


A lot can be said here, but I will keep it short. Note each point, although terse, is very important.
- UML is marvellous, absolutely smashing, the bees knees. For modelling program components. For a OO mindset.

  • UML cannot be reasonably used for modelling data. Its simply doesn't have the mindset, and therefore doesn't have the symbols.
  • Data vis-a-vis Program, are very different, and the Analysis; Design; and Modelling methods, are quite different. Similar to the difference between the wiring for Control vs the tubes for transporting material, in a factory. Or a body. They are never mixed up. The rules for usage are different.
  • OO types tend to treat data with disdain, as the servant or 'populater' of an Object, and view the database as a set of simple tables, which are in fact Files (note the Access Path Dependence of each), that merely store (make persistent) their Objects.
  • They are quite happy to "re-factor" their Record Filing system every time they change their Objects. They have no clue about Data Integrity or Relational Integrity; about Relational Keys and their speed and power.
  • All the boxes are the same (square corners), the lines are the same (solid), the labels on the lines are meaningless wrt Data (may be meaningful wrt Objects, Actions). It is not a database by any means, and it is not Relational by miles.
  • Data Modelling means modelling the data, as data, and nothing but the data, so help me Codd. No reference the the usage or the app.
  • The result is permanent tables. Extensible, easily, but permanent. Data does not change, apps that use the data change every few years, and OO apps, due to their "re-factoring", change with every version.
  • Therefore Data Modelling uses quite different rules, and a different Methodology. And requires a notation that has the symbols, the richness, that is required. Eg. one diagram, instead of a diagram plus a CREATE TABLE statement.
  • We have had IDEF1X as the Standard, a Methodology, for modelling Relational databases since 1984, and available as a tool ERwin since 1987.
  • Unfortunately the dear people who write books allegedly about the RM and teach courses allegedly about the subject __a. have not heard of it [until I introduced them to it on TTM a few years ago], __b. do not use it __c. do not understand pictures (right brain, 96% brain power), they prefer text (left brain, 4% brain power), and especially like isolated fragments of text __d. the very pictures that the practitioners have been using for thirty years, to model and implement Relational databases __e. but they remain convinced, that they "know" the subject __f. The 'not-invented-here' syndrome is badge of honour.
  • Relational Modelling, and IDEF1X as its method, elevate and implement all the requirements of the Relational Model. That means (to name a few): __a. Relational Keys, or Identifiers. Used correctly, across the db __b. The square vs round corners are relevant: Independent vs Dependent tables __c. The solid vs dashed lines are relevant: Identifying vs Non-identifying Relations __d. Of course we differentiate base Relations vs Derived Relations, therefore much of the activity and argument that the academic types engage in, for four decades, is irrelevant to us. __e. Thus the Relations (as opposed to tables, which are of course base "relations", which we call the TABLES) are meaningful, not mere lines that "connect" or "link" Files. The tables are the SUBJECTS; the relation lines are the ACTIONS that take place between the subjects, expressed as a Verb Phrase. (By "Projekt [is a] client [of] Person", you may be using this concept, but from here, it looks like you mean it in the OO sense alone, meaning dependency, Client object.) __f. And the database, all the Business Rules, are implemented declaratively, in constraints of various types. As much as possible for most qualified practitioners, 100% for me.

You won't get any of that in a Record Filing System, or from a person who uses UML for Data Modelling.

Therefore:
- To the extent that you provide data models (as opposed to relation definitions in text strings), it demonstrates the extent of your Relational mindset

  • To the extent that you provide data models in UML, it demonstrates the extent of your OO mindset, your non-relational mindset, a superficial approach to Data Modelling
  • To the extent that you provide data models in UML, the models are absent a number of Relational requirements, concepts and constructs, therefore it is impossible that it is a Relational Database
  • To the extent that the tables have a RecordId as the PK, it is guaranteed that the thing is a Record Filing System, deployed in a database container, for convenience (use of SQL, backups, etc).

The fact that the data models provided are examples, some of which are not directly related to the present work, is irrelevant, you are demonstrating.

Therefore, I would strongly recommend that:

a. remove all the examples (pure examples, not related to the architectural spaces) from the body, place them in a single Data Model, as an appendix, referred to from all parts of the body
b. remove all the definitions of "relations" related to architectural spaces from the body, place them in a single Data Model, as a second appendix, referred to from all parts of the body
c. Ensure that [a] and [b] are fully Relational.  This assures the reader that the constructs; the tables; the queries, enjoy all the integrity, power and speed of the RM
d. which means, you can't use UML, you need to use IDEF1X
e. Your tables and columns MUST be correctly and meaningfully named.  Single character names, and acronyms are not allowed.  If your RDb is to be Open Architecture, you have make it so.  
____SELECT FROM t WHERE x = y AND EXISTS (...) ... is fine for the paper. But totally unacceptable for any person who will use the RDb (ie. your productised package as detailed in the Next Steps section). Not acceptable for your Lisp project, you have to get a start there.

Note that the sequence [c], [d], is relevant, it isn't the other way around.

Note that I didn't mention Normalisation in the above section. Of course, the data has to be Normalised, and Relationally Normalised, if you want people to see that you are following the RM. Codd defined two specific Normal Forms in the RM. The creatures who write books allegedly about the RM, and the pseudo-mathematicians who write maffemafical diffinishuns of abnormal "normal forms", have failed, miserably, in forty four years, to define those two Normal Forms. So the community is left with the pre-relational, Record Filing Systems that the authors know, the abnormal methods of "normalising", and the fragmented "normal forms" (that suffice for RFS).

What I am saying is, it is not your fault, that your models and tables are not Relational, you sit in the 95% of the RDb community, who are victims of such cursed authors. And at the high end of that 95%, because your work is very good.

Versioning


In TDAS 4.4.8, you state:
> Since the spatial planning also works with versions of rooms, it makes sense to extend this concept of version control on topological spaces.

Doesn't make sense to me.

In the RM, Codd gives a method for versioning, which is admittedly very terse. It works beautifully. (I use it for "temporal" databases as well, but let's not get side-tracked.) The assumption is that all the data elements for your architectural spaces are deployed in SQL tables (separate to the OO Objects). Therefore, the versioning concerned is about the actual architectural spaces changing over time. Eg. I would like to erect a diagram of the current kitchen door plus doorframe, alongside the 2007 version of that door, or determine the differences (ala diff), at the data content level, between them.

I think you appreciate the value of solving a problem via a Relational Database, first and foremost. You quoted Codd to that effect in one of your papers, I forget which one.

If you implement versioning in the SQL tables, you can perform all related functions in SQL queries, rather than
- export;

  • RCS/CVS/diff/patch;
  • then import.

(You still need to save all your code, SQL and other, in RCS/CVS/Subversion. Just as everyone one has to. But that is purely for backups, restores, etc. Not for versioning of architectural objects.)


  • C --

> C: You are, of course, welcome to read my papers: > http://scholar.google.de/citations?user=ZZMZCY4AAAAJ

Yes, I know, I have already read some of them, at academia.edu.

Note, I did not read TDAS completely, which requires a full translation. I skimmed it, and my German is horrible; picked up what I thought was relevant; translated that; and commented.

But [what I think are] the two main papers
- Basic Topological Notions and their Relation to BIM

  • Topological Databases for Architectural Spaces are not downloadable or cost money or unavailable in English.

Could you please email anything that you are happy to provide, that is in English, to me.

Feel free to recommend specific papers that are limited to the scope of you original enquiry. There are many and it is sometimes hard to figure out which ones are relevant to the OP. You certainly have been productive, and the produce is high quality.

More on [C], later.

Cheers
Derek Received on Tue Jan 20 2015 - 10:36:45 CET

Original text of this message