Paper: Database Programming Languages
Date: Tue, 3 Feb 2015 21:44:34 -0800 (PST)
Message-ID: <4ce5e81a-01db-48c9-b13a-9ad156d3d444_at_googlegroups.com>
Jan
> On Monday, 2 February 2015 21:47:19 UTC+11, Jan Hidders wrote:
> On Sunday, 1 February 2015 06:33:40 UTC+1, Derek Asirvadem wrote:
>
> > > I would btw. be very curious to know what conclusions your client would think he or she could draw from them.
> >
> > Since I have already given you a synopsis, a short chronology, I am not sure what you mean. Would you like a more complete one ?
>
> More detail, as in where the devil lives. Because I suspect more is concluded from the paper than is actually warranted and meant by the author. But if we get to that later, that's fine.
You will understand perhaps, that I am restricted due to contract and confidentiality issues.
Sydney. One of the larger Australian banks. From experience, we are much more conservative, and we have much more legislation governing us, than American banks. Probably (not sure, from speaking to colleagues) somewhat more than EU banks. Customer has an app, OO, ORM, all the OO bells and whistles. The data store is in the app, ie. closed architecture, but deployed on MS SQL for convenience (SQL DML, backups). Typical OO monolith, typical OO minus RDB problems. Hundreds of objects have progressed to thousands, a maintenance and performance nightmare. Three years of failure, of various kinds, but the most important is lack of data integrity, and every time they install a new version, a whole set of new bugs is exposed. Team leader has a lot of influence, but the auditors gave the business an ultimatum: fix it or shut it down. Second most important: the problems getting the data in/out of the data store, and behind that (really the same problem) lack of Relational Database access. Performance is crap but they are used to it.
So they came to me (I had replaced an app+Db in another division in NZ, and the auditors had nice words), and the directive is, replace it with an RDb+App, and teach the team standards, so that they never make these mistakes again. The business has no choice, he doesn't need to get a budget allocated, as it is a risk issue and bank level funds are already allocated. I work for the business, so there is some politics, but I answer to, get everything signed off by, the auditors.
Paper. The Team leader is on his last legs. There is no pressure that he will get fired, but he is dying from loss of credibility. He is adamant that if he adds more complexity to the app, to the objects and classifiers, the app will "work", the bugs will be fixed. They have heard this for three years. This time the difference is, he is making a formal presentation, along with two papers that support his position. I am pretty sure that he got that from his OO groupies, no idea who that is, probably some consulting firm that his wife or brother knows. I say that because the presentation was reasonably professional, while denying all the facts.
This is the typical state of the majority of the OO world: despite the past 100% failures, they will do it better next time, promise. Note that if nothing changes, nothing changes.
DBPL is yours. The central theme of your paper is <read abstract>. He is saying that there is scientific, theoretical, evidence that more complexity deployed in the OO classifier layer will fix the data integrity problems, thus it is his team's fault for not doing that properly, not a problem with the OO/ORM concept (specifically no independent RDb), and he deserves yet another chance.
After a quick consultation as to the veracity of each of his statements, proved to be zero, the auditors have asked me to take the presentation down formally. The business has asked me to do so softly.
Enough ?
Proper Use of Paper
Here is what he presents (what the whole OO/ORM world uses):
- the premise of the paper (central theme ?) is that there are data integrity issues that result from (eg) multiple inheritance
- that such issues can be corrected by increasing the level of complexity in the objects, the classifiers
- a proof and specific methods are given
From reading your paper, I cannot say that he (or the OO/ORM crowd) is using it improperly.
The paper is a good example because hundreds of this /sort/ of data integrity problem (ie. sort, not instances), occur only because of invalid beliefs such as this, and those same hundreds, can be eliminated by utilising established architectural principles in our science, by an RM-compliant Relational Database. The entire OO/ORM madness can be eliminated by a professional implementation, but I won't address that, I will address just your paper and the issues therein, just the project and that one issue.
< bulk snipped, skeleton retained ...>
> > I suspect you and I are not on the same page on this one. So let me clarify, and ask for a clarification.
> >
> > Now in this thread you have stated:
> >
> > > > But most /now/ understand the relevance of data independence.
> >
> > (My emphasis.)
> >
> > To which I replied:
> >
> > > I suppose I have to trust that you mean that in the fullness of the data integrity as prescribed in the RM.
> >
> > Which you have not confirmed or denied. Which means, I still do not know the /extent/ to which you understand "data independence", and how it is administered.
>
> Administered? That seems a strange word to use here.
Administered, analysed, designed, modelled, implemented, maintained, such that, through all those activities, one observes the rules that pertain to data independence, open architecture.
That's fine. Let me respond to each one quickly, I don't want to get distracted from the main issue, your paper.
> I'm also not sure what you meant by "in the fullness of" here
I think we already know, that you guys treat the RM as a pick-list, and with very little understanding of what it contains, what each of the items in the pick-list actually means, what they deliver, etc. And some of us over here in implementation land, treat the RM as The Law for Relational Databases. Not only have we taken every word as Law (no picking and choosing), not only have we implemented it, after having enjoyed the fruits of such lawful activity, we have implemented further specifications; finer categorisations; more application areas. Thus the fullness of the RM, and of any single item (data integrity is just one item), is applied fully, and completely, and after much experience, even more fully.
Thus there is a huge gap between one who is a picker, who understands very little of what he is picking, and none of the fruits of what he has not picked, and one who is lawful, full of fruits, and growing more fruits than the original RM described. To wit, we perceive far more from Codd's laws re data integrity (than you did in 1995, and what you do now), and therefore the commercial vendors have enabled it, and therefore we have implemented it. And after a decade or so of sitting on top of that MINIMUM level of data integrity we see more, and implement more. Whereas you are still perceiving far less than that minimum, and you do not appreciate the value of it as prescribed. I can't expect you to even imagine, in your wildest dreams, what we do beyond that minimum.
So for this paper, this issue, we are only dealing with prescriptions in (a) our science and (b) Codd's RM.
There is a good example if you are interested, that I will be working through, in the On Normalisation thread.
> Am I aware how current DBMSs realise (to some extent) data independence? Yes, I am.
Per details above, and per evidence in your paper, I really do not think so.
> Am I aware of the available techniques that are not yet implemented? Yes. I Am.
I think you are aware of a fraction (of what is implemented in the commercial platforms).
Separately, the evidence is, you are unaware of the concepts re data integrity/independence, that must be implemented IN the data.
From where I sit, there is never a trade-off, the evaluation you give never happens. It is no problem to implement complete and total data integrity (that item will never be traded off), to any level of complexity, in the database. Imagine what my databases do: we maintain millions of public trades, to hundreds of complex legislative requirements. Both the declaration of those requirements, and the maintenance of the data to those requirements, are in the database. And any implementation of such, outside the database, is not only wrong, incomplete, etc, it breaks the architectural principle of separation of /Data/ vs /Program/, the result will be a sub-standard mish-mash of complex objects that fail anyway. Typical OO/ORM madness. As supported by your paper (among others, and by books such as AHV). The complement on the OO side is actually desirable: simple, rather than complex objects, that are less vulnerable to changes.
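As an illustration only of what "declared in the database" can mean at the most basic level, here is a minimal sketch using Python's built-in sqlite3 module. The `account` and `trade` tables, their columns, and the `try_insert` helper are hypothetical names invented for this example; they are not taken from the project or from any real system, and a commercial platform would offer far more than this. The point is only that the rules live with the data, so any program that writes to the tables meets the same rejection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: every rule is declared with the data, not in the program.
conn.executescript("""
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
CREATE TABLE trade (
    trade_id   INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL REFERENCES account (account_id),
    side       TEXT    NOT NULL CHECK (side IN ('BUY', 'SELL')),
    quantity   INTEGER NOT NULL CHECK (quantity > 0)
);
""")
conn.execute("INSERT INTO account (account_id, name) VALUES (1, 'Desk A')")
conn.commit()


def try_insert(account_id, side, quantity):
    """Return True if the database accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO trade (account_id, side, quantity) VALUES (?, ?, ?)",
                (account_id, side, quantity),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(1, "BUY", 100),   # valid row
    try_insert(1, "HOLD", 100),  # violates the CHECK on side
    try_insert(1, "SELL", 0),    # violates the CHECK on quantity
    try_insert(99, "BUY", 10),   # violates the FOREIGN KEY to account
]
print(results)
```

The calling code contains no validation logic at all; it only reports whether the database accepted the row.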
Further there is no merit in the monolith, we killed that in 1985. Only really uneducated people still (a) prescribe them and (b) build them.
From where the auditors sit, when they recognise that some project has broken those laws, those principles, they send the team off for re-education.
> > So the clarification begs. The paper is Database Programming Languages, 1995. Are you aware:
>
> Yikes! My very first paper that I wrote as a beginning PhD student! :-) Ok. This is going to be interesting.
>
> > 1. That, on the face of it, your statement above, contradicts, or let's say unofficially retracts, the main thrust, the solution given, in your 1995 paper ?
> > __ (which is why I stated "... the papers have not been retracted, all we have is a statement from the author in an unrelated post on c_d_t stating that "most /now/ understand the relevance of data independence.")
> >
> > Or, do you stand, on that paper, now ?
>
> I'm not sure which statement you mean, but I don't think I've said anything that strongly contradicts the results and assumptions in this paper. I'm also not sure what you mean by "presenting a solution" here. The paper does not introduce a new model, it studies an existing one and focuses on reasoning over union types within that model. But the results actually carry over into other data models.
Central theme described above.
Ok, so you have not retracted it, the paper stands.
Yes, I agree, the model is not yours. You support the model, and you provide methods /within/ that model, to fix problems /caused/ by that very model. You give a method to fix the problem, in the model.
You fail totally, to realise that the problem is not in the model, and therefore no amount of fixing it in the model, will fix it.
I accept, you understand more about data independence than you did in the past, but it is still a tiny fraction of that contained in the RM. And that limited perception, that inability to see the relevance of the items in the RM, (a) hinders you from dealing with data issues in the data (in the RDb, in the platform), and (b) maintains your venture of implementing data integrity in the object layers (the Program), which is at best, fragmented and only a tiny portion of [a].
As long as those items are implemented in the /Program Space/ and not the /Data Space/, they break a number of laws established in science and in Relational Database. We have the laws, specifically to protect society from precisely the results of what you are describing.
And the fact that you are continuing in this path (the "model"), in denial of the evidence that this path has failed, increasing the complexity of the vehicle, means you are not observing basic scientific principles, you are simply addicted to the path.
> > 2. Of the Architectural Principle, established as science in our field, that Data must be separated from Process ?
> > __ (And it follows that there are separate and different methods for Analysing & Designing the two, etc, etc.)
> > __ It is clearly established in the industry, that implementers are specialists in either the /Data Space/ xor the /Program Space/ (those who cover both are few, and exceptional).
>
> I am aware of that, but not sure why you think this is relevant for the paper.
>
> Btw. when you say "science" I have the impression you actually mean "engineering".
I was brought up on science. I went to a scientific school. My tertiary education is science, computer science. I have a lot of interest and understanding of engineering, and most of what I do is engineering, yes, but that is the application of science. It rests on, and relies on, science. Now, in the thirty nine years since I left college, you guys might have changed the definition of "science" to some floating flying itinerant ever-changing object, but I am not about to do so.
> > 3. That [2] existed, as science, before Codd, 1970, the RM ?
> > __ (That it has been furthered ever since then, and rendered for whatever context one uses (eg. a RDB; an awk script). That it (as with everything in science) has only gotten stronger as an Architectural Principle, and applicable in more contexts.
>
> That engineering principle has a long and venerable tradition, yes.
Well, your answer to [3] contradicts your answer to [2]. [2] and [3] are inseparable. Codd did not invent [3] out of thin air, it was based, founded in, existing science, including the Hierarchical Model. You cannot carve off a specific implementation of [2], namely [3], and deny [2]. It is absurd.
Whatever you perceive as [3], in isolation from [2], is a deformed, not whole [3]. In which case, you do not understand that venerable engineering principle.
> > 4. That in his paper, the RM, in 1970, Codd gave specific /further/ prescriptions and prohibitions re "data independence", without having to explain what "data independence" meant, because it was well-known ?
> > __ Which resulted in implementation of those concepts in the commercial RDBMS platforms, as well as in the implementations of RDBs.
>
> To some extent. From my colleagues who were around at the time I know that the concept already existed, but not everybody understood it in the same way.
Well, then, the fact that there was a difference among them means that they, as a group, were partially ignorant, and that the ones who had the higher understanding could not, did not, bring those at the lower level, to the higher level. A direct result of picking and choosing from a list that you (they) did not understand.
And it must be said, they (you) live in ignorance of what the vendors did, and why they are doing it, why certain capabilities have been implemented, and others not.
Whereas, for those who observed the law, as law, the higher level was the only level.
> > 5. The result being, that 100% of all controls upon data should be deployed in the RDB ?
> > __ (As I am sure you know, DKNF alludes to this. We implement a much fuller form, as standard practice.)
>
> Not sure why you drag poor little DKNF into this, since that only deals with a very small part of this, but, yes.
Ok. Good.
But then we have a problem. Self-contradiction again. You cannot accept that 100% of the controls on data, data integrity, closely related to data independence, should be deployed in the RDB, and at the same time be supporting a model that deploys some large portion of said controls in the app layers, the objects and classifiers.
Sure, your answers allow you to argue on both sides of the fence, but it is incoherent. You damage your credibility.
> > 6. The corollary being, that controls on the data should not be deployed in the /Program Space/. Eg. OO Objects or classifiers ?
> > __ And if it is deployed there, (a) it will never be adequate, or (b) as complete, as a deployment in the /Data Space/. Something that has been painfully proved in millions of OO-centric implementations.
>
> Definitely, yes.
Good. Ok. But same response re self-contradiction above.
> > > To be honest, although I have opinions on these issues, I find such discussions unscientific and without any merit, even if it is about how Codd himself meant his model to be understood. It is akin to the argument by authority, which is a very weak type of argument.
> >
> > Per details above, I do not expect that type of argument.
> >
> > We do need to take Codd as the authority. Otherwise we can pack our bags and go home.
>
> Quite the contrary. Codd's contributions were fantastic, some of them anyway, but it is by no means the last word on these matters, if only because technology and insight has progressed since then.
Well I disagree strongly, but I won't take time to enumerate. Quickly, re these matters :
- Neither I nor any of my colleagues (the high end of the implementation space) know of any of Codd's contributions (minus the known retractions) to be anything less than fantastic; less than law; less than the last word
- there have been no insights of value published since Codd
- there has been no progress in technology that relates to these matters since 1984
___ except for improvements and enhancements in the platforms
In case you are talking about the theoretical fraction of our field, I am quite aware that there has been a lot of MMM activity, but there has been no published result of any value from the theoreticians, since Codd.
Please feel free to name one, or to provide a link.
> > > What matters is, which objective arguments were put forward to support that interpretation and what the evidence for its merit was. Which interpretation leads to the most effective DBMSs [, RDBs] and what scientific evidence is there for this.
> >
> > Note my insertion.
>
> Yes, noted. But I disagree with the R there.
What, in this day and age, you give assent to a database that is not Relational ?
What, one of the hundred or so OODBMS that have come and gone in the last twenty years ? That makes the same mistake detailed above, failure to separate /Data/ and /Process/; failure of Data Independence and Open Architecture; failure to decompose and deploy. Oh, the next one will work, will it.
> > Yes, all very good points. But I think even that /could/ be avoided, or let's say, easily stated and closed: the /commercial/ vendors have already done that work; the high-end implementers implement it. Something that the theoreticians do not seem to be able to comprehend. They are about thirty years behind the industry that they theorise for. You will of course, have to accept evidenced reality as scientific evidence, not papers by theoreticians who have already established themselves as un-scientific. Mathematical proofs alone are pure garbage.
>
> Mathematical proofs can only prove mathematical facts, not whether a certain model is practical or not, although certain results can give some support. There the proof is really in the eating of the pudding. Any other position would be unscientific.
Well and good, generally speaking.
But where it concerns the matters on the table, the principles discussed, and specifically your paper (a) there is a well-established and easily recognised model, (b) you are supporting it, your paper supports it, and (c) the people who propagate that model use your paper (as well as fifty more that I am aware of) to prop up that model.
The model is broken. Despite twenty years of fixing and re-inventing, it remains broken. These guys are in denial of reality, evidenced facts. And you both support the model and add to it, while claiming that you agree with the principles that prevent such models.
> > Actually, if you excised the mathematical proofs from your papers, it would increase their credibility. Because the mathematical proofs have been proven false in the course of time, or were false from the beginning due to their contradicting other sciences [specific principles , now identified].
>
> Mathematical proofs can only be proven false by mathematics. But perhaps you mean that the underlying assumptions about how the models and assumptions are relevant in the real world might be shown to be incorrect by other sciences. Yes, that can happen.
Ok. A bit naïve, because thousands of people use those papers as proofs to propagate their model; millions of implementers use them, and they rely on those proofs.
Not guys like me, because we have our feet on the ground, and we do not deny other sciences.
But for those who do have esteem for the model, ignorant of science, they end up in the situation where, when the model is broken, they believe the fault is with themselves, not in the model, and they try harder next time. So the real crime is not with them, it is with the people who invented the model; who propagate it; who implement contrary to the science; who deny other sciences; who write papers supporting it, fixing it; etc. That is why I say, they commit a massive fraud.
The concept of the OO data model, or OO/ORM+data store, ie. minus an RDB, as detailed above, is a total scientific fraud, non-science.
Especially in light of the fact that the traditional model, 100% RDB for the /Data/ and 100% OO for the /Program/, ie. minimal ORM, and no writes allowed, works perfectly and does not break any scientific principles. I will refrain from listing the benefits and the money saved.
Ok, to summarise your paper. In the context of this thread.
- it acknowledges and supports the OO/ORM+data store model
- it acknowledges that there are myriad errors in the model
___ that data integrity (the tiny bit that you do understand) is broken
___ (interestingly, you do not attack the data integrity problem, you address only the display of data that is erroneous)
___ but it fails to determine the main errors or causative error in the model
- it fails to mention RDBs; the principle of separating Data vs Program, and then dealing with each separately
___ thus it fails to recognise the problem for what it is: data integrity failures, due to
___ a. incomplete definition (analysis; classification; Normalisation; etc) of the data itself, and
___ b. absent constraints upon the data
- it continues to perceive data through the very skewed lens of the model
- it determines the problem to be caused by methodology within the model,
___ which is already complex, and has limits to its complexity
- it proposes a solution within the model
___ that is even more complex
___ with methods to deal with that added complexity
To summarise my official Response to the paper, paraphrased for the context of this thread:
- it acknowledges and supports the OO/ORM+data store model
___ that is well-established as broken, as per mountains of evidence in the field, including three years of consistent evidence on this project
___ for the main reason that it breaks the scientific and Architectural Principle of Separation of Data and Process, and the standards we have for data, and separately for the various processes that operate on the data
___ in this, the author contradicts established science, the paper should be viewed in that light
- therefore any and all proposals, as well as proofs, that support such a model, are null and void
___ the details need not be examined
- the paper is dismissed.
To the extent that the TeamLeader, prior to the scheduled education, needs to be informed:
- the problem described is real, we have over 200 different data integrity problems that are caused by use of that model, although only one is detailed in the paper
___ this response applies to all such data integrity problems both in the database, and in the display components
- since the author has failed to determine the location of the problem, that it is in the data, and incorrectly determines that it is in the object classifiers, no proposal from that position can address the problem
- the author makes three cardinal errors
__ 1 Failure to separate Data and Process
__ 1.1 consequently failure to deal with each properly, in its rightful location; absence of data controls in the database
__ 1.2 attempt to control data in the process space, and after the fact of storage
__ 1.3 typical of the OO/ORM model
__ 1.4 typical of Maslow's Hammer theory
__ 2 Failure to recognise data hierarchies and to implement them as such in the database
______ It is noted, that while the author's diagram, and the text, and the notion of inheritance all refer to hierarchies, the hierarchical component as relates to data is somehow invisible
__ 3 General ignorance re the methods used in the implementation space
__ 3.1 Relational Model in general
__ 3.2 The ordinary capabilities in RDBMS platforms in particular
- The data integrity problem itself (and all such problems) are caused by:
___ a. absent classification and treatment of data
___ b. absent /ordinary/ data integrity constraints upon such classified data (no additional or special constraints are required)
- the solution is detailed in that section (refer Response)
- once [a] is implemented, the problem (both as stated, and all problems related to data integrity) disappears
- Note that there is no work to be done on the object side re the problem
___ In fact less work is required
___ As per standard, all Updates to the database shall be via Transactions only, ie. no direct Updates to tables are permitted (included in the education)
___ The entire ORM problem is removed due to removal of Update issues; the remainder is trivial
___ The objects remain simple, there is no restriction whatsoever to the object side, multiple inheritance can be implemented without regard to the data content, and the integrity of such
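The solution in the Response is not reproduced in this post, so the following is only a hedged sketch of the general idea in the list above: data classified in the database, ordinary constraints only, and updates made through a transaction rather than by direct writes. It uses Python's built-in sqlite3 module. The `person`, `student` and `teacher` tables, the discriminator column, and the `add_student_tx` function are all hypothetical, chosen for illustration. SQLite has neither stored procedures nor table permissions, so the "transaction" here is a plain function that shows atomicity only; on a server platform one would expect a stored procedure, with direct write access to the tables withheld.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys

# Hypothetical basetype with a discriminator, and two exclusive subtypes.
# Only ordinary constraints are used: PRIMARY KEY, UNIQUE, CHECK, FOREIGN KEY.
conn.executescript("""
CREATE TABLE person (
    person_id   INTEGER NOT NULL PRIMARY KEY,
    person_type TEXT    NOT NULL CHECK (person_type IN ('S', 'T')),
    name        TEXT    NOT NULL,
    UNIQUE (person_id, person_type)
);
CREATE TABLE student (
    person_id   INTEGER NOT NULL PRIMARY KEY,
    person_type TEXT    NOT NULL CHECK (person_type = 'S'),
    course      TEXT    NOT NULL,
    FOREIGN KEY (person_id, person_type) REFERENCES person (person_id, person_type)
);
CREATE TABLE teacher (
    person_id   INTEGER NOT NULL PRIMARY KEY,
    person_type TEXT    NOT NULL CHECK (person_type = 'T'),
    subject     TEXT    NOT NULL,
    FOREIGN KEY (person_id, person_type) REFERENCES person (person_id, person_type)
);
""")


def add_student_tx(person_id, name, course):
    """Hypothetical transaction: insert basetype and subtype rows atomically."""
    with conn:  # commits both inserts, or rolls back both on any error
        conn.execute("INSERT INTO person VALUES (?, 'S', ?)", (person_id, name))
        conn.execute("INSERT INTO student VALUES (?, 'S', ?)", (person_id, course))


add_student_tx(1, "Alice", "Databases")

# A person classified as a student cannot also be stored as a teacher:
# the composite foreign key finds no (1, 'T') row in person.
try:
    with conn:
        conn.execute("INSERT INTO teacher VALUES (1, 'T', 'Logic')")
    teacher_rejected = False
except sqlite3.IntegrityError:
    teacher_rejected = True

# A transaction that fails half way leaves nothing behind.
try:
    add_student_tx(2, "Bob", None)  # course is NOT NULL, so the second insert fails
    half_written = True
except sqlite3.IntegrityError:
    half_written = conn.execute(
        "SELECT COUNT(*) FROM person WHERE person_id = 2"
    ).fetchone()[0] > 0

print(teacher_rejected, half_written)
```

Whatever inheritance structure the objects use, the rows they read from such tables are already consistent, which is the sense in which the object side needs no extra integrity logic in this sketch.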
Please feel free to ask questions about anything you do not understand, I expect there will be a few.
I can't give you the whole Response, but I can obtain permission and provide a couple of the key pages from it, particularly those with the diagrams that explain the Solution. But first, I hope you don't mind me asking, please confirm that you can /read/ standard IDEF1X data models that we have been using since 1985, and UML classifiers. I am shocked to find out, as evidenced in the Normalisation thread, that many (all ?) theoreticians cannot do that, eg. they cannot read the predicates or the constraints that are in diagrammatic notation, and ask for them to be spelled out in text form.
Cheers
Derek
Received on Wed Feb 04 2015 - 06:44:34 CET