Re: some information about anchor modeling

From: <derek.asirvadem_at_gmail.com>
Date: Mon, 4 Feb 2013 03:43:18 -0800 (PST)
Message-ID: <ce57f5b1-d41d-4ee4-b03b-567de395a966_at_googlegroups.com>


Vladimir

First, let me say that I like your work very much, although I do not agree with some aspects of it.

Second, let me say that there are a few here who are just here for the argument, with no genuine commitment to resolution. They are the ones who bellow I AM RIGHT and YOU ARE WRONG, with nothing to support their bellowing. I suggest you do not answer their posts. I do not even read them.

Third, there is only one RM, and Codd is the only author. There are many authors who write about the RM, who subtract from the RM and add their own weird stuff to it, and then package it all together and market it as the RM. That is fraud. Initially I accepted that these authors had some point, because everyone in databases talked about their various theories. Over the years (no, decades), and as a result of sometimes intense interaction with them (or non-response from them), I have formed the evidenced conclusion that they are at best neurotic and obsessed with irrelevant details, and at worst subversives who seek to damage the RM and the use of it in the world of db implementation. I suggest that you do not quote from, or try to understand, such writings.

There are three subjects that you have raised here that I would like to discuss with you. I don't know if I can cover all of it in a single post.



Surrogates

I agree with most of what you have written about it. They are definitely bad news.

> As far as I know, this is the first time that someone has explained why the surrogate key is a bad solution.

Definitely not. We knew about it from the early days of the Relational paradigm. I have some of Codd's papers with me at my current location, but not the RM/T paper, so I can't argue specifics about that, but I can argue specifics about the others.

  1. Codd decried surrogates. He may well have talked about them in RM/T but I doubt that he supported them, given what he wrote in RM. Perhaps he was just discussing them.
  2. I refuse to call them "surrogate keys". They are not Keys by any definition. The insane who twist Codd's definition of Keys in the RM, prove their insanity, they do not prove that they are "keys". They are attached to their databases that are full of Id[iot] surrogates, and anything that appears to attack that is scary, so they react defensively.
  3. Most (all?) of the authors other than Codd do not understand Relational Keys; therefore they do not understand the VALUE of Relational Keys. That is another reason why you will not get any sense from some posters here; you will get arguments about semantics and intrinsics and implications. Because they do not value RKs, they are open to surrogates, and they usually stamp a surrogate on every table. I value RKs, I am not open to surrogates willy nilly.
  4. There are a few papers written by some neurotics, who have since become famous (to me and a few genuine Relational types, they are famous neurotics; to the rest of the world they are just famous). But note that the neurotics cite each other's papers, and thus elevate each other, a form of mutual masturbation. These papers support surrogates. They jump through hoops to justify surrogates. They have "normal forms" of surrogates. Of course it is madness, and constantly needs to be maintained, so they now have "normal forms" of "normal forms" of surrogates. (The "normal forms" have serious problems, but let's not get distracted.) All this has the result that most db implementers these days have no knowledge of Relational Keys, or their value, and implement surrogates across the db. Of course, such dbs have no Relational power or speed at all ... but they do not know that, because they have read the famous books and they think they are implementing famous "relational" structures. Db implementers are robbed of the value of the RM, and they are stuck with some monstrosity that they believe is the RM. This is one reason I believe that these authors may be subversives.
  5. A pure Relational database will have no surrogates. It will therefore supply OLTP and OLAP from the one single database (Codd did write an OLAP paper). I supply that as a matter of course, with no fanfare. A few others talk about doing the same, but I have not seen a database of theirs that actually does so.
  6. There is one condition, and one condition only, that justifies a surrogate, and one cannot get around it. That has not been described by anyone in the entire thread above. But that does not subtract from the fact that whenever you use a surrogate, you break the relational capability at that location. Therefore I cannot state "never use surrogates". No, avoid surrogates as much as possible; use one only when you have to; and when you do, choose the location carefully.
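
To make one narrow part of this concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are hypothetical, invented only for illustration. It shows a single, easily checked aspect of the complaint: a table whose only key is a generated Id accepts the same fact twice, whereas a table keyed on the data itself rejects the repeat. It does not attempt to cover the wider claims about Relational power or speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Surrogate only: the generated id is unique, the fact it stands for is not.
    CREATE TABLE country_surrogate (
        id   INTEGER PRIMARY KEY,
        code TEXT NOT NULL,
        name TEXT NOT NULL
    );
    -- Key drawn from the data itself (hypothetical example key).
    CREATE TABLE country_keyed (
        code TEXT NOT NULL PRIMARY KEY,
        name TEXT NOT NULL
    );
""")

def try_insert(table, code, name):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            f"INSERT INTO {table} (code, name) VALUES (?, ?)", (code, name)
        )
        return True
    except sqlite3.IntegrityError:
        return False

# Insert the same fact twice into each table.
surrogate_results = [
    try_insert("country_surrogate", "NZ", "New Zealand") for _ in range(2)
]
keyed_results = [
    try_insert("country_keyed", "NZ", "New Zealand") for _ in range(2)
]

surrogate_count = conn.execute(
    "SELECT COUNT(*) FROM country_surrogate").fetchone()[0]
keyed_count = conn.execute(
    "SELECT COUNT(*) FROM country_keyed").fetchone()[0]
```

In this sketch the surrogate table ends up holding the duplicate, and the keyed table does not. A unique constraint on `code` would close that gap in the first table, at which point the Id is no longer doing the work of a key.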

Plagiarism

Yes, I understand, from painful experience. So let me start out by saying I am generally on your side, I agree and empathise.

But I think you need to understand that although there are laws against it, etc, it is sadly very common in the west. Especially in the last ten years, where universities are no longer centres of learning; they are centres of programming humans to be herd animals, and to compete without resolution. I am not saying "deal with it", I am saying, protect yourself.



Highly Normalised Tables

Let me say that the "normal forms" are forms of insanity. Mathematicians who have no IT qualifications have come up with abstractions about concrete objects. And now, abstractions of abstractions of concrete objects. The most important issue is, thousands of people try to Normalise their data by using these "normal forms", and fail miserably. If you write to the cretins who wrote them and ask for a method, they tell you that there is no method; that their "normal forms" do not have a method of achieving Normalisation, it is a measurement after the fact. Of course, the seasoned practitioner knows that, but the masses don't. So the sad fact is, the name "normal forms" is a lie, they have nothing to do with Normalisation.

They have nothing to do with Relational Keys, either.

And a distinctly different point, they have nothing to do with Codd's Normalisation. Since it is in the RM, we can call it Relational Normalisation. So they have nothing to do with the RM or Relational Normalisation.

I have stopped using the term "normal forms", because I do not want to participate in their fraud.

These neurotics have a veritable orgy of defining "normal forms", citing each other, elevating otherwise hopeless papers. If you write to them about the RM or RKs, or Normalisation, they blink and say they know nothing about it, and request that you communicate only in mathematical definitions. Most implementers can deal with logic and IT definitions, but not mathematical definitions, so I write for that audience. This is a trick the mathematicians who have no IT qualifications use, to avoid robust discussion, to avoid exposure, to maintain the relevance of abstraction. When you realise that the objects they are "abstracting" are not abstract, the bubble is punctured, their value is lost, so they defend their abstraction to the death.

Re your issue with Anchor Modelling. I think the best way to explain what I have to say to you is to provide a little chronology.

  1. I was a software engineer for one of the pre-relational db vendors. In those days, computers were expensive; IT people were properly educated; we had standards and we stuck to them. I was privileged to work with great customers such as 3M and Kodak, I was at the cutting edge of db technology (not the abstractions of it). When the RM came out, we all knew what it meant (different from what the masses understand it to be now, for the reasons above); we all worked towards it. As I embraced the RM and moved into working with it with my high-end customers, I was shielded from a lot of the nonsense that is marketed as the "RM". I went into consulting and still enjoyed my high-end customers who understood the RM, and I was delivering high performance RM only. It is only in the last, say, six years, since I started answering questions on fora, that I have realised the sorry state of the majority of databases, and the sadly misrepresented RM.
  2. When 3NF was the highest NF, I was delivering 3NF by definition. When 5NF became the NF to be accepted as minimum in the financial markets, I was asked to go back and "upgrade" one of my previous 3NF databases to 5NF. After studying it, I simply wrote a declaration, at no charge, that the db was 5NF. How ? Because, before the neurotics wrote the definition, I was Normalising as a principle, producing dbs with zero update anomalies (which means zero duplication of any kind). MVDs were pedestrian to me, because I already had RKs.

If you are not neurotic, the FDs taken wholly and completely, that is, the famous "every attribute must depend on the key, the whole key, and nothing but the key" taken to heart, already covers MVDs. I do not need a neurotic definition to figure it out.

3. Many of us have fairly intense requirements in our databases. I had situations where I needed data that was stored in rows, to be displayed as columns, etc. Without duplication, of course. I did not have books on the subject 20 years ago, I just Normalised the hell out of it, and came up with a table structure that served row or column requirements at the same speed. SQL did not provide the constraints I needed, so I wrote a little catalogue, or as some like to call it "meta-data". Over time, I perfected it, and used it in many situations. I did not give it any special name, except "Highly Normalised Table".
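
No DDL is given here, so what follows is only a guess at the general shape, not the actual design: a minimal sketch in Python with the built-in sqlite3 module, in which every table name, column name and the `attribute_catalogue` table are hypothetical. Each attribute sits in its own table under the same key, each fact is stored once, and a small hand-rolled catalogue lets the same data be read either as rows or as columns. It makes no claim about speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (part_code TEXT NOT NULL PRIMARY KEY);

    -- One table per attribute, each keyed by the same key as part.
    CREATE TABLE part_colour (
        part_code TEXT NOT NULL PRIMARY KEY REFERENCES part (part_code),
        colour    TEXT NOT NULL
    );
    CREATE TABLE part_weight (
        part_code TEXT NOT NULL PRIMARY KEY REFERENCES part (part_code),
        weight_kg REAL NOT NULL
    );

    -- Hypothetical catalogue: which attribute tables belong to part.
    CREATE TABLE attribute_catalogue (
        table_name  TEXT NOT NULL PRIMARY KEY,
        column_name TEXT NOT NULL
    );

    INSERT INTO part VALUES ('P1'), ('P2');
    INSERT INTO part_colour VALUES ('P1', 'red'), ('P2', 'blue');
    INSERT INTO part_weight VALUES ('P1', 1.5), ('P2', 2.0);
    INSERT INTO attribute_catalogue VALUES
        ('part_colour', 'colour'), ('part_weight', 'weight_kg');
""")

# Identifiers come from our own catalogue, so building SQL from them is
# acceptable here; never do this with untrusted input.
catalogue = conn.execute(
    "SELECT table_name, column_name FROM attribute_catalogue "
    "ORDER BY table_name"
).fetchall()

# Row-wise reading: one (key, attribute, value) row per stored fact.
rows_sql = " UNION ALL ".join(
    f"SELECT part_code, '{col}' AS attribute, {col} AS value FROM {tab}"
    for tab, col in catalogue
)
as_rows = conn.execute(rows_sql + " ORDER BY 1, 2").fetchall()

# Column-wise reading: one row per key, one column per attribute.
select_list = ", ".join(f"{tab}.{col}" for tab, col in catalogue)
joins = " ".join(
    f"JOIN {tab} ON {tab}.part_code = part.part_code" for tab, _ in catalogue
)
as_columns = conn.execute(
    f"SELECT part.part_code, {select_list} FROM part {joins} "
    "ORDER BY part.part_code"
).fetchall()
```

Both readings are generated from the same stored facts, so nothing is duplicated in order to serve one shape or the other.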

4. I do not suffer the "null problem", it is a total non-issue to me, and the great number of papers that have been written about it are, to me, the sad meanderings of neurotics, who get lost trying to find the toilet. I have never, ever, stored a null in a database. I won't get distracted with a discourse re The Null Problem here, but it does deserve one at some point.

- Missing info is a bad name, because, given that the entity is defined, you either enter the whole row or not at all.
- Optional column is a better name, because it identifies the issue being dealt with exactly.
- Optional columns simply need an Optional table.  That is a natural result if one Normalises to the point where there are no unpopulated columns or unknowns or "unknowns" in the database.
- I also do not have a problem with the methods that Codd suggested (in the RM/T, I believe), which allow either a bit, or using a value that is out-of-range, to indicate that a column is not being filled.  That was demanded in the old days, due to a SELECT being limited to 16 tables; that is no longer demanded, as the limit is now 50 tables. 
(I am making this point because you seem to be saying that Codd's RM/T does not handle Nulls correctly: it does, if you get the arguments that he was having with the neurotics out of the way, and just use the ideas.) Sure, the extreme end of it, for guys like me, is that there is no "Null Problem", but for most people there is one, and 35 years after the issue was closed, they are still arguing about it.
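
The Optional table idea can be sketched as follows. This is a minimal illustration in Python with the built-in sqlite3 module, assuming a made-up `employee` table with an optional mobile number; none of these names come from a real schema. Every column is declared NOT NULL, and "no value" is represented by the absence of a row in the Optional table rather than by a stored null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Mandatory facts: the whole row is entered, or nothing is.
    CREATE TABLE employee (
        employee_code TEXT NOT NULL PRIMARY KEY,
        name          TEXT NOT NULL
    );
    -- Optional column moved to its own table, under the same key.
    CREATE TABLE employee_mobile (
        employee_code TEXT NOT NULL PRIMARY KEY
                      REFERENCES employee (employee_code),
        mobile        TEXT NOT NULL
    );
    INSERT INTO employee VALUES ('E1', 'Ann'), ('E2', 'Bob');
    INSERT INTO employee_mobile VALUES ('E1', '555-0100');
""")

# Employees that have the optional fact.
with_mobile = conn.execute("""
    SELECT e.employee_code, m.mobile
    FROM employee e
    JOIN employee_mobile m ON m.employee_code = e.employee_code
    ORDER BY e.employee_code
""").fetchall()

# Employees that do not: answered by row absence, no IS NULL test needed.
without_mobile = conn.execute("""
    SELECT e.employee_code
    FROM employee e
    WHERE NOT EXISTS (
        SELECT 1 FROM employee_mobile m
        WHERE m.employee_code = e.employee_code
    )
    ORDER BY e.employee_code
""").fetchall()

def null_is_rejected():
    """Try to store a null in the Optional table; report whether it failed."""
    try:
        conn.execute("INSERT INTO employee_mobile VALUES ('E2', NULL)")
        return False
    except sqlite3.IntegrityError:
        return True

null_rejected = null_is_rejected()
```

Note that an outer join over these tables would still show a null in the result set for Bob; the point of the sketch is only that no null is ever stored.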

5. Something like 6 or 7 years ago, it was brought to my attention that someone I had never heard of had written a paper identifying 6NF as the ultimate solution to the "null problem" (which I did not suffer, but they wanted me to look at the theoretical alternatives). After I got past the silliness, and got to the definition, lo and behold, it was none other than my Highly Normalised Table and the Optional Table that I had been using for decades. Now it had an official name. Scores of my tables, and my insistence on a catalogue, were validated. I did not realise then that he was an abstract neurotic; I was told by many about his works, etc, so I treated him with respect, joined his website as requested, interacted, etc.

6. I specifically wrote to him about the VALUE of his 6NF; about the way I had used it for both Optional Columns (which he had identified) and Highly Normalised Tables (which he had not identified); about the catalogue; about pushing the SQL Committee to incorporate support for it, and thus eliminate the need for the catalogue. Nothing. Instead, more invitations to interact about his baby. One year later, I wrote a reminder. Nothing. Instead, more invitations. I formed the conclusion that he was a neurotic, an abstractionist, and he had no clue about the relevance of, or the application of, his mathematical definition.

Separately, after three years of interaction on his website, I formed the conclusion that his baby had no value at all, except to attack and demean the RM. There is no replacement for the RM. There is no replacement for SQL.

Why is this important ? Because it provides further evidence that the neurotic abstractionists have no clue about what they are writing about, or about what they become famous for. They do not have any genuine understanding of the RM; they are obsessed with something that is not the RM, and they find problems there. There is no problem with the actual RM. Vendors completed any bits that could be considered incomplete 20 years ago, and these poor people are still discussing its incompleteness.

The NFs are useless, they cannot be used even by the people who wrote them.

7. For people like me, who understand Normalisation as a principle, we just Normalise; we know the RM, and we apply Relational Normalisation. We can pretty much guarantee that whatever the neurotics define as a "normal form" anytime in the future, our databases of the past will qualify for it. As I did with 5NF, and again with 6NF.

8. I already comply with DKNF *as the goal that Codd defined* but could not articulate in those days. The definition of DKNF came later, and it is hilarious (do not use Wiki for anything serious). When I wrote to the author, innocently, I found out that it was just another abstract mathematical definition of the concrete world, and to my horror, that he knew absolutely nothing about the RM (but like the other neurotics, he insisted that he did). He could not even confirm if the DM I submitted was in DKNF by his definition. The definition has nothing to do with Codd's goal; it has everything to do with orgies and justifying surrogates.

Therefore:
Forget about the "normal forms", they are a disease that prevents you from achieving anything of value.

9. You and 6NF and Anchor Modelling. As per above, for years, I was able to say, there is only one other company that I know of that (by their technical literature) produces structures in the databases that support OLAP and effortless "pivoting" as I do (note I do not pivot, but that is what most people know it as).

Ok, now there are three.

And that is now called 6NF. Well it isn't. The author is clueless to the value. 6NF is a simple definition. The tables aren't a simple definition. They are the result of disciplined Normalisation, which the author has no knowledge about, and does not recognise when it is presented to him. He called his definition 6NF. So we can't, we have to use another name for the object that delivers features that he knows nothing about. If we call our tables 6NF, we elevate him and his definition, and subtract from our techniques, which came years before the definition. So I have reverted to calling those tables Optional Tables and Highly Normalised Tables, depending on their use, because the terms identify what they are, exactly, and what their purpose is.

  10. And the last point is this. (I have not read your paper.) I have no problem that you wrote the papers first, and Anchor Modelling implemented a database and wrote their docs five years later. I have no problem that they plagiarised your paper. But there is no way that you can assert that the design or 6NF (not the paper) is yours. I had early forms of it worked out 20 years ago, and final forms say 14 years ago, without naming it 6NF. I don't know how you arrived at it; I arrived at it because (a) I Normalise as a principle, not as a bunch of definitions from the insane, and (b) I was seeking speed for rows-as-columns requirements. Normalisation and performance go hand-in-hand; a progression of one progresses the other.

I am sure that I am not the only one who did that. So it is quite possible that Anchor Modelling came up with the designs, the implementation, etc, all on their own. Although they seem to have plagiarised your paper. Sybase have had a special db offering that provides "columnar access" at lightning speeds, for over ten years. I don't like Anne's response to you, they could have been more direct and given specifics and reasons.

Cheers
Derek

Received on Mon Feb 04 2013 - 12:43:18 CET