Re: Progress Please
Date: Mon, 16 Feb 2015 21:34:06 -0800 (PST)
Message-ID: <b7b730f9-f776-46cc-8dce-a407be6213eb_at_googlegroups.com>
James
> On Monday, 16 February 2015 12:07:48 UTC+11, James K. Lowden wrote:
> > On Wed, 11 Feb 2015 23:00:52 -0800 (PST) Derek Asirvadem <derek.asirvadem_at_gmail.com> wrote:
> > > Are we to say they have "hierarchical keys" simply
> > > because employee->jobhistory->salaryhistory are related through
> > > their foreign keys?
> >
> > The reason they are hierarchical keys is because
> > employee->jobhistory->salaryhistory are related through their
> > IDENTIFIERS, their PRIMARY keys (which, btw, components thereof
> > happen to be foreign keys).
>
> To me, that's a distinction without a difference. I just don't see
> what you find significant about it.
>
> employee is identified by {man#}
> jobhistory is identified by {man#, jobdate}
>
> For some reason I can't fathom, you believe:
>
> 1. It's vastly more important that the primary key for jobhistory
> incorporates the primary key for employee than that
>
> FOREIGN KEY (man#) REFERENCES employee(man#)
>
> even though the two statements are equivalent.
>
> 2. The very fact that primary key for jobhistory incorporates the
> primary key for employee deserves the special designation of a
> "hierarchical" relationship, perhaps to reflect how the relationship
> would have been designed in pre-relational DBMSs.
>
> To support this assertion, you list the keys vertically, and note that
> each longer one incorporates the shorter one above, ergo hierarchy.
> Also you note Codd visually arranged the boxes in a way that suggests a
> hierarchy. You'll forgive me if I find that unpersuasive?!
>
> No, I don't think the fact that one table's primary key is a subset of
> another's is interesting, let alone signficant. It doesn't deserve any
> special designation, "hierarchical" other.
First a summary response to all that, then a point-by-point address of a couple of items. We are closing the gap between what I am trying to convey (Hierarchies in the RM, in various forms, using Codd's words, and only Codd's words) and what you are doing (which is evidently somewhat less than that, and that hinges on a denial of hierarchies in the RM).
We are progressing. But that denial is no longer an objective logical denial, it is a subjective, vociferous, psychological one. With your last post, I have now identified three denials on your side, of important technical issues, which I will attempt to address.
I did state:
> > Now we know from your comments (at the top), that
> > - you don't have Codd's tables as defined in the RM Fig 3(b), as per my IDEF1X model of the same figure. You are just not getting the Hierarchy is within the Primary Keys.
> > - So you have something else in mind (your post that I am responding to, your understanding of Codd's words)
> > - I presume it is not a Record Filing System of the first order (RFS 1), where record are referenced by record ID, and there is no itegrity.
> > - I presume that you have been through the traps, that you have some integrity in them, that the records reference some unique key in the parent record (not the record ID which is NOT a key, and has no integrity).
> > - In order to proceed and close this, have a look at this page, please confirm that what you have in mind is RFS 5, not RFS 1
> > - or something in-between
http://www.softwaregems.com.au/Documents/Article/Normalisation/RM%20Foo%20RFS.pdf
> > Based on your response, then we can move ahead.
But you have not responded to those specifics.
I was expecting some interaction based on the diagram. Nothing.
I take it then, that you have something between RFS 1 and RFS 5. Please note, you are not following Codd's words, you do not have his tables (which are Relational, ala [1.4 Normal Form] ). You have something substantially less than Codd's tables, and you see that whatever difference there is, is "a distinction without a difference".
Denial One - Codd's Words
If you genuinely wish to (a) understand Codd and the RM, and (b) understand my proposal (very much second), I have to ask that you absolutely follow his words, and then mine. Otherwise this discussion has to end, on the basis that, as evidenced, you do not, and you will not follow the RM, Codd's words.
I did not say that the mere arrangement of components of the key makes it hierarchical. I said, if you read Codd's words, and look at his diagrams (modern equivalents supplied by me, to be read side-by-side with the RM; again, for your convenience: http://www.softwaregems.com.au/Documents/Article/Normalisation/RM%20Foo%203_B.pdf ), you will see that the Primary Keys Codd gave are hierarchical. I merely emphasised that, because you seem to have missed it (and still do). That emphasis is not a proof in and of itself. And Codd's Primary Keys replace (RNF) the IDENTIFIERS in the HM example that he used, which is hierarchical using pointers.
In the diagram I have given for the Hierarchical Model, the Employee File in Codd's example, the one and only KEY is ManNo, used to access the Employee File, there are no other Keys. All the navigation within the Employee File is via pointers (open arrows). Repeating Groups ("non-simple domains") are represented by an open arrow with a double head. In the HM, JobDate, SalaryDate, and ChildName are not KEYS, they are identifiers within each Repeating Group (in retrospect, using today's context, sure, they may look like Keys, but they are not).
Denial Two - Difference is Significant
The second issue that you seem to be in vociferous denial about is this. First, let me make sure that you understand, I did not invent the RM or IDEF1X. I am speaking as a faithful practitioner of both. IDEF1X is a standard (unlike UML, it is a real standard). Robert Brown invented it, and did so using Chen's ERD as the starting point. He famously had Codd's input into the process. Which is why IDEF1X can claim that it has all the features that allow a person to *implement* the RM.
To that extent, it has specific features, that are normal, ordinary, pedestrian, to an RM implementer, that incorporate the RM (natural progressions, strictly within the RM ?), that are not described in detail in the RM. Two of those specific features are:
- The difference between Identifying Relationships and Non-identifying ones. Solid vs dashed lines.
___ If the parent Primary Key is used to IDENTIFY the child, to form the child Primary Key, it is Identifying.
___ The entire example given by Codd uses Identifying relations.
___ Typically, the relationships for Reference tables (eg. SecurityType) are Non-identifying, there is no value in the parent identifying the child, eg. in SecurityType identifying the Security.
___ Whereas, there is a great value in the Employee Identifying the JobHistory and Children.
- The difference between Independent tables vs Dependent tables. Square vs round corners.
___ Each Independent table represents the top of a Data Hierarchy. The equivalent in the Hierarchical model is a File.
___ Each such Independent table, or File, constitutes an Access Path Dependence, which is prohibited in the RM.
___ If the RM is followed, most of the tables in a database would be Dependent.
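The two distinctions above can be sketched in DDL. This is a minimal illustration only, assuming SQLite driven from Python's `sqlite3` module; the column names (`man_no`, `job_date`, `title`, `security_code`, `security_type_code`) and the helper `primary_key_columns` are made up for the example and are not taken from Codd's paper or from IDEF1X.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE security_type (            -- Independent (reference) table
    security_type_code TEXT PRIMARY KEY
);
CREATE TABLE security (                 -- Independent: its key stands alone
    security_code      TEXT PRIMARY KEY,
    security_type_code TEXT NOT NULL    -- Non-identifying: FK is not in the PK
        REFERENCES security_type (security_type_code)
);
CREATE TABLE employee (                 -- Independent: top of a hierarchy
    man_no INTEGER PRIMARY KEY
);
CREATE TABLE job_history (              -- Dependent: identified via its parent
    man_no   INTEGER NOT NULL REFERENCES employee (man_no),
    job_date TEXT    NOT NULL,
    title    TEXT    NOT NULL,
    PRIMARY KEY (man_no, job_date)      -- Identifying: parent PK is in the PK
);
""")

def primary_key_columns(table):
    """Return the primary key column names of `table`, in key order."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); pk > 0 means
    # the column is part of the primary key, at that 1-based position.
    keyed = sorted((row[5], row[1]) for row in rows if row[5] > 0)
    return [name for _, name in keyed]
```

In this sketch the foreign key column `man_no` shows up in `primary_key_columns("job_history")`, while `security_type_code` does not show up in `primary_key_columns("security")`.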
To which, I take it, your response is:
> No, I don't think the fact that one table's primary key is a subset of
> another's is interesting, let alone signficant.
and:
> To me, that's a distinction without a difference. I just don't see
> what you find significant about it.
So you are denying something that millions of people who understand and practise the RM consider vitally important, that the standard for Relational modelling defines.
Which is understandable to some extent, given that your databases are in fact Record Filing Systems (possibly mature ones, as opposed to totally broken first-order ones). And notably, it is the method that your teachers teach.
Denial Three - Primary Key
This is a result of your teachers' garbage being embedded in your head. It is a trick they use to subvert the RM. The logic they use is the same as you use for Denial Two: a refusal to accept that the difference is significant, "oh sure, I can see that there is some difference, but the difference is insignificant. The two options amount to the same thing".
The issue concerns their use of "candidate key". The RM defines Primary Key. The RM does NOT define "candidate key".
Of course, in the 1980's R Brown, with Codd, following the RM: ___"A primary key is nonredundant if it is either a simple domain (not a combination) or a combination such that none of the participating simple domains is superfluous in uniquely identifying each element. A relation may possess more than one nonredundant primary key. This would be the case in the example if different parts were always given distinct names. Whenever a relation has two or more nonredundant primary keys, one of them is arbitrarily selected and called the primary key of that relation."
determined that all Keys on a relation other than the Primary Key, the Non-primary Keys, shall be named Alternate Keys.
- To the extent that you refuse to use Primary Key, you are in direct violation of the RM
- To the extent that you use "candidate keys", you are using an invention outside the RM.
- To the extent that you refuse to use Alternate Key, you are in direct violation of IDEF1X, the standard for modelling Relational Databases, and of the RM, for Non-primary keys.
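A minimal sketch of a Primary Key alongside an Alternate Key, following the case in the quoted passage where different parts are always given distinct names. It assumes SQLite via Python's `sqlite3`; the table `part`, its columns, and the helper `can_insert` are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part (
    part_no   INTEGER NOT NULL,
    part_name TEXT    NOT NULL,
    PRIMARY KEY (part_no),   -- the one key selected as the Primary Key
    UNIQUE (part_name)       -- the other nonredundant key, an Alternate Key
);
INSERT INTO part VALUES (1, 'bolt');
""")

def can_insert(part_no, part_name):
    """Try to insert a row; return False if either key rejects it."""
    try:
        conn.execute("INSERT INTO part VALUES (?, ?)", (part_no, part_name))
        return True
    except sqlite3.IntegrityError:
        return False
```

Both keys enforce uniqueness here; only one of them carries the Primary Key designation.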
Your teachers have a long and consistent history of being ignorant of the RM, and of subverting it. Here, fortunately, I can deal with you, and resolve this, one issue at a time.
So the bottom line on this point is, you are using RFSs of some state of maturity, and your refusal to use Primary Keys for the designated Keys, directly violates the RM.
> For some reason I can't fathom, you believe:
>
> 1. It's vastly more important that the primary key for jobhistory
> incorporates the primary key for employee
Not "you believe", the fact is, Codd states that. And RM adherents appreciate the value of that.
> than that
>
> FOREIGN KEY (man#) REFERENCES employee(man#)
>
> even though the two statements are equivalent.
Er, they are not equivalent by any stretch of the imagination. You need three things.
- [a] The FK you have given above.
- [b] *And* the PK definition: ____JobHistory: PK ( man#, jobdate )
- [c] In which case, your RECORD ID is *REMOVED*, because it is redundant, superfluous, an additional column and index that serves no purpose.
So the truth is your "equivalence" is hardly that, it is not [a][b][c]. Your "equivalence" is [a] and the RECORD ID that [c] prohibits.
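To make the contrast concrete, here is a minimal sketch that sets a table with the FK plus a surrogate record ID against a table with the FK plus the composite PK. It assumes SQLite via Python's `sqlite3`; the table names `job_history_rid` and `job_history_pk`, the column names, and the helper `accepts_duplicate` are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (man_no INTEGER PRIMARY KEY);

-- FK only, keyed by a surrogate record ID
CREATE TABLE job_history_rid (
    record_id INTEGER PRIMARY KEY,
    man_no    INTEGER NOT NULL REFERENCES employee (man_no),
    job_date  TEXT    NOT NULL
);

-- FK plus composite PK, no record ID
CREATE TABLE job_history_pk (
    man_no   INTEGER NOT NULL REFERENCES employee (man_no),
    job_date TEXT    NOT NULL,
    PRIMARY KEY (man_no, job_date)
);

INSERT INTO employee VALUES (1);
""")

def accepts_duplicate(table):
    """Insert the same (man_no, job_date) twice; True if both rows go in."""
    try:
        for _ in range(2):
            conn.execute(
                f"INSERT INTO {table} (man_no, job_date) "
                "VALUES (1, '2015-02-16')"
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Under these assumptions the record ID version stores the same job twice for one employee, and the composite PK version rejects the second row.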
> 2. The very fact that primary key for jobhistory incorporates the
> primary key for employee deserves the special designation of a
> "hierarchical" relationship, perhaps to reflect how the relationship
> would have been designed in pre-relational DBMSs.
I didn't say that, you are painting someone into a corner. Hope it isn't you.
> To support this assertion, you list the keys vertically, and note that
> each longer one incorporates the shorter one above, ergo hierarchy.
> Also you note Codd visually arranged the boxes in a way that suggests a
> hierarchy. You'll forgive me if I find that unpersuasive?!
Well, I did state that, to support another assertion. You have lost the thread and you think I am supporting some other assertion. Read again.
> No, I don't think the fact that one table's primary key is a subset of
> another's is interesting, let alone signficant. It doesn't deserve any
> special designation, "hierarchical" other.
Besides being indisputably hierarchical, whether you agree or not, which are very secondary items, you are missing the primary item, or that which I have been trying to transmit to you. Don't worry about the secondary items, try to understand the primary item: that the formation of the child PK includes the parent PK, and that that is very important to the integrity, power, and speed of the Relational database.
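One consequence of the child PK including the parent PK can be sketched as follows: the grandchild carries `man_no`, so it can be joined straight to the top of the hierarchy without visiting the intermediate table. This is an illustration only, assuming SQLite via Python's `sqlite3`; the column names, the sample rows, and the helper `salaries_for` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    man_no INTEGER PRIMARY KEY,
    name   TEXT NOT NULL
);
CREATE TABLE job_history (
    man_no   INTEGER NOT NULL REFERENCES employee (man_no),
    job_date TEXT    NOT NULL,
    title    TEXT    NOT NULL,
    PRIMARY KEY (man_no, job_date)
);
CREATE TABLE salary_history (
    man_no      INTEGER NOT NULL,
    job_date    TEXT    NOT NULL,
    salary_date TEXT    NOT NULL,
    salary      INTEGER NOT NULL,
    PRIMARY KEY (man_no, job_date, salary_date),
    FOREIGN KEY (man_no, job_date)
        REFERENCES job_history (man_no, job_date)
);
INSERT INTO employee VALUES (1, 'Smith'), (2, 'Jones');
INSERT INTO job_history VALUES
    (1, '2010-01-01', 'Clerk'),
    (2, '2011-01-01', 'Analyst');
INSERT INTO salary_history VALUES
    (1, '2010-01-01', '2010-01-01', 30000),
    (1, '2010-01-01', '2011-01-01', 32000),
    (2, '2011-01-01', '2011-01-01', 50000);
""")

def salaries_for(name):
    """Salary rows for an employee, joined directly, skipping job_history."""
    return conn.execute(
        """
        SELECT s.salary_date, s.salary
        FROM employee AS e
        JOIN salary_history AS s ON s.man_no = e.man_no
        WHERE e.name = ?
        ORDER BY s.salary_date
        """,
        (name,),
    ).fetchall()
```

With a record ID on each table instead, the same question would need a join through `job_history` to get from `salary_history` back to `employee`.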
Now, I can demonstrate that (a) those two differences are significant, and (b) more important, the consequences are very significant, which is the reason we have such a difference in the diagrammatic notation. But in order to do that, you are going to have to accept that there *is* a difference, that greater minds than mine determined, as being part of the RM.
Therefore the request is, that you understand my post, and accept that you have three well-used denials, and in order to address them, you have to be able to put those denials aside, and implement Codd's words to the letter, without diverging on the basis of some weird interpretation of the spirit.
Next, in order for me to type fewer words in explanation, could you please answer: when you implement Codd's tables (we know you are not following his words, you do not have his Fig 3(b) ), do you have RFS 1, RFS 5, or something in-between ?
> I have this filed under "don't care" because even
> today there's no model for databases comparable to the relational
> model.
Of course.
> > > It does take some work to read Codd's 1970 paper while trying to
> > > embrace the technological perspective of his audience in the days of
> > > punch cards and drum memory.
> >
> > Nonsense.
> ...
> > In 1976, when I took my first job in a computer service bureau, as an
> > apprentice programmer, we had no drums, no punched cards. We had a
> > machine with one disk (for loading the o/s and programs, not for
> > data), and eight mag tapes (for data).
>
> You arrived a little ahead of me. Doubtless you remember some things
> I've only read about.
>
> In 1976, though, you were already 6 years in, and there were still
> plenty of punch cards around. I programmed with them in college after
> that. When I arrived at work in 1982, we had CICS and 3270 terminals,
> but they had only arrived two years before. My mother in the mid-70s
> programmed on punch cards, too. (One compile per day, taken to the
> computer and back the the unusual RJE mechanism known as a "station
> wagon").
I remember it well. In college, where we also had evening classes for "punch card operators", we used to write our programs on 80-column sheets, which the PC operators would type onto PCs (that was part of their practical), which we would then take to the computer operators, who would compile it overnight, and we would pick up the results in the morning. The big deal was to stay in the computer operations area, to check that the compile worked, such that we could throw out the PCs at the operations room, and save ourselves the effort of carrying the box of PCs around. For exam programs, we were given a maximum of three compiles.
In the US, most toll highways issued PCs at every on-ramp, which were made from recycled PCs, which you submitted to a teller at every off-ramp, until at least 1990 IIRC.
> But, if you thought I meant punch cards were used for
> data storage, no, sorry. I was being allusory.
>
> Granted, "drum memory" is an exaggeration, but not much. The big IBM
> 360 machine sold in 1968 came with 1-4 MB core memory, but many came
> with much less.
Yes. Our 360 had core. You could see it with the naked eye. Those little magnets NEVER failed. Actually, the HP-2000 had the same. It was the HP-3000 that had IC memory, no core.
360/CICS lived on well into the 90's. You will not believe how many are still running in Aussie and American banks. I have to write transports to/from them.
> It's very easy to imagine people in IT management in those days whose
> knowledge of computer science was nil and whose understanding of
> programming was limited to whatever IBM classes the firm had sent them
> to. No need to imagine them, in fact, because I worked with and under
> some of them. Hurrah for Syncsort [TM]. But it does take some work to
> try to read Codd's paper through their perspective.
>
> > > Codd certainly knew that a tree is a kind of DAG.
> >
> > No. He was a strong proponent of a single Large Shared Data Bank, the
> > classic single-version-of-the-truth. The tree is the hierarchy, the
> > tree is the Relational hierarchy. In a single location. Not a DAG
> > at all. Distributed databases are for the birds, and a DAG is just
> > the latest flavour of birdseed.
>
> I think you did not take my meaning: directed acyclic graph. I can't
> account for your answer otherwise.
I thought you meant, and my previous comments relate to, MS Database Availability Group.
Yes, Codd certainly knew that a tree is a kind of DAG.
But that means you understand something that you claim to not understand. More, later.
> > > I don't know what "normalized [before] RM" refers to,
> >
> > Do you understand that DRY, Agile, etc, is Normalisation for a
> > program ?
>
> If you say so. Not Agile, which is just methodology fetish.
True. Normalisation is the "big secret" behind Agile, DAD. Ambler could not call it Normalisation, because he spent two decades decrying it, propagating the myth that "de-normalisation improves performance."
Same as your teachers. When they implement an hierarchy (that occurs naturally in the data) they call it everything but, "adjacency lists", etc, and they make a right royal hash of it, using two to four times as many keys and indices as a Relational implementation requires.
> I've
> never once heard an application programmer call his data structures
> "normalized", whether or not he knew what the term meant.
- Not just the data structure in programs, but the program elements (dependent on language: routines; subroutines; functions; sub-programs; etc). If they have a diagram of the program or system, the diagram is an hierarchy.
- I didn't say that the programmers call it Normalised, I said it *is* that. Whether they are aware of Normalisation, or whether they have applied it, is a separate matter.
- If they are aware of Normalisation as a science, that they can apply it to their program elements, then their programs are far better, because they Normalise through the entire exercise, and there is no "normalisation" or "reduction of redundancy" to be done.
- If they are unaware, then when their programs are slow enough, they go and try to remove some redundancy, while remaining clueless that it is a Normalisation operation.
> > We Normalised very carefully in those days, to eliminate data
> > duplication. We just did not have a formal declaration and name.
>
> OK, so now I know what you mean. But you can minimize redundancy
> without eliminating repeating groups, a requirement for 1NF.
I think you mean 2NF (Normalise repeating groups), not 1NF (Atomic data).
(Date and Darwen have mounted an assault on 1NF, in order to squeeze their imbecility of derived relations into "satisfying" 1NF, they scramble the terms to maintain confusion. Further evidence of their putrescence. I trust you are not doing that.)
No you can't. To achieve 2NF, you have to remove repeating groups, which means placing them in a separate table, which means including an FK, which means a migrated PK. Such tables are 1NF, 2NF, and 3NF.
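The step described above, moving a repeating group into its own table with the parent key migrated, can be sketched in a few lines. This is a minimal illustration in plain Python; the nested record layout, the field names, and the function `normalise` are assumptions for the example, not a rendering of any particular DBMS.

```python
def normalise(employees):
    """Split nested employee records into flat employee and job history rows.

    Each input record holds a repeating group under "job_history". The
    output job rows carry the parent's man_no as the leading key component.
    """
    employee_rows = []
    job_rows = []
    for emp in employees:
        employee_rows.append((emp["man_no"], emp["name"]))
        for job in emp["job_history"]:
            # Migrate the parent key into every row of the former group.
            job_rows.append((emp["man_no"], job["job_date"], job["title"]))
    return employee_rows, job_rows


nested = [
    {
        "man_no": 1,
        "name": "Smith",
        "job_history": [
            {"job_date": "2010-01-01", "title": "Clerk"},
            {"job_date": "2012-01-01", "title": "Manager"},
        ],
    },
]
employee_rows, job_rows = normalise(nested)
```

Each row in `job_rows` is identified by the pair (man_no, job_date), so the parent key appears once per job row instead of the group being held inside the parent record.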
> And going
> to 1NF for a repeating group means repeating the key, definitely *not*
> minimizing redundancy wrt disk storage.
Not sure what you are saying here. Disk storage then was limited, but
- we did carry the Key in ISAM (pre-HM), as a means of verification, and for rebuilding the file using one single scan, rather than by navigating the pointer chains for the entire file, which would be very slow.
- the HM (eg. IMS) carried the key, for the same purpose, the rebuild moved from our code, into their command.
- Notice that Codd did not show those carried keys in his example, we didn't normally show them, same as we didn't normally show the pointer chain.
- such Keys were migrated, in exactly the same way that FKs are migrated in a RDb (but of course, not used as such).
- given that it is of value, and it is demanded for the method, in all three cases, the migrated key is not "redundant", it has a purpose that makes it non-redundant.
Now if you are saying that the migrated key in the RDb (as well as ISAM and HM) is "redundant" merely because it can be derived by other means (such as navigating the parent RECORD), you place yourself in the category of Nicola's imbecility, which I have responded to in detail. Darwen had the same imbecility. They miss the point that "related-by-key" demands that the related-key is carried, migrated, wherever it is used as a reference, an FK.
This is tightly related to the fact that Key is not "data"; the theoreticians cannot make that distinction; their non-FD-fragmenting and puzzling is based on denying that distinction.
> Are you going to claim you never used repeating groups in your
> "normalized" HM databases?
What ? Have you not looked at the diagrams I provided ?
The HM, IMS, and the others, all allowed for repeating groups to be handled correctly, ie. normalised, and placed the different record types (tables in today's vernacular) in separate physical areas of the one file.
For ISAM, which I programmed until the end of the 90's (due to some critical systems never having an HM or RM DBMS), we used separate files for each record type, again, fully Normalised to 3NF, repeating groups were never in the same file as the parent.
So, no, I have never implemented repeating groups incorrectly, in the same record as the parent. I have not seen any other doing that either. I repeat, Normalisation was well understood in those days.
I have only seen it done incorrectly in marginal situations, eg. where it was a temporary fix-up, until the file was rebuilt.
> Surely you know their use was standard
> practice, one that violated no theory.
Nonsense. Details above. Repeating groups implemented incorrectly (ie. not Normalised) would be a gross error. The standard (unless your teachers have a different definition of the word) was to eliminate such gross errors, by simple Normalisation.
Get this, your teachers are disgusting liars, as has been proved hundreds of times. The only case in which repeating groups appeared in the same row as the parent, is the equivalent of:
____SELECT ... FROM parent JOIN child
which of course, is a derived relation (not stored, de-normalised by definition), not a base relation (stored, Normalised).
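The difference between the stored base relations and the derived join result can be seen in a small sketch. It assumes SQLite via Python's `sqlite3`; the column names and sample rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    man_no INTEGER PRIMARY KEY,
    name   TEXT NOT NULL
);
CREATE TABLE job_history (
    man_no   INTEGER NOT NULL REFERENCES employee (man_no),
    job_date TEXT    NOT NULL,
    title    TEXT    NOT NULL,
    PRIMARY KEY (man_no, job_date)
);
INSERT INTO employee VALUES (1, 'Smith');
INSERT INTO job_history VALUES
    (1, '2010-01-01', 'Clerk'),
    (1, '2012-01-01', 'Manager');
""")

# Base relation: the employee name is stored exactly once.
stored_names = conn.execute("SELECT name FROM employee").fetchall()

# Derived relation: the join repeats the parent columns on every child row.
derived_names = conn.execute(
    "SELECT e.name FROM employee AS e "
    "JOIN job_history AS j ON j.man_no = e.man_no"
).fetchall()
```

The repetition exists only in the query result; nothing stored in either base table is duplicated.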
The gangsters use derived relations to "demonstrate" something, eg. Normalisation or lack thereof, that applies to base relations only. Disgusting. Sub-human.
> So I can accept your defintion for purposes of discussion, but I reject
> it in general because the practice had no theoretical underpinning and
> was a mere suggestion of what we mean by the term today.
No. I have specifically stated in detail, that we applied what is now known as 3NF (Codd's 3NF to you), except for the fact that we did not have the label, Codd's declaration. We just stated "Normalised" or not.
And of course the implementation was limited by the platform, the method.
You need a theoretical underpinning for 3NF (Codd's 3NF) ?
And one for "Normalised" ?
Sheesh.
> > Do you honestly believe that a tree, in the days of the HM and NM,
> > could survive a circular reference ?
>
> No. In fact I would go futher: a tree with a circular reference is not
> a tree. A tree is a kind of directed acyclic graph. A "tree" with a
> cycle is a cyclic graph (directed or not is hard to say).
Agreed, completely.
> By posing the question as you do, I am led to think you're working with
> an informal definition of "tree".
No. First, I was answering your questions re the RM, Codd's words, so the definition of tree in that context is what it meant in 1970.
Second, I agree with you that Codd understood a tree is a DAG.
Third, I agree that the term tree remains, and is still used today, with more definition, it is a DAG. Today's definition does not contradict yesterday's.
Therefore, I am not using the term by any informal definition, only the formal ones.
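The formal point agreed above, that a structure with a circular reference is not a tree, can be checked mechanically. A minimal sketch in plain Python, assuming the hierarchy is given as a mapping from each node to its parent (with `None` for the root) and that every parent named is itself a key in the mapping; the function name `is_tree` and the sample data are illustrative.

```python
def is_tree(parent):
    """Return True if `parent` describes a single-rooted tree with no cycles.

    `parent` maps each node to its parent node, or to None for the root.
    Every parent named must itself be a key of the mapping.
    """
    roots = [node for node, up in parent.items() if up is None]
    if len(roots) != 1:
        return False
    for start in parent:
        seen = set()
        node = start
        # Walk upwards; revisiting a node means a circular reference.
        while node is not None:
            if node in seen:
                return False
            seen.add(node)
            node = parent[node]
    return True


codd_example = {
    "employee": None,
    "jobhistory": "employee",
    "children": "employee",
    "salaryhistory": "jobhistory",
}
circular = {"root": None, "a": "b", "b": "a"}
```

Here `is_tree(codd_example)` holds, while `is_tree(circular)` does not, because `a` and `b` reference each other.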
> I hope at this point that we understand each others position regarding
> what the so-called hierchical model means, why it's not a "model" in
> the sense of "relational model",
Yes, we don't agree, but we understand each other's position.
Which btw is all I am aiming for. To have my views examined and understood, and still not accepted is fine. I have no intention of convincing you of anything, as stated from the outset.
I never said that the HM was a model in the sense that the RM is a model, that is your pivot, not my argument. Feel free.
> and why I think it's pointless to make
> any claim about a "hierarchy" based on the components of the keys.
Well, that remains open, on the table, waiting for you to answer questions about the state of your RFS, such that I can demonstrate the difference, that you insist is insignificant, is indeed very significant, such that we can then form conclusions. I wouldn't be repeating the conclusion without that gap being closed.
If you refuse to proceed, it does not mean my claim is false, or pointless (I have posted evidence), it means that your argument against it lies without evidence.
> It was an interesting foray into the systems of that bygone era, and I
> think I understand, vaguely why you say that pre-relational systems,
> Cullinet et al., influenced relational ones.
Sure, and you are unwilling to penetrate that vagueness.
The memory lane experience was a consequence of your assertions, and my having to detail why they were invalid. Yes, they were marginal, and avoided the central issues.
> > But I have said much more than that, that the HM is fundamental to
> > the RM. Until you understand Codd's words, you will not see that.
>
> I reject that flatly. Whether or not I can convince you I understand
> Codd's words,
I don't need convincing.
It is you who are unwilling to implement Codd's words.
And you who have some weird interpretation of words.
So whether you understand them or not, as evidenced, you refuse to implement them. So from where I sit, you will never understand them.
> I cannot see any way in which "the HM is *fundamental* to
> the RM". You say there's sound theory in proprietary papers that never
> came to light, lo these 45 years later, despite the immense importance
> of the RM and the ever-present reinvent-the-past interest in graph
> databases today. I can't prove you're wrong. All I can say is I don't
> believe you and won't until I can see for myself.
That is all superficial, and marginal, to the central issues.
You've placed yourself in a double-bind. You cannot ever "see for yourself", if you won't implement his words.
> > Until you understand Codd's words, you will not see that.
You are avoiding dealing with the central issues, by refusing to implement Codd's words, denying various aspects of the RM, and insisting that there is "no significant difference".
The bottom line remains, that what you have been implementing for years as "relational", is substantially less than Relational (proved), and does not have [has only a fraction of] the integrity, power, speed of the RM. Which is easily demonstrated, I am waiting for your answers re my diagrams, in order to proceed.
As a result of the interaction in this and the Normalisation thread, there is a consistent body of evidence (presented by you), that demonstrates that you deny various aspects of the RM, and implement only portions of it.
> > 3. Re your teachers' allegations that hierarchies cannot be
> > implemented in the RM
>
> Au contraire. I said tables can represent graphs, and tree are graphs,
> and hierarchies are trees. Therefore hierarchies can be represented
> relationally. Furthermore, they can be represented simpler, because
> value semantics allow both relation and relationship to be represented
> using one structure.
Er, agreed.
But previously you did state, that they stated that, which is why I responded. Now you are stating the opposite.
Ok, now my [3] is closed.
My [4] remains open:
And all the questions I had, remain unanswered.
Cheers
Derek
Received on Tue Feb 17 2015 - 06:34:06 CET