Re: Fails Relational, Fails Third Normal Form
Date: Tue, 10 Feb 2015 23:06:25 -0800 (PST)
Message-ID: <ca302678-a51f-4c3b-b65a-191bf7bf98f5_at_googlegroups.com>
- JKL
Thank you for your response.
It is a bit hard to respond to this new post here, as my head is stuck in the Hierarchical Model thread, waiting for the several loose ends to be tied up.
> On Monday, 9 February 2015 10:59:35 UTC+11, James K. Lowden wrote:
> On Sat, 7 Feb 2015 17:46:38 -0800 (PST)
> Derek Asirvadem <derek.asirvadem_at_gmail.com> wrote:
> I think this exchange illustrates a difference in tradition that you
> feel is idiotic but is really just a question of what one assumes.
>
> From your point of view, you have a customer and a system that
> maintains addresses in a particular place and time. The columns have
> meaning (exemplified in their names) and the keys more or less announce
> themselves to you.
Yes, please! They jumped out at me.
> Any ambiguity or error can be addressed by
> discussing them with your customer. I.e., by agreeing on their
> meaning.
Sure. And from the outset, I have stated that I am the DBA, the policeman. I can answer any of your questions re meaning or anything else. But you have to ask them, I have no idea what you do or do not know, noting that they pretty much "announce themselves to you".
> Any formalism with [non-]FDs is pointless.
Yes.
Note my insertion, you guys do not use the FD definition, you have a fragment of it. I think you mean that fragment, not the real thing, and you use the fragment in a backwards or bottom-up process. The real thing employs a process of determining the Keys first, and testing them for validity using the FDs, second, which is an integral part of [cannot be divorced from] the Normalisation process. Ie. forwards, top-down.
> From the academic point of view -- indeed from the point of view of
> the DBMS, as you know -- no column has meaning.
Totally disagree. When you say "DBMS", you may be meaning "theoretical DBMS", in which case, I don't agree or disagree, you are welcome to entertain it as a theoretical concept, if such is valid. And sure, for your theoretical purposes, you have abstracted all meaning out of all the elements (not just columns! not just the column names !!). Which, as evidenced, is a serious impediment.
In the case of a DBMS, or a RDB, every column has meaning. And the Key columns have very important meaning (due to the structure of any DB is defined by the relationships; and in an RDB, those relationships are Keys; thus the structure is defined by Keys). I have detailed most of that meaning in my post of 11 Feb 10:09 to Nicola, in response to his commendable attempt to model Köhler's non-relational data, Relationally.
If you are going to model data, Relationally, you cannot afford to dismiss the meaning; if you are going to determine Keys (the current task) it is doubly important.
Ok, I agree, the current task is not academic, it is not theoretical, it is a practical requirement. The only theory that applies, is Codd's. I think you acknowledged that elsewhere:
>>>>
One giant leap owed to Codd that I think was (and often still is)
underappreciated is his adoption of value semantics. Your helpful
citation illustrates that point quite well ...
<<<<
> It has a type and
> domain, and some relationship (perhaps functional dependency) on other
> columns. Any statement about keys *must* be based on stated [non-]FDs.
Fine, but that theory (which may be valid theoretically) does not apply to this practical task. Or, you can apply it, and it will fail miserably. Or, all your theories put together, sum up to a grand total of zero that can be applied to the practical task.
Therefore, feel free to apply them, but the evidence is, total failure to determine that the proposal Address [A][B] is non-relational; that it breaks 3NF; total failure to determine the Keys. I don't think you can expect a different result, unless you employ a different technique, a different set of principles and methods.
Of course, it is no secret, we implementers do not employ your theoretical methods (if we did, we would get the same results). We employ practical methods. Codd's Relational Model. Codd's 3NF and FD definition.
> Your corrective notes end with what really is all that need be said,
>
> > * Find out what the data is, what it means, how it relates to
> > all other data in this cluster.
Yes. And I detailed that further in my next response [B] to him.
> That is an option open to your group and not generally to c.d.t..
> Anyone here willing to make an assertion about the correctness of the
> model must also be willing to make assumptions about the meaning of the
> columns. However safe those assumptions might be, they are still only
> assumptions.
>
> Surely you agree that to be unwilling to make assumptions like that need
> not be an exercise in stupidity or obfuscation.
Ok, thanks for pointing that out. Apparently I was not clear enough when I set out the roles.
But no, it doesn't apply here. The task is a practical one. You are free to have assumptions; to test those assumptions by asking me questions, and pondering my answers. I would never suggest a task that you could not perform.
Here, I am your "customer", I am your "users", whom you people can "go to" to have any and all questions answered. I am at your beck and call. Note that I cannot turn around and say that some proposal from one of you is incorrect, if it fulfils the requirement that I have given. The task is not to fulfil the requirement of a customer who cannot be reached.
Eg. I had no idea that anyone (implementer or theoretician) would ever think that ISO 3166-2 should be taken literally, or that they would break 1NF without being conscious of doing so. When I saw that I addressed it. I couldn't have known that earlier, not in a million years.
I am not saying the theoreticians in the RDB space are stupid because they have assumptions and can't proceed, etc. - I am saying the theoreticians in the RDB space are stupid because they are using a hammer for a task that calls for an axe. ==AND== they will not observe the evidence that the hammer is not working, that it is not suited to the job. ==AND== the are ignorant (or in denial) that axes exist.
> An example of the difficulty arising from unclear meaning is the
> discussion over the relationship of postal code to unit. I would never
> have guessed there are jurisdictions in which they are 1:1, but you
> said (IIUC) that there are.
I said, AFAIK, the finest granularity of post code was 1:1 with an entrance to a building, not 1:1 with an Unit (apartment, suite, townhouse).
> Other questions arise, too. When I looked
> at
> http://www.softwaregems.com.au/Documents/Article/Normalisation/Relational%20Database%20101%20B.pdf
> I found myself wondering about StreetName and StreetType. I couldn't
> think of an application for which those tables would be useful. They're
> not objectively wrong, but I assume they are.
- They are classic Reference tables, used to constrain StreetNames and StreetTypes in Street, to valid values.
- We use Names in a way that is beyond the scope of this exercise here, so you can safely ignore further info re those two tables. Eg. from your days with Sybase 10 (1993 IIRC) you might know about our SOUNDEX() function, which is used to detect spelling errors, etc. We want to prevent "Warshinton", when "Washington" is in the database.
- In addition to being Reference tables, they are used for Search purposes, to determine valid vectors or Dimensions, thus avoiding: IF EXISTS ( ____SELECT DISTINCT StreetName ________FROM Street ________)
Where (a) Street is a very large table that should not be scanned during production if it can be avoided, (b) certainly not for a Dimension, and (c) the Reference table exists.
> I remember a different example in my work. We had two tables,
> Countries, and CountryGroups. Each Country had and ISO code and was
> the real deal. CountryGroups reflected various political designations
> and business imperatives.
>
> One fine day a developer wrestling with an "application problem" (his
> term) asked for comment on his preferred solution: to add one row to
> Countries named "all countries". (You can imagine my reaction and I
> yours!) I would say the suggestion stemmed fundamentally from a failure
> to understand the meaning of the Countries table. To the man in
> question, the table had no meaning per se, and the "missing" row was a
> deficiency. I suggested, colorfully, that the concept of "all
> countries" belonged squarely in CountryGroups. Obvious as that may be
> to you, it took quite a lot of persuasion to prevent corrupting a basic
> domain table. Meaning is surprisingly hard to pin down.
Understood. And that story emphasises that meaning is very important, and that it must be documented. Eg. the purpose and content of Country vs CountryGroup tables.
> LIke you, I learned about 3NF from an informal description. I don't
> know how many treatises I've read describing an algorithm based on FDs;
> they all read to me like the How to Hunt Elephants
> (e.g., "COMPUTER SCIENTISTS"
> http://paws.kettering.edu//~jhuggins/humor/elephants.html): sure to
> succeed if ever it finished, and unnecessary in my context.
Glad to hear that. But it concerns me, that you use the term FD interchangeably, whilst knowing full-well that the real FD and the theoretical one are quite different, the latter being only a fragment of the former. Eg. they can't be used interchangeably, each has a different purpose.
> Unlike you, I don't think the FD formalism is an exercise in navel
> gazing.
I didn't say quite that, but I accept that you picked that up from my comments.
I said:
a. that it cannot be used for the given purpose (determine Keys when they are jumping out at you, as per this task), or for the normal determination of Keys during the exercise of data modelling, so there is no point is using it in those scenarios. b. that for the given purpose, another method exists c. that they are using it anyway, which is silly, given [a][b] d. they still haven't produced anything by that method e. but that it remains a valid method for determining keys when a human is not present to perform the analysis of data.(eg. I am not saying it has no use; the use has not been defined to me; I leave that open)
Note that at Nicola's challenge, I took up a problem that was more complex (Köhler's, five key elements, more the the four key elements Nicola suggested), and I Determined the Keys on data I had never seen before, using Codd's method. In less than 30 mins.
Note that Nicola was stuck in the non-FD non-key rut, probably no less and no more than the rest of you. But when he started his model, he jettisoned it, just dealt with the data, and tried to form keys on his own. He didn't need to be told that non-FD non-key method is irrelevant, or that "the Key has meaning, retain it". He is busy with a serious level of modelling way, way past you guys.
> Would only that the described algorithm were implemented, and
> we could pour our column definitions in and get a 3NF (say) logical
> model out!
I think you mentioned pipe dreams.
To a practical person, that would be interesting, not desirable. But the sequence is entirely without merit, a scientific void.
Further, since 1985, we have had CASE tools such as (just one example) ERwin which implements the RM and IDEF1X. We work forwards and backwards from visual graphical tools, to/from the database [any platform] directly, or to/from DDL, with a single push of a button. So the non-FD determination of non-keys method, bottom-up (ours is top-down) is irrelevant to us. Same ass the Tutorial D monolith.
Obviously, the CASE tools have expanded, they have gained a lot of maturity, in that thirty years.
> The problem as I see it isn't in the formalism per se, but in
> describing the columns' meaning to the algorithm. You do that in your
> head and depict the result with IDEF1X.
Sure, I do it in my head, but the method is scientific, defined, by Codd. Anyone can do it.
I can't comment on the formalism, or the value of it, because no one has defined that to me. What I have seen, as you can tell from this thread, is usage of it in a scenario that is most inappropriate. And of course, that is commonly done: Jan does it; Köhler does it. So I would say, they think it is THE method, they do it in every instance, including the inappropriate. They are unaware that it cannot be used in the normal scenario (human, analysing data). They are unaware that a simpler, faster method exists.
I am not saying you haven't, I just haven't seen any of it. From what I have seen of your work, it is RFS, not Relational.
> Are you
> prepared to say that's the last and best way? I'm not.
I am. And it is worse than that. I am saying it is the only method for implementers, for practical people, for humans. Comparatives and superlatives can't be used when there is only one choice, but if we overlook that grammatical rule, then sure, it is the best, simplest, and fastest method. Full stop.
I am open to the possibility that the non-FD non-key Determination method has a use, and only in the theoretical arena, if and when they define that use to me. Until then, noting that I am right in every case that has been put to me thus far, the Codd method destroys it, makes it irrelevant, even in the theoretical arena.
Look, if you can't look at the nutty professor's example on the board, and determine that the SSN is the key, if you can't look at Address B and determine that there is no key possible in the table State as given, then there is a serious problem. You have abstracted your self out of the picture, so so far out, that you have lost the ability to see the meaning that does exist in the words that are given.
So, once that is lost, then sure, play with 1's and 2's and a's and b's, and non-FD non-keys, and some day you might, just might, find a way.
But that does not address the problem, which is a compulsion to abstract the meaning in anything, out of it, and thus, cripple oneself, prevent oneself, to perform the very task that one is supposed to be performing.
It is not about comparing the Codd Key-FD method vs the non-FD-non-key method, because they are not comparable.
> I'm still
> waiting for an FD language (loosely speaking) that will describe my
> database better than SQL, from which I can generate an IDEF1X diagram
> and matching SQL DDL. That would be a better way to work, and would be
> fruit from FD tree.
Ok.
Meanwhile, for the last thirty years, back at the farm, we have been doing:
> describe my
> database better than SQL,
You have that backwards. SQL is a data sub-language. It is implementation level. It is not a high-level or abstract language, or even a full langauge. It is not appropriate for use as describing or defining a database at any level that is higher than the context of an implementation in a specific platform.
Would you use BASIC-PLUS or awk to describe a database ? No ? Good. Then why would you use SQL to do it ?
If you use a hammer, for a job that requires an axe, you will fail. You cannot then turn around and say, the hammer is not working, it is broken. Use an axe.
Separately, there is a huge difference between commercial SQLs and the freeware/shareware/vapourware. What is impossible today on the freeware has been possible in the commercial ware for over thirty years. Recursion, deployed in the right manner. Full OLTP and full ACID Transactions and standards to go with it.
We describe the database in a data model, diagrammatically (ie. not text strings), using a CASE tool, and ==TELL== SQL what it is, by pushing the "publish data model to SQL database" button. Of course, it has version control, diff, transports between platforms, etc. Of course, the tool allows us to add notes to every element in the model, so when we print the reports, it has all that in documentary form.
I would never expect the SQL implementation to describe it back to me.
Ok, there is one exception. If I go into a customer site, and find that they have zero documentation, or I find that they have lied to me about something in the SQL implementation, then sure, I press the "reverse engineer data model from SQL database" button, and obtain it. That is an exceptional case, the once-off starting point for a new project. And even then, I cannot reasonbaly expect anything more than the limits of SQL, the platform.
I would never expect the awk script to describe itself back to me. What the awk script does, is described using SSADM and a IDEF1X data model. Same as any app, SQL or not.
> from which I can generate an IDEF1X diagram
If necessary, we can do that. One "reverse engineer" button press.
> and matching SQL DDL
One button press "publish data model to SQL DDL files". Radio buttons to choose platform, etc, etc.
> that will describe my
> database better than SQL
I suspect you might mean that there is something you can define re RDBs, that is relevant to an RDB (in whatever language you imagine), that you think cannot implemented in SQL. That is incorrect. Please give me an example, and I will show you how to implement it in SQL.
I am quite willing to do that here. Please give me an example of what SQL cannot do. Re RDBs.
> That would be a better way to work, and would be
> fruit from FD tree.
Only if you refuse to visit the orchards that we have had for thirty years. Or you deny that they exist.
Thanks again for your post.
Cheers
Derek
Received on Wed Feb 11 2015 - 08:06:25 CET