On Normalisation & the State of Normalisation
Date: Mon, 2 Feb 2015 18:11:03 -0800 (PST)
Message-ID: <dc6823e5-d43c-41c0-8ab7-b7a7d63c8f2f_at_googlegroups.com>
New Thread Normalisation
Dear People
Normalisation has come up in three threads recently, in an indirect but important manner (isn't Normalisation /always/ important?). Let's handle it as a stand-alone subject that the other threads can refer to.
Let's exercise the following context, pretend we have the following roles, so that we can progress through the discussion quickly:
- I am the nominated Database Administrator. The database exists and developers come to me, due to users going to them with user requirements, that cause extensions to the database. The database is 100% Relational, OLTP and DSS/OLAP from the one set of tables, etc. It is precious to the company and their bottom line depends on the quality of the data in the database. The company hired me because I am strictly Relational, strictly Codd, and the auditors signed off the database when I implemented it years ago. Therefore I am the policeman, and nothing goes into the database unless I approve it, and if the data quality is damaged, I will lose my job (not the developer who gave the extension).
- You guys are the nominated theoreticians in the nominally Relational Database space, as the entire Relational Database world knows it. (You may have different "definitions" and "relational" might mean different things to each of you, and that is part of what I expect to have teased out of this.)
- The above is important because the developers are very skilled, they read your books and articles (not your theoretical papers). They know a fair amount of Date, Darwen, Celko, etc. Some of them may have read Pascal. They are now excited about the Alice book, because it is so heavily marketed, but they have difficulty with it. They are excellent SQL coders within the scope that I have given them (and fairly poor outside that). of course, it is 2015, and none of us actually write code, we all have different IDEs for our tasks.
- So when the developers come to me with a proposed extension, they are relying on your books and articles that they have read.
- And finally, since the marvellous internet exists, and google provides c.d.t, I have the opportunity, instead of hammering them for any mistake or substandard proposal, to go to you, the authors, directly, and ask, why do they have this concept, how do you construct or justify this concept in your books. Specifically, re Normalisation, why something does or does not satisfy some NF or other.
In this manner, I expect to (a) emulate the very real, real world problems that happen in thousands or hundreds of thousands of sites, and (b) minimise the discussion re different definitions, etc.
As far as I am concerned, Normalisation is a fixed science, not merely a bag of NF definitions, in fact way beyond that bag, that can be applied easily and directly by an undergrad after a one-semester course. But the reality is, anyone and everyone who attempts Normalisation as it is various known and taught, struggles with it. And they need three semesters.
I laugh at the crackpot NFs that keep coming up. I deliver 1NF, 2NF, 3NF, and DKNF, in a form way, way, beyond the definitions, and I declare that my database will satisfy any NF definition that the theoreticians may come up with in the future. I do this by
- relying on the fullness of the RM,
- and Normalisation as a science (not the bag of NFs). When 4NF and 5NF were published, my declaration was proved to be true, I earned $10,000 for signing a form that stated the db that was previously declared and proven to be 3NF, now "satisfies 5NF", without doing a single piece of work, without executing a single command on the SQL platform. Of course, I bought the entire team lunch, and the auditors dinner.
I also deliver *data quality* that is way, way beyond the fragments of constraints that you guys are aware of, also by relying on the fullness of the RM. But you guys have demonstrated that your notion the RM is fractional, so there is no point in dealing with that, we have to deal with getting you up to the full RM first. Therefore full data quality, constraints beyond DKNF (5NF if you don't like DKNF), the full RM, are beyond the scope of this thread, let's limit this to:
____Normalisation____
____Normalisation wrt to the RM____
and get some clarity and agreement amongst you people, so that the confusion in the real world is reduced. We might end up performing a service to the Relational Database community.
In other threads, as theoreticians in our field, I have laid the responsibility, and therefore the blame for the abject failure, of Normalisation to be understood in the real world, and used, at your feet. That is to say, the developers are relying on the theory as you theoreticians have delivered it. (Sure, there is a lot of garbage on the internet, but I am limiting the scope of this to formal education, which uses your textbooks.)
I am not expecting this thread to be an exposition of my work, the purpose is to clarify and crystalise Normalisation (as a set of NFs or whatever) and reduce the madness that is going on in its absence. Forty five years of confusion and differences in "definitions" is just about enough, don't you think ? That should be enough time for us to come up with a coherent set of definitions. Ie. I expect the content to be delivered by you. Let's try and complete this task this year.
I will say, I agree with most of what James expressed in his instructions to Ruben in the Needs Tutoring thread, but there are some key differences. I say this because it appears most of you disagree with most if not all of what James stated.
Ok. Here is the latest extension that a developer proposed to me, to be added to the RDb.
http://www.softwaregems.com.au/Documents/Article/Normalisation/Relational%20Database%20101%20A.pdf
We need a set of tables to rationalise and consolidate all Addresses. The company sells their gear on the internet as well as Australia, the addresses have to be international. This is what the developer says is the cluster of all address type data that will be used as reference, in order for the data quality in the single Address table to be maintained. The CountryCode and StateCode are ISO 3166-1 and 3166-2, and the CountyCode is ANSI/FIPS, or something meaningful outside America (I already have that loaded elsewhere).
Answer the questions in sequence, please. No point in going to [2] if you reject it at [1], etc.
Cheers
Derek
Received on Tue Feb 03 2015 - 03:11:03 CET