Re: Getting Normal Forms *wrong*. Is there actual disagreement on what 1NF > 3NF means or is this sloppiness or ignorance on authors' parts?

From: Walter Mitty <wamitty_at_verizon.net>
Date: Fri, 07 Aug 2009 15:22:59 GMT
Message-ID: <nFXem.1109$Jg.2_at_nwrddc01.gnilink.net>


"dana" <dana_at_work_at_yahoo.com> wrote in message news:87e06e65-31bd-4c40-b85f-d163cb13894e_at_k26g2000vbp.googlegroups.com...
> Is there "genuine" disagreement on what 1NF, 2NF and 3NF mean? Or is
> the variety of contradictory in-print definitions of Normal Forms due
> to sloppy thinking, slopping writing, or plain old ignorance from
> authors? Also, where can I find an accurate, concise (less than 3
> sentences for each NF), and easy-for-any-IT-person-to-understand,
> definition of Normal Forms 1 through 3?

Many of the differences arise out of sloppy thinking, sloppy writing, or oversimplification for the sake of ease of understanding.

However, there are at least two other sources of differences.

The first is that some writers phrase normal forms as pertaining to tables, whether it's relational tables or SQL tables. The most formal definitions pertain to relations and relvars.

The second is the difference between first normal form as defined by EF Codd and as defined by CJ Date.

>
> My understanding of Normal Forms 1 through 3 is as follows. I look
> forward to being corrected--but hopefully in a polite, constructive,
> professional manner. I'm leaving out any mention of relations and
> relvars here, because I don't expect the typical IT person to know
> what those are (although it can be argued, convincingly I think, that
> they should):
>
> 1NF:
> 1) No multivalued attributes (e.g. every attribute in every
> tuple should be atomic)
> 2) No repeating groups of attributes are allowed (e.g.
> SKILL_1, SKILL_2, SKILL_n are out)
>
> Many common definitions I've read in books on database design miss
> point #1 completely. Is that because #1 is not part of 1NF, because
> the authors are "protecting" the reader, or because of author
> ignorance on what it is for something to be in 1NF?
>

See the difference between EF Codd and CJ Date on point #1. Point #2 deserves more elaboration, which I'll come back to later.

> 2NF:
> 1) Must already be in 1NF
> 2) Every non-key attribute must rely exclusively on the
> key, e.g. no Functional Dependencies.
>
> Here's something I really need clarification on. Is it the case, or is
> it *not* the case, that 2NF "only applies" in cases where there is a
> composite key? This is my understanding, but it may be dead wrong. If
> so, I need to know sooner or later. I have seen this point glossed
> over or missed by various authors, some even using the words "fully
> dependent" when perhaps they should have used "Functionally
> Dependent"; meaning they either don't know what a functional
> dependency is, or they're trying to "protect" the reader (and that may
> or may not be legit given the context and one's proclivities).
>

Note your use of "the key". There are some who would call this "sloppy writing". I am not one of them. A formal definition would refer back to candidate keys and not back to "the key". However, your phrasing is the way I first learned it some 25 years ago.

Point 2 above is misworded. In 2 NF non key attirubtes must depend on the entire key, and not depend on part of the key.

For example: if the key is Student ID and Course ID, and there is an attribute Course Name which is functionally dependent of Course ID, but not on Student ID, then putting these three in one table violates 2NF. In the case of a simple KEY and a relation (or table) in 1NF, conformance to 2NF is automatic.

> 3NF:
>
> 1) Must already be in 2NF
> 2) No transitive dependencies. All non-key attributes must
> rely exclusively on the key.
>
> I would like some concise and clear definitions of functional and
> transitive dependencies, because 2NF and 3NF sound similar to me.
>
> Pet peeve:
>
Transistive dependencies are real different from non full dependencies. I think the examples that go with most explanations should make this clear. The following wording is sloppy, but I hope it clarifies.

A table is not in third normal form if at least one non key column depends on another non key column (or columns). A table is not in second normal form is at least one column does not depend on part of the the key. (If the key has only one part, this can't happen).

> I dislike the mantra, "the key, the whole key, and nothing but the
> key" to characterize Normal Forms 1 through 3. What is evidently
> intended as a helpful mantra seems to give a false sense of
> understanding. I'm all about some mnemonics, but understanding must
> preceed the mnemonic, right?
>

While I understand your qualms about mantras, I do not share them in this case. "the key, the whole key, and nothing but the key (so help me Codd)" is a neat summary of the rules for the first three forms.

> Finally, what are some good books out there on database design for
> neophytes and experienced folks alike that gets the theory correct
> while not getting mired in what relational databases *could* be. I
> read a really great book by Fabian Pascal, the title of which escapes
> me, which I need to go and buy--believe it had the word Practitioner
> in it.
>
> I want to be faithful to relational theory as much as possible while
> getting work done with products currently on the market. I believe
> Fabian Pascal stated that theory was immensely important for practice;
> that it was not about a bunch of curmudgeonly academics bickering
> about how many angels can stand on the head of a pin. Is he correct?
> While it's essential that methodologists and theorists are out there
> visioning what databases can or should be, practitioners have to get
> projects completed with existing resources.
>

One of the mantras that carries a great deal of weight with me is "Theory IS practical" from CJ Date. Having said that, I think that Date's introduction to database systems is more geared to the engineer who proposes to build a new DBMS than to the person who just wants to learn how to design decent databases and develop decent database applications on Oracle, DB2, SQL Server or even (God help me) MS Access. A would be database designer does need to learn the basic concepts, and learning those concepts by trial and error is way too expensive. Far too many seasoned programmers just dive in, only to discover their design flaws after a mountain of code already depends on their naive design.

I'm not going to recommend any of my books to you, because they are way out of date. And i'm not going to recommend any books I have not read. Hopefully, somebody else will help out here.

> Thanks very much.
>
> Dana

Without meaning to berate you at all, I think you still have a great deal to learn. I could be worng.

One thing you might want to learn over the next year or so is when to knowingly violate one of the normal forms in your design, and how to come up with a good design even when you don't follow normaliztion all the way. Another thing you might want to learn is how to design an SQL database that conforms to the relational model, and how to write queries and updates to that database whose results can be forecasted by relational think, in spite of the differences between SQL and "truly relational"..

A third thing you might want to learn is how to keep performance considerations in proper perspective. High performance is one measure of goodness. But there are other measures such as ease of programming, integrity of the data. flexibility for adaping the data to an unforeseen use, etc. etc. Far too many neophytes focus in on performance to the exclusion of everything else. The sad thing is that a lot of those neophytes end up retreating to record-at-a-time processing, and paying a huge performance hit compared to the performance hit they would have suffered with a simple and sound design. Received on Fri Aug 07 2009 - 17:22:59 CEST

Original text of this message