Re: Normalization

From: Scot A. Becker <scotb_at_inconcept.com>
Date: 2000/08/04
Message-ID: <UDHi5.17292$M44.963058_at_typhoon.mn.mediaone.net>#1/1


Hi Jan,

> >I've written a couple of articles on this topic at
> >http://www.inconcept.com/JCM/June1999/becker.html and
> >http://www.inconcept.com/JCM/June1999/becker.html
>
> Nice article. Can you tell me what the difference is between the 5NF and
 the
> PJNF? How is PJNF exactly defined?

Thanks for the kudos. You question, however, is quite tough.

One of the problems I encountered on researching normalization (and attempting to explain it with out using relational algebra <s>) was that the authors I referenced all had a different take on the subject, particularly when it came to the higher forms.

For example, it is quite common to see PJNF referenced as just another name for 5NF.

Terry Halpin disagrees (and he's the only author I have that attempts to distinguish the two). In his book (Conceptual Schema & Relational Database Design, 2nd Ed.) he says, "A table is in 5NF if an only if, for each non-trivial dependency, each projection includes a key of the original table. A table is in PJNF just in case each JD [Join Dependency] is the result of the key constraints. The forms 5NF and PJNF are often treated as equivalent, but some subtle differences can be distinguished to show that PJNF is a stronger notation (Orlowska & Zhang 1992)."

Some definitions:

A projection is as I defined it in the article, and in SQL terms is analogous to SELECT DISTINCT [Some subset of the columns, not all of them] FROM... A Join Dependency (JD) exists if two or more projections of a table can be joined to produce the original table. For example, the table EMPLOYEE(_EMP#_, birthdate, sex) has the projections EMP1(_EMP#_, birthdate) and EMP2(_EMP#_ sex). These projections can be joined to produce the original EMPLOYEE table. [In the above notation, the columns preceded and followed by the underscore ( _ ) indicate the key.]

However, this isn't a good example of 5NF because you only need to worry about it when the table has three or more columns and all columns participate in the primary key.

Thus, these normal forms are often referred to as "Business Rule Normal Forms" because a given all key table may or may not be in 5NF/PJNF depending on what other rules exist (as per my article).

If the projections of this all key table can be re-joined to produce the same table, the table violates these forms and it is possible to split them further (namely, as the projections).

To be pragmatic, however, I haven't seen this case in the "real world".

Graeme Simsion says this happens most often when using intersection entities to resolve many-to-manys between entities. I don't model in ER, so I don't have this problem <eg>.

Your question got me curious, however, so I called around to a couple of folks who would know. The responses can be summarized as: "un-important", "trivial", and "splitting hairs".

The paper Halpin cited is: Orlowska, M. & Zhang, Y. 1992, 'Understanding fifth normal form (5NF)', Australian Computer Science Communications, vol. 14, pp. 631-9

I'm still curious, so I'm going to try to track this down (what can I say, I'm a geek). If I can find it, I'll post a link.

I realize that I probably didn't help much, but... I'd say, don't worry about it in practice; I suspect the distinction is relevant only to relational theory.

HTH,
Scot.



Scot A. Becker

Principal Consultant, InConcept, Inc.

     http://www.inconcept.com

Editor, The Journal of Conceptual Modeling

     http://www.inconcept.com/JCM Received on Fri Aug 04 2000 - 00:00:00 CEST

Original text of this message