Re: 2NF There are two Definitions which is right

From: Jonathan Leffler <jleffler_at_earthlink.net>
Date: Fri, 22 Jul 2005 07:59:44 GMT
Message-ID: <42E0A76A.9080204_at_earthlink.net>


I've included most of Jens original question and rather more of my initial response to Jens question in this response, so that the discussion is more nearly complete in one message.

Jens Haase wrote:
> Jonathan Leffler wrote:

>> Jens Haase wrote:
>>> I found two definitions for 2NF:
>>>
>>> 1: A relation R(A,F) is in 2NF, when every attribute not
>>> belonging to the primary key of R is fully functionally dependent
>>> on the primary key of R
>>
>> Can you give a hint as to the significance of the parenthesized 
>> "(A,F)"; neither A nor F is referenced in the definition quoted. I 
>> don't think it alters the answer, but I'm curious. (The comma after
>> 2NF should be omitted too.)

A beter version of this definition would be "A relation R is in 2NF when every attribute that is not part of a candidate key of R is fully functionally dependent on the candidate key(s) of R".

> R(A,F) means a Relation R with attributes A and functional
> dependencies F

Fine - as expected, this makes no difference to the answer.

>>> 2: A relation schema R is in 2NF if every nonprime attribute A in
>>> R is fully functionally dependent on the primary key of R
>> 
>> Assuming 'nonprime attributes' are the attributes that do not
>> belong to the primary key, it seems to me that the two definitions
>> are the same.

>
> This definition (the second) is from Elmasri/Navate Fundamentals of
> Database Systems Second Edition Page 411.
> On Page 408 in the same book the authors state:
>
> An attribute of relation schema R is called a prime attribute of R if
> it is a member of some candidate key of R. An attribute is called
> nonprime if it is not a prime attribute—that is, if it is not a
> member of any candidate key.

It certainly helps to know the extra background. As I noted in a separate response to Jon Heggland's comment, neither of the definitions clearly allows for multiple candidate keys, though given this definition of non-prime attribute, you can argue that this version does. However, even so, it would be IMNSHO be better phrased if it finished 'dependent on each candidate key of R'.

> A superkey [...]

Superkeys are not relevant to most of this discussion, I think.

> If a relation schema has more than one key, each is called a
> candidate key. One of the candidate keys is arbitrarily designated to
> be the primary key, and the others are called secondary keys

Secondary key is not a common term for the non-primary candidate keys (though it is more logical in some respects than 'alternative key', which is the term I'm more familiar with). These days there is no formal reason to distinguish one candidate key as the primary key and the others by another name. Normalization theory is best defined in terms of candidate keys (see Date 'The Primacy of Primary Keys: An Investigation' in 'Relational Database: Writings 1991-1994').

> According to this definition there is a difference between the two
> definitions.

Just about - and the second is the better definition, but not perfect.

>>> According to the first definition the following schema would not
>>> be in 2NF, according to the second it would be:
 >>>
 >>> R = (A,B,C,D)
 >>>
 >>> with:
 >>> A->D
 >>> AB->C
 >>> C->D
 >>> D->A
 >>>
 >>> According to the first definition when the primary key is AB, then
 >>> A->D violates 2NF, because D is not part of the primary key.
 >>
>> I agree that AB is the primary key - or at least a valid primary key.

AB->C (given); C->D (given); hence AB->D (transitivity); hence AB->CD (union); and AB->ABCD (reflexivity); and therefore AB determines all attributes of R, which makes it a candidate key.

>> A->D itself cannot violate 2NF; it is a functional dependency, and 
>> relation schemas are what satisfy or violate 2NF.  And the reasoning 
>> the that D is not part of the primary key is bogus too.

Clearly, 'the that' is a typo - sorry. The first sentence is logic chopping, and may not be fair since I suspect English is not Jens's first language; nevertheless, it is also accurate. "According ... AB, then the FD A->D means that R is not in 2NF" would be concise - but wrong. However, the trouble is that the first definition precludes the possibility of multiple candidate keys, and therefore cannot be applied to this relation schema where, as shown below, there are 3 candidate keys. If the first definition is adapted to fix the problem, it becomes equivalent to the second, and the relation R is in 2NF.

I don't think my second sentence makes as much sense as I'd like it to. However, the reason that R is not in 2NF is not because of the status of D but because of the status of A -- A on its own is not a candidate key, and the fact that A alone determines D is what prevents R from being in 2NF.

> A->D tells us that D is only dependant on A and so only on a part of the
> primary key.

Yes. (And this is a change in stance from yesterday's post.)

>> As I see it, AB->C and C->D implies that AB->D so the primary key 
>> determines both C and D, as required.  Granted, it is neither in 3NF 
>> nor BCNF - the mnemonic is "the key, the whole key, and nothing but 
>> the key", and the functional dependency A->D means that 'the whole 
>> key' part of the rule is violated. (The C->D part is also problematic, 
>> but doesn't cause the relation schema to violate 2NF.)

By implication, yesterday I said "but R is in 2NF"; today, I think that was wrong.

> According to Ullmans definition of 3NF:
> A relation R is in third normal form in whenever X -> A holds in R
> and A is not in X, then either X is a candidate key for R, or A is
> [a] prime attribute
>
> it is in 3NF since:
>
> A->D holds 3NF since D is prime attribute
> AB->C holds 3NF because AB is candidate key
> C->D holds 3NF because D is prime attribute
> D->A holds 3NF because A is prime attribute

Yes. To understand this, we need to establish that AB, BC and CD are candidate keys - which is discussed below. One of the strengths of BCNF over 3NF is its greater simplicity.

Date summarizes this nicely (An Introduction to Database Systems, 7th Edn, p375):

3NF (Zaniolo's definition): Let R be a relvar, let X be any subset of the attributes of R, and let A be any single attribute of R. Then R is in 3NF if and only if, for every FD X->A in R, at least one of the following is true:

  1. X contains A (so the FD is trivial);
  2. X is a superkey;
  3. A is contained in a candidate key of R.

BCNF is obtained from this definition of 3NF by dropping condition 3. (And I note that superkeys make an appearance here.)

This does not depend on the definition of 2NF; some other characterizations of 3NF have an "if the relvar [relation] is in 2NF and ..." structure.

Since every single attribute on the RHS of an FD in your set satisfies requirement 3, R must be in 3NF. It is surely not in BCNF, though.

[
Somewhat tangentially - at p400, Date gives the following neat characterization of BCNF, 4NF and 5NF:

BCNF: A relvar is in BCNF if and only if every FD satisfied by R

       is implied by the candidate keys of R.

4NF: A relvar is in 4NF if and only if every MVD satisfied by R

       is implied by the candidate keys of R.

5NF: A relvar is in 5NF if and only if every JD satisfied by R

       is implied by the candidate keys of R. ]

>>> According to the second definition R is in 2NF because the
>>> candidate keys are AB, BC, BD and since that, every attribute is
>>> prime attribute, so there is no violation of 2NF.
>> 
>> Now you introduce a term - candidate key - not mentioned in the 
>> definitions of 2NF that we're dissecting. I would dispute that BC
>> is a candidate key anyway; likewise BD. Given a relation schema R {
>> A, B, C, D, E } with candidate key AB, by definition, AB->CDE (and,
>> by the rules of trivial functional dependencies (or FDs),
>> AB->ABCDE). That is, a candidate key functionally determines every
>> attribute in the relation schema. How do you apply Armstrong's
>> Axioms to the relation schema and list of FDs to deduce that
>> BC->AD?

 >
> Well since C->D and D->A with Armstrongs 3. Axiom we get C->A that
> gives us C->DA and with Armstrongs 2. Axiom we get BC->BDA, so BC is
> a candidate key.
 >
> since D->A with Armstrongs 2. Axiom we get BD->AB and with AB->C with
> Armstrongs 3. Axiom we get BD->C so we get BD->ABC, so BD also is a
> candidate key.

OK; I sit corrected - twice.

So, all the attributes are prime attributes, which means that the relation is in 2NF by either definition (or, at least, by either definition as amended) because both definitions only refer to non-prime attributes and your relation schema has no non-prime attributes.

> The [w]hole problem i am facing, is that the first definition is from
> a german script and the second is from an english book
> (Elmasri/Navathe) and my thoug[h]t is, that there is a confusion
> between "Primary key" and "prime".

I still think the difference between the two definitions is mainly in that the first implicitly assumes that there is just one candidate key and the second implicitly permits there to be multiple candidate keys, and therefore the second definition is more general and better.

-- 
Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler_at_earthlink.net, jleffler_at_us.ibm.com
Guardian of DBD::Informix v2005.01 -- http://dbi.perl.org/
Received on Fri Jul 22 2005 - 09:59:44 CEST

Original text of this message