Re: 2NF There are two Definitions which is right - amended

From: Jonathan Leffler <jleffler_at_earthlink.net>
Date: Fri, 22 Jul 2005 08:11:06 GMT
Message-ID: <42E0AA14.5000409_at_earthlink.net>

Jonathan Leffler wrote:
> I've included most of Jens original question and rather more of my
> initial response to Jens question in this response, so that the
> discussion is more nearly complete in one message.

Darn it - I review and review and still the obvious errors escape!

> Jens Haase wrote:

>> Jonathan Leffler wrote:
>>> Jens Haase wrote:
>>>> I found two definitions for 2NF:
>>>>
>>>> 1: A relation R(A,F) is in 2NF, when every attribute not
>>>> belonging to the primary key of R is fully functionally dependent
>>>> on the primary key of R
>>>
>>>
>>> Can you give a hint as to the significance of the parenthesized 
>>> "(A,F)"; neither A nor F is referenced in the definition quoted. I 
>>> don't think it alters the answer, but I'm curious. (The comma after
>>> 2NF should be omitted too.)

>
> A beter version of this definition would be "A relation R is in 2NF when
> every attribute that is not part of a candidate key of R is fully
> functionally dependent on the candidate key(s) of R".
>

>> R(A,F) means a Relation R with attributes A and functional
>> dependencies F

>
> Fine - as expected, this makes no difference to the answer.
>

>>>> 2: A relation schema R is in 2NF if every nonprime attribute A in
>>>> R is fully functionally dependent on the primary key of R
>>>
>>> Assuming 'nonprime attributes' are the attributes that do not
>>> belong to the primary key, it seems to me that the two definitions
>>> are the same.
>> 
>> This definition (the second) is from Elmasri/Navate Fundamentals of 
>> Database Systems Second Edition Page 411.
>> On Page 408 in the same book the authors state:
>> 
>> An attribute of relation schema R is called a prime attribute of R if
>> it is a member of some candidate key of R. An attribute is called
>> nonprime if it is not a prime attribute—that is, if it is not a
>> member of any candidate key.

>
> It certainly helps to know the extra background. As I noted in a
> separate response to Jon Heggland's comment, neither of the definitions
> clearly allows for multiple candidate keys, though given this definition
> of non-prime attribute, you can argue that this version does. However,
> even so, it would be IMNSHO be better phrased if it finished 'dependent
> on each candidate key of R'.

One of the 'be's on either side of "IMNSHO" is, IMNSHO, redundant.

>> A superkey [...]

>
> Superkeys are not relevant to most of this discussion, I think.
>

>> If a relation schema has more than one key, each is called a
>> candidate key. One of the candidate keys is arbitrarily designated to
>> be the primary key, and the others are called secondary keys

>
>
> Secondary key is not a common term for the non-primary candidate keys
> (though it is more logical in some respects than 'alternative key',
> which is the term I'm more familiar with). These days there is no
> formal reason to distinguish one candidate key as the primary key and
> the others by another name. Normalization theory is best defined in
> terms of candidate keys (see Date 'The Primacy of Primary Keys: An
> Investigation' in 'Relational Database: Writings 1991-1994').
>

>> According to this definition there is a difference between the two 
>> definitions.

>
> Just about - and the second is the better definition, but not perfect.
>

>>>> According to the first definition the following schema would not
>>>> be in 2NF, according to the second it would be:
>>>>
>>>> R = (A,B,C,D)
>>>>
>>>> with:
>>>> A->D
>>>> AB->C
>>>> C->D
>>>> D->A
>>>>
>>>> According to the first definition when the primary key is AB, then
>>>> A->D violates 2NF, because D is not part of the primary key.
>>>
>>> I agree that AB is the primary key - or at least a valid primary key.

>
> AB->C (given); C->D (given); hence AB->D (transitivity); hence AB->CD
> (union); and AB->ABCD (reflexivity); and therefore AB determines all
> attributes of R, which makes it a candidate key.
>

>>> A->D itself cannot violate 2NF; it is a functional dependency, and 
>>> relation schemas are what satisfy or violate 2NF.  And the reasoning 
>>> the that D is not part of the primary key is bogus too.

>
> Clearly, 'the that' is a typo - sorry. The first sentence is logic
> chopping, and may not be fair since I suspect English is not Jens's
> first language; nevertheless, it is also accurate. "According ... AB,
> then the FD A->D means that R is not in 2NF" would be concise - but
> wrong. However, the trouble is that the first definition precludes the
> possibility of multiple candidate keys, and therefore cannot be applied
> to this relation schema where, as shown below, there are 3 candidate
> keys. If the first definition is adapted to fix the problem, it becomes
> equivalent to the second, and the relation R is in 2NF.
>
> I don't think my second sentence makes as much sense as I'd like it to.
> However, the reason that R is not in 2NF is not because of the status of
> D but because of the status of A -- A on its own is not a candidate key,
> and the fact that A alone determines D is what prevents R from being in
> 2NF.

My 'However' sentence is just wrong. R is in 2NF (as stated at the end of the preceding paragraph). You can tell I did some circular editing.

>> A->D tells us that D is only dependant on A and so only on a part of 
>> the primary key.

>
> Yes. (And this is a change in stance from yesterday's post.)
>

>>> As I see it, AB->C and C->D implies that AB->D so the primary key 
>>> determines both C and D, as required.  Granted, it is neither in 3NF 
>>> nor BCNF - the mnemonic is "the key, the whole key, and nothing but 
>>> the key", and the functional dependency A->D means that 'the whole 
>>> key' part of the rule is violated. (The C->D part is also 
>>> problematic, but doesn't cause the relation schema to violate 2NF.)

>
> By implication, yesterday I said "but R is in 2NF"; today, I think that
> was wrong.
>

>> According to Ullmans definition of 3NF:
>> A relation R is in third normal form in whenever X -> A holds in R
>> and A is not in X, then either X is a candidate key for R, or A is
>> [a] prime attribute
>>
>> it is in 3NF since:
>>
>> A->D holds 3NF since D is prime attribute
>> AB->C holds 3NF because AB is candidate key
>> C->D holds 3NF because D is prime attribute
>> D->A holds 3NF because A is prime attribute

>
>
> Yes. To understand this, we need to establish that AB, BC and CD are
> candidate keys - which is discussed below. One of the strengths of BCNF
> over 3NF is its greater simplicity.
>
> Date summarizes this nicely (An Introduction to Database Systems, 7th
> Edn, p375):
>
> 3NF (Zaniolo's definition): Let R be a relvar, let X be any subset of
> the attributes of R, and let A be any single attribute of R. Then R is
> in 3NF if and only if, for every FD X->A in R, at least one of the
> following is true:
> 1. X contains A (so the FD is trivial);
> 2. X is a superkey;
> 3. A is contained in a candidate key of R.
>
> BCNF is obtained from this definition of 3NF by dropping condition 3.
> (And I note that superkeys make an appearance here.)
>
> This does not depend on the definition of 2NF; some other
> characterizations of 3NF have an "if the relvar [relation] is in 2NF and
> ..." structure.
>
> Since every single attribute on the RHS of an FD in your set satisfies
> requirement 3, R must be in 3NF. It is surely not in BCNF, though.
>
> [
> Somewhat tangentially - at p400, Date gives the following neat
> characterization of BCNF, 4NF and 5NF:
>
> BCNF: A relvar is in BCNF if and only if every FD satisfied by R
> is implied by the candidate keys of R.
>
> 4NF: A relvar is in 4NF if and only if every MVD satisfied by R
> is implied by the candidate keys of R.
>
> 5NF: A relvar is in 5NF if and only if every JD satisfied by R
> is implied by the candidate keys of R.
> ]
>

>>>> According to the second definition R is in 2NF because the
>>>> candidate keys are AB, BC, BD and since that, every attribute is
>>>> prime attribute, so there is no violation of 2NF.
>>>
>>>
>>> Now you introduce a term - candidate key - not mentioned in the 
>>> definitions of 2NF that we're dissecting. I would dispute that BC
>>> is a candidate key anyway; likewise BD. Given a relation schema R {
>>> A, B, C, D, E } with candidate key AB, by definition, AB->CDE (and,
>>> by the rules of trivial functional dependencies (or FDs),
>>> AB->ABCDE). That is, a candidate key functionally determines every
>>> attribute in the relation schema. How do you apply Armstrong's
>>> Axioms to the relation schema and list of FDs to deduce that
>>> BC->AD?
>> 
>> Well since C->D and D->A with Armstrongs 3. Axiom we get C->A that
>> gives us C->DA and with Armstrongs 2. Axiom we get BC->BDA, so BC is
>> a candidate key.
>> 
>> since D->A with Armstrongs 2. Axiom we get BD->AB and with AB->C with
>> Armstrongs 3. Axiom we get BD->C so we get BD->ABC, so BD also is a 
>> candidate key.

>
> OK; I sit corrected - twice.
>
> So, all the attributes are prime attributes, which means that the
> relation is in 2NF by either definition (or, at least, by either
> definition as amended) because both definitions only refer to non-prime
> attributes and your relation schema has no non-prime attributes.
>

>> The [w]hole problem i am facing, is that the first definition is from
>> a german script and the second is from an english book
>> (Elmasri/Navathe) and my thoug[h]t is, that there is a confusion
>> between "Primary key" and "prime".

>
> I still think the difference between the two definitions is mainly in
> that the first implicitly assumes that there is just one candidate key
> and the second implicitly permits there to be multiple candidate keys,
> and therefore the second definition is more general and better.

-- 
Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler_at_earthlink.net, jleffler_at_us.ibm.com
Guardian of DBD::Informix v2005.01 -- http://dbi.perl.org/

Received on Fri Jul 22 2005 - 10:11:06 CEST

This message: [ Message body ]
Next message: dawn: "Re: Implementation of boolean types."
Previous message: Dan Guntermann: "Re: 2NF There are two Definitions which is right"
In reply to Jonathan Leffler: "Re: 2NF There are two Definitions which is right"
Next in thread: Jon Heggland: "Re: 2NF There are two Definitions which is right"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message