Re: 2NF There are two Definitions which is right - amended
From: Jonathan Leffler <jleffler_at_earthlink.net>
Date: Fri, 22 Jul 2005 08:11:06 GMT
Message-ID: <42E0AA14.5000409_at_earthlink.net>
>
> A beter version of this definition would be "A relation R is in 2NF when
> every attribute that is not part of a candidate key of R is fully
> functionally dependent on the candidate key(s) of R".
>
>
> Fine - as expected, this makes no difference to the answer.
>
>
> It certainly helps to know the extra background. As I noted in a
> separate response to Jon Heggland's comment, neither of the definitions
> clearly allows for multiple candidate keys, though given this definition
> of non-prime attribute, you can argue that this version does. However,
> even so, it would be IMNSHO be better phrased if it finished 'dependent
> on each candidate key of R'.
>
> Superkeys are not relevant to most of this discussion, I think.
>
>
>
> Secondary key is not a common term for the non-primary candidate keys
> (though it is more logical in some respects than 'alternative key',
> which is the term I'm more familiar with). These days there is no
> formal reason to distinguish one candidate key as the primary key and
> the others by another name. Normalization theory is best defined in
> terms of candidate keys (see Date 'The Primacy of Primary Keys: An
> Investigation' in 'Relational Database: Writings 1991-1994').
>
>
> Just about - and the second is the better definition, but not perfect.
>
>
> AB->C (given); C->D (given); hence AB->D (transitivity); hence AB->CD
> (union); and AB->ABCD (reflexivity); and therefore AB determines all
> attributes of R, which makes it a candidate key.
>
>
> Clearly, 'the that' is a typo - sorry. The first sentence is logic
> chopping, and may not be fair since I suspect English is not Jens's
> first language; nevertheless, it is also accurate. "According ... AB,
> then the FD A->D means that R is not in 2NF" would be concise - but
> wrong. However, the trouble is that the first definition precludes the
> possibility of multiple candidate keys, and therefore cannot be applied
> to this relation schema where, as shown below, there are 3 candidate
> keys. If the first definition is adapted to fix the problem, it becomes
> equivalent to the second, and the relation R is in 2NF.
>
> I don't think my second sentence makes as much sense as I'd like it to.
> However, the reason that R is not in 2NF is not because of the status of
> D but because of the status of A -- A on its own is not a candidate key,
> and the fact that A alone determines D is what prevents R from being in
> 2NF.
>
> Yes. (And this is a change in stance from yesterday's post.)
>
>
> By implication, yesterday I said "but R is in 2NF"; today, I think that
> was wrong.
>
>
>
> Yes. To understand this, we need to establish that AB, BC and CD are
> candidate keys - which is discussed below. One of the strengths of BCNF
> over 3NF is its greater simplicity.
>
> Date summarizes this nicely (An Introduction to Database Systems, 7th
> Edn, p375):
>
> 3NF (Zaniolo's definition): Let R be a relvar, let X be any subset of
> the attributes of R, and let A be any single attribute of R. Then R is
> in 3NF if and only if, for every FD X->A in R, at least one of the
> following is true:
> 1. X contains A (so the FD is trivial);
> 2. X is a superkey;
> 3. A is contained in a candidate key of R.
>
> BCNF is obtained from this definition of 3NF by dropping condition 3.
> (And I note that superkeys make an appearance here.)
>
> This does not depend on the definition of 2NF; some other
> characterizations of 3NF have an "if the relvar [relation] is in 2NF and
> ..." structure.
>
> Since every single attribute on the RHS of an FD in your set satisfies
> requirement 3, R must be in 3NF. It is surely not in BCNF, though.
>
> [
> Somewhat tangentially - at p400, Date gives the following neat
> characterization of BCNF, 4NF and 5NF:
>
> BCNF: A relvar is in BCNF if and only if every FD satisfied by R
> is implied by the candidate keys of R.
>
> 4NF: A relvar is in 4NF if and only if every MVD satisfied by R
> is implied by the candidate keys of R.
>
> 5NF: A relvar is in 5NF if and only if every JD satisfied by R
> is implied by the candidate keys of R.
> ]
>
>
> OK; I sit corrected - twice.
>
> So, all the attributes are prime attributes, which means that the
> relation is in 2NF by either definition (or, at least, by either
> definition as amended) because both definitions only refer to non-prime
> attributes and your relation schema has no non-prime attributes.
>
>
> I still think the difference between the two definitions is mainly in
> that the first implicitly assumes that there is just one candidate key
> and the second implicitly permits there to be multiple candidate keys,
> and therefore the second definition is more general and better.
Date: Fri, 22 Jul 2005 08:11:06 GMT
Message-ID: <42E0AA14.5000409_at_earthlink.net>
Jonathan Leffler wrote:
> I've included most of Jens original question and rather more of my
> initial response to Jens question in this response, so that the
> discussion is more nearly complete in one message.
Darn it - I review and review and still the obvious errors escape!
>> Jonathan Leffler wrote: >>> Jens Haase wrote: >>>> I found two definitions for 2NF: >>>> >>>> 1: A relation R(A,F) is in 2NF, when every attribute not >>>> belonging to the primary key of R is fully functionally dependent >>>> on the primary key of R >>> >>> >>> Can you give a hint as to the significance of the parenthesized >>> "(A,F)"; neither A nor F is referenced in the definition quoted. I >>> don't think it alters the answer, but I'm curious. (The comma after >>> 2NF should be omitted too.)
>
> A beter version of this definition would be "A relation R is in 2NF when
> every attribute that is not part of a candidate key of R is fully
> functionally dependent on the candidate key(s) of R".
>
>> R(A,F) means a Relation R with attributes A and functional >> dependencies F
>
> Fine - as expected, this makes no difference to the answer.
>
>>>> 2: A relation schema R is in 2NF if every nonprime attribute A in >>>> R is fully functionally dependent on the primary key of R >>> >>> Assuming 'nonprime attributes' are the attributes that do not >>> belong to the primary key, it seems to me that the two definitions >>> are the same. >> >> This definition (the second) is from Elmasri/Navate Fundamentals of >> Database Systems Second Edition Page 411. >> On Page 408 in the same book the authors state: >> >> An attribute of relation schema R is called a prime attribute of R if >> it is a member of some candidate key of R. An attribute is called >> nonprime if it is not a prime attribute—that is, if it is not a >> member of any candidate key.
>
> It certainly helps to know the extra background. As I noted in a
> separate response to Jon Heggland's comment, neither of the definitions
> clearly allows for multiple candidate keys, though given this definition
> of non-prime attribute, you can argue that this version does. However,
> even so, it would be IMNSHO be better phrased if it finished 'dependent
> on each candidate key of R'.
One of the 'be's on either side of "IMNSHO" is, IMNSHO, redundant.
>> A superkey [...]
>
> Superkeys are not relevant to most of this discussion, I think.
>
>> If a relation schema has more than one key, each is called a >> candidate key. One of the candidate keys is arbitrarily designated to >> be the primary key, and the others are called secondary keys
>
>
> Secondary key is not a common term for the non-primary candidate keys
> (though it is more logical in some respects than 'alternative key',
> which is the term I'm more familiar with). These days there is no
> formal reason to distinguish one candidate key as the primary key and
> the others by another name. Normalization theory is best defined in
> terms of candidate keys (see Date 'The Primacy of Primary Keys: An
> Investigation' in 'Relational Database: Writings 1991-1994').
>
>> According to this definition there is a difference between the two >> definitions.
>
> Just about - and the second is the better definition, but not perfect.
>
>>>> According to the first definition the following schema would not >>>> be in 2NF, according to the second it would be: >>>> >>>> R = (A,B,C,D) >>>> >>>> with: >>>> A->D >>>> AB->C >>>> C->D >>>> D->A >>>> >>>> According to the first definition when the primary key is AB, then >>>> A->D violates 2NF, because D is not part of the primary key. >>> >>> I agree that AB is the primary key - or at least a valid primary key.
>
> AB->C (given); C->D (given); hence AB->D (transitivity); hence AB->CD
> (union); and AB->ABCD (reflexivity); and therefore AB determines all
> attributes of R, which makes it a candidate key.
>
>>> A->D itself cannot violate 2NF; it is a functional dependency, and >>> relation schemas are what satisfy or violate 2NF. And the reasoning >>> the that D is not part of the primary key is bogus too.
>
> Clearly, 'the that' is a typo - sorry. The first sentence is logic
> chopping, and may not be fair since I suspect English is not Jens's
> first language; nevertheless, it is also accurate. "According ... AB,
> then the FD A->D means that R is not in 2NF" would be concise - but
> wrong. However, the trouble is that the first definition precludes the
> possibility of multiple candidate keys, and therefore cannot be applied
> to this relation schema where, as shown below, there are 3 candidate
> keys. If the first definition is adapted to fix the problem, it becomes
> equivalent to the second, and the relation R is in 2NF.
>
> I don't think my second sentence makes as much sense as I'd like it to.
> However, the reason that R is not in 2NF is not because of the status of
> D but because of the status of A -- A on its own is not a candidate key,
> and the fact that A alone determines D is what prevents R from being in
> 2NF.
My 'However' sentence is just wrong. R is in 2NF (as stated at the end of the preceding paragraph). You can tell I did some circular editing.
>> A->D tells us that D is only dependant on A and so only on a part of >> the primary key.
>
> Yes. (And this is a change in stance from yesterday's post.)
>
>>> As I see it, AB->C and C->D implies that AB->D so the primary key >>> determines both C and D, as required. Granted, it is neither in 3NF >>> nor BCNF - the mnemonic is "the key, the whole key, and nothing but >>> the key", and the functional dependency A->D means that 'the whole >>> key' part of the rule is violated. (The C->D part is also >>> problematic, but doesn't cause the relation schema to violate 2NF.)
>
> By implication, yesterday I said "but R is in 2NF"; today, I think that
> was wrong.
>
>> According to Ullmans definition of 3NF: >> A relation R is in third normal form in whenever X -> A holds in R >> and A is not in X, then either X is a candidate key for R, or A is >> [a] prime attribute >> >> it is in 3NF since: >> >> A->D holds 3NF since D is prime attribute >> AB->C holds 3NF because AB is candidate key >> C->D holds 3NF because D is prime attribute >> D->A holds 3NF because A is prime attribute
>
>
> Yes. To understand this, we need to establish that AB, BC and CD are
> candidate keys - which is discussed below. One of the strengths of BCNF
> over 3NF is its greater simplicity.
>
> Date summarizes this nicely (An Introduction to Database Systems, 7th
> Edn, p375):
>
> 3NF (Zaniolo's definition): Let R be a relvar, let X be any subset of
> the attributes of R, and let A be any single attribute of R. Then R is
> in 3NF if and only if, for every FD X->A in R, at least one of the
> following is true:
> 1. X contains A (so the FD is trivial);
> 2. X is a superkey;
> 3. A is contained in a candidate key of R.
>
> BCNF is obtained from this definition of 3NF by dropping condition 3.
> (And I note that superkeys make an appearance here.)
>
> This does not depend on the definition of 2NF; some other
> characterizations of 3NF have an "if the relvar [relation] is in 2NF and
> ..." structure.
>
> Since every single attribute on the RHS of an FD in your set satisfies
> requirement 3, R must be in 3NF. It is surely not in BCNF, though.
>
> [
> Somewhat tangentially - at p400, Date gives the following neat
> characterization of BCNF, 4NF and 5NF:
>
> BCNF: A relvar is in BCNF if and only if every FD satisfied by R
> is implied by the candidate keys of R.
>
> 4NF: A relvar is in 4NF if and only if every MVD satisfied by R
> is implied by the candidate keys of R.
>
> 5NF: A relvar is in 5NF if and only if every JD satisfied by R
> is implied by the candidate keys of R.
> ]
>
>>>> According to the second definition R is in 2NF because the
>>>> candidate keys are AB, BC, BD and since that, every attribute is
>>>> prime attribute, so there is no violation of 2NF.
>>>
>>>
>>> Now you introduce a term - candidate key - not mentioned in the
>>> definitions of 2NF that we're dissecting. I would dispute that BC
>>> is a candidate key anyway; likewise BD. Given a relation schema R {
>>> A, B, C, D, E } with candidate key AB, by definition, AB->CDE (and,
>>> by the rules of trivial functional dependencies (or FDs),
>>> AB->ABCDE). That is, a candidate key functionally determines every
>>> attribute in the relation schema. How do you apply Armstrong's
>>> Axioms to the relation schema and list of FDs to deduce that
>>> BC->AD?
>>
>> Well since C->D and D->A with Armstrongs 3. Axiom we get C->A that
>> gives us C->DA and with Armstrongs 2. Axiom we get BC->BDA, so BC is
>> a candidate key.
>>
>> since D->A with Armstrongs 2. Axiom we get BD->AB and with AB->C with
>> Armstrongs 3. Axiom we get BD->C so we get BD->ABC, so BD also is a
>> candidate key.
>
> OK; I sit corrected - twice.
>
> So, all the attributes are prime attributes, which means that the
> relation is in 2NF by either definition (or, at least, by either
> definition as amended) because both definitions only refer to non-prime
> attributes and your relation schema has no non-prime attributes.
>
>> The [w]hole problem i am facing, is that the first definition is from >> a german script and the second is from an english book >> (Elmasri/Navathe) and my thoug[h]t is, that there is a confusion >> between "Primary key" and "prime".
>
> I still think the difference between the two definitions is mainly in
> that the first implicitly assumes that there is just one candidate key
> and the second implicitly permits there to be multiple candidate keys,
> and therefore the second definition is more general and better.
-- Jonathan Leffler #include <disclaimer.h> Email: jleffler_at_earthlink.net, jleffler_at_us.ibm.com Guardian of DBD::Informix v2005.01 -- http://dbi.perl.org/Received on Fri Jul 22 2005 - 10:11:06 CEST
