# Re: Why spurious tuples with fifth normal form?

Date: Fri, 17 Jun 2005 17:43:28 GMT

Message-ID: <4ZDse.123517$RR6.7007334_at_phobos.telenet-ops.be>

stowellt_at_gmail.com wrote:

> Please correct me if I'm wrong, but isn't a join

*> dependency when you are able to join decomposed tables on common keys
**> to get the original table?
*

Almost. The common columns on which you join don't have to be keys.

> In my case, I thought I had found a join

*> dependency because most of the time, if I joined the decomposed tables
**> together I got the original back. But every so often a spurious tuple
**> showed up. Would this be an example of checking if a JD held?
*

Yes. That is precisely what it is.

> (and it

*> didn't hold when the spurious tuple showed up)? Is the only way to
**> check this by trial-and-error?
*

In practice, yes, but if you have very good analytical and mathematical skills then it is also possible to recognize the constraints that imply this. They usually have the form that I already mentioned; loosely speaking something like "if certain separate combinations of subsets of columns appear in the relation then the full combination also must appear".

> So if there would have been no spurious results, then I assume the JD

*> would have held and thus I should have decomposed.
*

No, not always, only if the JD that you found is not implied by the candidate keys. How can you check this? You use the following very simple algorithm:

Input:

- A set of candidate keys { K_1, ..., K_n }
- A join dependency JD { C_1, ..., C_m } where C_1, .., C_m are sets of column names

Steps:

1. Let H be the set { C_1, ..., C_m }

2. If for two different sets C_i and C_j in H the intersection of C_i
and C_j contains the candidate key K_p, then replace C_i and C_j in H
with the union of C_i and C_j

3. Repeat step 2 until there are no more two different C_i and C_j with
a common candidate key

4. If one of the sets in the resulting H contains all the column names
of the relation then the answer is yes, otherwise no.

step 1. H = { ABC, ABD, CE } step 2. (#1) H = { ABCD, CE } (because ABC and ABC share the c.k. AB) step 2. (#2) H = { ABCDE } (because ABCDE and CE share the c.k. C) step 3. no more two distinct components with common cand. key step 4. The JD is implied by the cand. keys because ABCDE is in H

So even though this join dependency holds, you do not have to split.

Exercise for the reader: explain why trivial JDs (i.e. JDs where at least one component contains all column names) are according to the algorithm above implied by any set of candidate keys, including the empty set.

> I'm just not sure

*> why my data set didn't have the JD when I have seen similar sets of
**> data (with three inter-dependant columns) that did have JD's.
*

Hard to say. In my experience JDs that are not simply FDs or MVDs are pretty rare, and JDs that are not implied by the candidate keys even more so. So my guess would be that actually what you saw where FDs or MVDs but weren't recognized as such. But that's just a guess.

No. If you really want me to, I could explain to you what it means (it refers to the acyclicity of the hypergraph being described by the JD) but actually it is irrelevant for the decision of splitting or not. For that you only have to know if the JD is implied by the candidate keys, as I explained above.

- Jan Hidders