Path: dp-news.maxwell.syr.edu!spool.maxwell.syr.edu!drn.maxwell.syr.edu!news.maxwell.syr.edu!syros.belnet.be!news.belnet.be!hq-usenetpeers.eweka.nl!68.142.88.75.MISMATCH!hwmnpeer01.ams!nntp.telenet.be!kwabbernoot.telenet-ops.be!phobos.telenet-ops.be.POSTED!not-for-mail
From: Jan Hidders
User-Agent: Mozilla Thunderbird 0.8 (X11/20040923)
X-Accept-Language: en-us, en
MIME-Version: 1.0
Newsgroups: comp.databases.theory
Subject: Re: Why spurious tuples with fifth normal form?
References: <1118855681.264353.202550@g43g2000cwa.googlegroups.com> <1118866450.991816.312800@g43g2000cwa.googlegroups.com> <1118873990.646675.91420@g43g2000cwa.googlegroups.com> <1118893392.756491.79550@g49g2000cwa.googlegroups.com> <1118935050.334601.112290@o13g2000cwo.googlegroups.com> <1118950777.376315.19390@g49g2000cwa.googlegroups.com>
In-Reply-To: <1118950777.376315.19390@g49g2000cwa.googlegroups.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 81
Message-ID: <4ZDse.123517$RR6.7007334@phobos.telenet-ops.be>
Date: Fri, 17 Jun 2005 17:43:28 GMT
NNTP-Posting-Host: 81.165.226.147
X-Complaints-To: abuse@telenet.be
X-Trace: phobos.telenet-ops.be 1119030208 81.165.226.147 (Fri, 17 Jun 2005 19:43:28 MEST)
NNTP-Posting-Date: Fri, 17 Jun 2005 19:43:28 MEST
Organization: Telenet Internet
Xref: dp-news.maxwell.syr.edu comp.databases.theory:31578
stowellt@gmail.com wrote:
> Please correct me if I'm wrong, but isn't a join
> dependency when you are able to join decomposed tables on common keys
> to get the original table?
Almost. The common columns on which you join don't have to be keys.
> In my case, I thought I had found a join
> dependency because most of the time, if I joined the decomposed tables
> together I got the original back. But every so often a spurious tuple
> showed up. Would this be an example of checking if a JD held?
Yes. That is precisely what it is.
> (and it
> didn't hold when the spurious tuple showed up)? Is the only way to
> check this by trial-and-error?
In practice, yes, but if you have very good analytical and mathematical
skills then it is also possible to recognize the constraints that imply
this. They usually have the form that I already mentioned; loosely
speaking something like "if certain separate combinations of subsets of
columns appear in the relation then the full combination also must appear".
> So if there would have been no spurious results, then I assume the JD
> would have held and thus I should have decomposed.
No, not always, only if the JD that you found is not implied by the
candidate keys. How can you check this? You use the following very
simple algorithm:
Input:
- A set of candidate keys { K_1, ..., K_n }
- A join dependency JD { C_1, ..., C_m }
where C_1, .., C_m are sets of column names
Steps:
1. Let H be the set { C_1, ..., C_m }
2. If for two different sets C_i and C_j in H the intersection of C_i
and C_j contains the candidate key K_p, then replace C_i and C_j in H
with the union of C_i and C_j
3. Repeat step 2 until there are no more two different C_i and C_j with
a common candidate key
4. If one of the sets in the resulting H contains all the column names
of the relation then the answer is yes, otherwise no.
To give a small example: let for the relation R(A,B,C,D,E) the candidate
keys be AB and C, and assume that the JD { ABC, ABE, CE } holds. Is it
implied?:
step 1. H = { ABC, ABD, CE }
step 2. (#1) H = { ABCD, CE } (because ABC and ABC share the c.k. AB)
step 2. (#2) H = { ABCDE } (because ABCDE and CE share the c.k. C)
step 3. no more two distinct components with common cand. key
step 4. The JD is implied by the cand. keys because ABCDE is in H
So even though this join dependency holds, you do not have to split.
Exercise for the reader: explain why trivial JDs (i.e. JDs where at
least one component contains all column names) are according to the
algorithm above implied by any set of candidate keys, including the
empty set.
> I'm just not sure
> why my data set didn't have the JD when I have seen similar sets of
> data (with three inter-dependant columns) that did have JD's.
Hard to say. In my experience JDs that are not simply FDs or MVDs are
pretty rare, and JDs that are not implied by the candidate keys even
more so. So my guess would be that actually what you saw where FDs or
MVDs but weren't recognized as such. But that's just a guess.
> Are all JD's cyclic?
No. If you really want me to, I could explain to you what it means (it
refers to the acyclicity of the hypergraph being described by the JD)
but actually it is irrelevant for the decision of splitting or not. For
that you only have to know if the JD is implied by the candidate keys,
as I explained above.
-- Jan Hidders