# Re: Constraints and Functional Dependencies

Date: Sat, 24 Feb 2007 15:55:47 GMT

Message-ID: <7eZDh.684$PV3.10430_at_ursa-nb00s0.nbnet.nb.ca>

Marshall wrote:

> I hope no one will mind too much if I take a break from our

*> usual sort of discussion and talk about database theory for
**> a bit. I feel like free associating about functional dependencies.
**>
**> I like to consider a system under which there is only one type
**> of constraint: a formula of first-order logic. That is to say,
**> there are no uniqueness constraints, foreign keys, primary keys,
**> etc., except as such can be expressed as FOL formulas.
**>
**> In a data management context, there is some value to restricting
**> what we can quantify over as being only attributes of declared
**> relations (whether variables or constants.) So, we can't express
**> the "no-upper-bound" property of the natural numbers; they aren't
**> a database table so we can't quantify over them.
**>
**> With such a system, a relation R with attribute a (which I will
**> write as R(a)) having a as a foreign key into S(b) is expressed
**> as follows:
**>
**> forall R(a): exists S(b): a = b
**>
**> So we can express foreign keys this way.
**>
**> In the context of relations with named attributes, it is not necessary
**> to declare or make up logic variable names, the way we have to when we
**> are using sets of ordered pairs. We can use the existing attribute
**> names as the names of the logic variables. However that raises the
**> question of what happens when we want to quantify over an attribute
**> more than once in a given formula. In that case let a primed attribute
**> name be considered a second instance of a logic variable quantified
**> over the attribute. So if we want to say that for every row of R
**> with attribute a there exists a row of R with an unequal value
**> for attribute a, we can say:
**>
**> forall R(a): exists R(a'): a != a'
**>
**> The way quantification is usually expressed we can only introduce
**> one quantified variable at a time, however it makes sense to relax
**> this syntactic restriction when dealing with relations with
**> named attributes. We can introduce quantified variables matching
**> many attributes at once. However we have to be aware that when
**> we do so, they are all quantified the same way, and they are
**> all considered to come from the same row.
**>
**> What about candidate keys? Suppose we have a relation R with
**> *sets* of attributes A and B: R{A, B}. Again, A is a set of
**> attributes, possibly just one or even none; likewise B. Then
**> if we can express the functional dependency:
**>
**> A -> B
**>
**> that is the same as expressing that A is a candidate key.
**> How do we express functional dependencies as FOL formulas?
**> Let's start with a simple case, where A and B have one
**> attribute each, a and b.
**>
**> R(a, b)
**> {a} -> {b} =def=
**> forall R(a, b): forall R(a', b'): a = a' => b = b'
**>
**> " For all rows (a,b) in R, and for all rows (a', b') in R,
**> if a = a', then b = b' "
**>
**> How does that look if R has many columns?
**>
**> R{A, B}
**> A = {a1, ... an}
**> B = {b1, ... bn}
**>
**> A -> B =def=
**> forall R(a1, ... an, b1, ... bn)
**> forall R(a1', ... an', b1', ... bn')
**> (a1 = a1' and ... and an = an') =>
**> (b1 = b1' and ... and bn = bn')
**>
**> Ugh, I wish that were simpler. But conceptually it's not too
**> complex. Starting from the earlier a = a' => b = b' case,
**> we have simply expanded the a = a' on the left to a big
**> conjunction of a = a' clauses for each attribute in A, and
**> done the same thing on the right for the attributes in B.
**>
**> It's always interesting and enlightening to consider the zero-case.
**> Let's look at the B = {} case, and see how it turns out.
**>
**> R{A, B}
**> A = {a1, ... an}
**> B = {}
**>
**> A -> B =
**> forall R(a1, ... an)
**> forall R(a1', ... an')
**> (a1 = a1' and ... and an = an') => ()
**>
**> Hmm, what's that () doing? It's an empty conjunction, so we can
**> replace it with true, which is the identity for and. But what does
**> "implies true" really mean? Let's transform it.
**>
**> We know that (P => Q) <=> (!P or Q), so
**>
**> forall R(a1, ... an)
**> forall R(a1', ... an')
**> !(a1 = a1' and ... and an = an') or true
**>
**> So we have a condition that ends with "or true" which is to say
**> it is always true! In other words, the constraint collapses away
**> and we are left with nothing. Which, when you think about it, is
**> exactly what you would want: if A -> B, and B is empty, that just
**> means that A is unique, and that's true by the definition of
**> relation. So there shouldn't be anything to check.
**>
**> What if A = {}?
**>
**> R{A, B}
**> A = {}
**> B = {b1, ... bn}
**>
**> A -> B =
**> forall R(b1, ... bn)
**> forall R(b1', ... bn')
**> () => (b1 = b1' and ... and bn = bn')
**>
**> transforming that last line:
**>
**> !true or (b1 = b1' and ... and bn = bn')
**>
**> simplifying:
**>
**> b1 = b1' and ... and bn = bn'
**>
**> What does that mean? For every two rows of R, every attribute
**> is the same. That is to say, R has at most one row. Which again
**> is exactly what you would expect from a uniqueness constraint
**> on zero attributes.
**>
**> Questions, comments, corrections, criticisms? Anyone have any
**> alternate formulas for functional dependencies or candidate
**> keys?
*

Those look good to me. Domain constraints are very easy:

forall R(a1,...an): a1 in d1 and ... an in dn

I don't know why you think one cannot express unboundedness--not that the constraint is meaningful for finite computers.

forall R(a): exists R(a'): a' = a + 1

or

forall R(a): exists R(a'): a' > a Received on Sat Feb 24 2007 - 16:55:47 CET