Constraints and Functional Dependencies

From: Marshall <marshall.spight_at_gmail.com>
Date: 24 Feb 2007 00:01:27 -0800
Message-ID: <1172304086.527354.319940_at_j27g2000cwj.googlegroups.com>



I hope no one will mind too much if I take a break from our usual sort of discussion and talk about database theory for a bit. I feel like free associating about functional dependencies.

I like to consider a system under which there is only one type of constraint: a formula of first-order logic. That is to say, there are no uniqueness constraints, foreign keys, primary keys, etc., except as such can be expressed as FOL formulas.

In a data management context, there is some value to restricting what we can quantify over as being only attributes of declared relations (whether variables or constants.) So, we can't express the "no-upper-bound" property of the natural numbers; they aren't a database table so we can't quantify over them.

With such a system, a relation R with attribute a (which I will write as R(a)) having a as a foreign key into S(b) is expressed as follows:

  forall R(a): exists S(b): a = b

So we can express foreign keys this way.

In the context of relations with named attributes, it is not necessary to declare or make up logic variable names, the way we have to when we are using sets of ordered pairs. We can use the existing attribute names as the names of the logic variables. However that raises the question of what happens when we want to quantify over an attribute more than once in a given formula. In that case let a primed attribute name be considered a second instance of a logic variable quantified over the attribute. So if we want to say that for every row of R with attribute a there exists a row of R with an unequal value for attribute a, we can say:

  forall R(a): exists R(a'): a != a'

The way quantification is usually expressed we can only introduce one quantified variable at a time, however it makes sense to relax this syntactic restriction when dealing with relations with named attributes. We can introduce quantified variables matching many attributes at once. However we have to be aware that when we do so, they are all quantified the same way, and they are all considered to come from the same row.

What about candidate keys? Suppose we have a relation R with *sets* of attributes A and B: R{A, B}. Again, A is a set of attributes, possibly just one or even none; likewise B. Then if we can express the functional dependency:

   A -> B

that is the same as expressing that A is a candidate key. How do we express functional dependencies as FOL formulas? Let's start with a simple case, where A and B have one attribute each, a and b.

  R(a, b)
  {a} -> {b} =def=
    forall R(a, b): forall R(a', b'): a = a' => b = b'

" For all rows (a,b) in R, and for all rows (a', b') in R,   if a = a', then b = b' "

How does that look if R has many columns?

  R{A, B}
  A = {a1, ... an}
  B = {b1, ... bn}

  A -> B =def=
    forall R(a1, ... an, b1, ... bn)

      forall R(a1', ... an', b1', ... bn')
        (a1 = a1' and ... and an = an') =>
        (b1 = b1' and ... and bn = bn')

Ugh, I wish that were simpler. But conceptually it's not too complex. Starting from the earlier a = a' => b = b' case, we have simply expanded the a = a' on the left to a big conjunction of a = a' clauses for each attribute in A, and done the same thing on the right for the attributes in B.

It's always interesting and enlightening to consider the zero-case. Let's look at the B = {} case, and see how it turns out.

  R{A, B}
  A = {a1, ... an}
  B = {}

  A -> B =
    forall R(a1, ... an)

      forall R(a1', ... an')
        (a1 = a1' and ... and an = an') => ()

Hmm, what's that () doing? It's an empty conjunction, so we can replace it with true, which is the identity for and. But what does "implies true" really mean? Let's transform it.

We know that (P => Q) <=> (!P or Q), so

    forall R(a1, ... an)

      forall R(a1', ... an')
        !(a1 = a1' and ... and an = an') or true

So we have a condition that ends with "or true" which is to say it is always true! In other words, the constraint collapses away and we are left with nothing. Which, when you think about it, is exactly what you would want: if A -> B, and B is empty, that just means that A is unique, and that's true by the definition of relation. So there shouldn't be anything to check.

What if A = {}?

  R{A, B}
  A = {}
  B = {b1, ... bn}

  A -> B =
    forall R(b1, ... bn)

      forall R(b1', ... bn')

() => (b1 = b1' and ... and bn = bn')

transforming that last line:

        !true or (b1 = b1' and ... and bn = bn')

simplifying:

        b1 = b1' and ... and bn = bn'

What does that mean? For every two rows of R, every attribute is the same. That is to say, R has at most one row. Which again is exactly what you would expect from a uniqueness constraint on zero attributes.

Questions, comments, corrections, criticisms? Anyone have any alternate formulas for functional dependencies or candidate keys?

Marshall Received on Sat Feb 24 2007 - 09:01:27 CET

Original text of this message