algebra equations for Reference and FD constraints

From: paul c <toledobythesea_at_oohay.ac>
Date: Tue, 23 Dec 2008 16:09:06 -0800
Message-ID: <z3f4l.1644$mE3.1114_at_newsfe14.iad>


Cimode wrote:
...
>>> 1. Database constraints are equations, and this generalization is a
>>> natural way to encompass them.
>> That's an interesting take. I'm assuming that these equations can be
>> expressed in the algebra. Supposing that you have relation schemata R{A, B,
>> C} and S{A, D}. How would you express an interrelational constraint, such as
>> the inclusion dependency,
>>
>> R[A] IN S[A]
> Date has clarified this aspect as subtyping. > ...

These question and the one below rear up every so often. Because I think they are important examples of constraints, I'm replying with a new subject line. (Along with my very best wishes of the season to those who think newsgroup messages are inherently hierarchical, as well as to all other mystics, as well as the small remaining minority!)

I don't know that Date has compared an inclusion dependency to subtyping but regardless, in the A-algebra, it is pretty simple to define an inclusion dependency (although I think of it as a Reference constraint, a more general case of a foreign key constraint).

If a dbms produces a result R' that is required to obey the constraint, the A-algebra can express the constraint as R'{A} = R'{A} <AND> S{A}. There will be no value for A in any R tuple that is not an A value in some S tuple. This equation indicates what the dbms must do: apply its equivalent of the <AND> operator to the projections of the potential result R' and of S and verify that the result equals the projection R'{A}. Note that as for pretty much relation constraints, the application doesn't have to be verbatim, many optimizations are possible.

Note that whether the dbms takes some exceptional action if the constraint isn't satisfied or merely eliminates offending tuples in the result is not something that the algebra dictates. By taking an algebraic difference it is possible for the dbms to also identify offending tuples.

(That wasn't strict A-algebra syntax, since I used braces for projection instead of the <REMOVE> operator, but this is only a syntactic difference.)

>> as an equation using the algebra? Or for that matter, how would you express
>> the functional dependency,
>>
>> AB --> C
>>
>> as an equation using the algebra?

> Truth tables are easy to set up to validate FD in ra and FOPC.
> Validating each fact can be easily formalized in ra.
> ...

I believe FD's can also be expressed algebraically, provided we use a slightly different definition of the A-algebra <RENAME> operation, or with some equally effective change to another algebra. Following is a demonstration of how the algebra could express the constraint and therefore allow a dbms to show behaviour that is formally defined. If this is more long-winded than necessary I hope somebody will shorten it.

The reason for adjusting <RENAME> has to do with the inability of A-algebra syntax to form a relation R'{a,a1} from R{a}, such that each tuple has two triples with the same 'v' value.

One way to get that ability is to replace the <RENAME> operator with, say, a <DUP> operator whose definition varies like so - the line:

Hs = ( Hr minus { <A,T> } ) union { <B,T> } becomes

Hs =   Hr                   union { <B,T> }

and the two lines:

ts = ( tr minus { <A,T,v> } )

           union { <B,T,v> } ) } become

ts = ( tr union { <B,T,v> } ) }

(This might be 'more primitive' than <REMOVE> since it doesn't 'project' via 'minus' and <REMOVE> could then become a syntactic 'macro', ie. shorthand, but those possibilities don't matter for this answer, so I'll keep the <RENAME> definition along with the additional <DUP> definition.)

The same basic approach can be used for the FD A -> C as for AB -> C, so for now I'll stick with the former for easier readability.

Say R=
A C
1 1
2 1
2 1

The 'first' 'tuple' obeys the constraint but the 'second' two don't. Form an equivalent of the Cartesian Product of R like so:

RD = ((R <DUP> (A,A1)) <DUP> (C, C1)){A1,C1) RC = R <AND> RD

RD has the extension value:

A1 C1
1 1
2 1
2 1

RC has the extension value:

     A C A1 C1

t1: 1  1  1  1
t2: 1  1  2  1
t3: 1  1  2  2
t4: 2  1  1  1
t5: 2  1  2  1
t6: 2  1  2  2
t7: 2  2  1  1
t8: 2  2  2  1
t9: 2  2  2  2

(Here I've labeled the tuples for reference.)

Except for the renaming of attributes, every possible pair of original tuples now effectively forms a single RC tuple. RC tuples where A = A1 and C not equal to C1 signify possible contradictions of the constraint.   However, there is a problem with what follows if we don't eliminate the tuples t3, t4 and t7. They are spurious and don't contribute to the identification of tuples that contradict the constraint nor ones that obey the constraint, so we could ignore them. Tuple t2 is also spurious because A <> A1 but it doesn't affect the identification of constraint contradictions because C <> C1 in t2. Note that eliminating tuples t2,t3,t4 and t7 is the same as including only the tuples where A = A1.

Neither the A-algebra nor this modified version offer any syntax for identifying equal tuples or triples. If we are to use this algebra for this purpose we need to operate it using only the equality that is inferable, namely equality of relation equations and extension values.

Now form RD2 = R{A} <DUP> (A, A1).
RD2 has the extension value:

     A C

i1: 1  1
i2: 2  1
i3: 2  2

Tuples i1 and i3 contain values such that A = A1. Tuple i2 doesn't.

Let me introduce a shorthand for exclusive OR, ie., <XOR>:

X <XOR> Y = ((<NOT> X) <AND> Y) <OR> (X <AND> (<NOT> Y))

Equality of X and Y can be expressed as <NOT> (X <XOR> Y), as a truth table will show. Let RD3 = RD2 <AND> (<NOT> (RD2{A} <XOR> RD2{A1})). RD3 has the extension value:

     A A1

i1: 1  1
i3: 2  2
i4: 3  3
i5: 4  4

... et cetera, limited only by the range of the type values of A and A1.

Similarly, with the <DUP> operations we can also produce RD4 that has the extension value:

     C C1

i1: 1  1
i3: 2  2
i4: 3  3
i5: 4  4

... et cetera, as above.

Now eliminate spurious tuples from RC, giving RX = RC <AND> RD3. RX has the extension value:

     A C A1 C1

t1: 1  1  1  1
t5: 2  1  2  1
t6: 2  1  2  2
t8: 2  2  2  1
t9: 2  2  2  2

Now produce RXC = RX <AND> (<NOT> RD4). RXC has the extension value:

     A C A1 C1
t6: 2 1 2 2
t8: 2 2 2 1

RXC has values in its {A,C} projection for tuples that contradict the constraint. So, a dbms could apply some equivalent of the above equations to determine such tuples or with other equations, determine their relative complement, tuples that agree with the constraint.

I suspect that if these equations work for this example, they will work for all cases. Although they are long and tedious to follow, I think they do have the advantage of relying only on algebraic rules for symbolic manipulation, eg., they don't bring "COUNT" into the 'equation', as it were, and the manipulations are eminently suitable for digital computers to effect.

Perhaps somebody can come up with a proof. In the meantime, I hope this will help dispel the claim that there are some 'model' concepts that can't be expressed with an algebra or calculus. Those claims seem always to involve notions that the algebra has nothing to do with, so I see them as being like the claim that the RM is not good enough in the real world because SQL has problems in the real world. Received on Wed Dec 24 2008 - 01:09:06 CET

Original text of this message