# Re: algebra equations for Reference and FD constraints

From: Brian Selzer <brian_at_selzer-software.com>
Date: Wed, 24 Dec 2008 06:48:30 -0500
Message-ID: <jkp4l.11406\$c45.412_at_nlpi065.nbdc.sbc.com>

"paul c" <toledobythesea_at_oohay.ac> wrote in message news:z3f4l.1644\$mE3.1114_at_newsfe14.iad...
> Cimode wrote:
> ...

```>>>> 1. Database constraints are equations, and this generalization is a
>>>> natural way to encompass them.
>>> That's an interesting take.  I'm assuming that these equations can be
>>> expressed in the algebra.  Supposing that you have relation schemata
>>> R{A, B,
>>> C} and S{A, D}. How would you express an interrelational constraint,
>>> such as
>>> the inclusion dependency,
>>>
>>> R[A] IN S[A]
>> Date has clarified this aspect as subtyping.
>> ...
>
```

> These question and the one below rear up every so often. Because I think
> they are important examples of constraints, I'm replying with a new
> subject line. (Along with my very best wishes of the season to those who
> think newsgroup messages are inherently hierarchical, as well as to all
> other mystics, as well as the small remaining minority!)
```>
```

> I don't know that Date has compared an inclusion dependency to subtyping
> but regardless, in the A-algebra, it is pretty simple to define an
> inclusion dependency (although I think of it as a Reference constraint, a
> more general case of a foreign key constraint).
```>
```

> If a dbms produces a result R' that is required to obey the constraint,
> the A-algebra can express the constraint as R'{A} = R'{A} <AND> S{A}.
> There will be no value for A in any R tuple that is not an A value in some
> S tuple. This equation indicates what the dbms must do: apply its
> equivalent of the <AND> operator to the projections of the potential
> result R' and of S and verify that the result equals the projection R'{A}.
> Note that as for pretty much relation constraints, the application doesn't
> have to be verbatim, many optimizations are possible.
```>
```

> Note that whether the dbms takes some exceptional action if the constraint
> isn't satisfied or merely eliminates offending tuples in the result is not
> something that the algebra dictates. By taking an algebraic difference it
> is possible for the dbms to also identify offending tuples.
```>
```

> (That wasn't strict A-algebra syntax, since I used braces for projection
> instead of the <REMOVE> operator, but this is only a syntactic
> difference.)
```>
>>> as an equation using the algebra?  Or for that matter, how would you
>>> express
>>> the functional dependency,
>>>
>>> AB --> C
>>>
>>> as an equation using the algebra?
>> Truth tables are easy to set up to validate FD in ra and FOPC.
>> Validating each fact can be easily formalized in ra.
>> ...
>
```

> I believe FD's can also be expressed algebraically, provided we use a
> slightly different definition of the A-algebra <RENAME> operation, or with
> some equally effective change to another algebra. Following is a
> demonstration of how the algebra could express the constraint and
> therefore allow a dbms to show behaviour that is formally defined. If this
> is more long-winded than necessary I hope somebody will shorten it.
```>
```

> The reason for adjusting <RENAME> has to do with the inability of
> A-algebra syntax to form a relation R'{a,a1} from R{a}, such that each
> tuple has two triples with the same 'v' value.
```>
```

> One way to get that ability is to replace the <RENAME> operator with, say,
> a <DUP> operator whose definition varies like so - the line:
```>

> Hs = ( Hr minus { <A,T> } ) union { <B,T> } becomes

> Hs =   Hr                   union { <B,T> }
>
```

> and the two lines:
```>

> ts = ( tr minus { <A,T,v> } )

>           union { <B,T,v> } ) } become
>
```

> ts = ( tr union { <B,T,v> } ) }
```>
```

> (This might be 'more primitive' than <REMOVE> since it doesn't 'project'
> via 'minus' and <REMOVE> could then become a syntactic 'macro', ie.
> shorthand, but those possibilities don't matter for this answer, so I'll
> keep the <RENAME> definition along with the additional <DUP> definition.)
```>
```

> The same basic approach can be used for the FD A -> C as for AB -> C, so
> for now I'll stick with the former for easier readability.
```>
```

> Say R=
> A C
> 1 1
> 2 1
> 2 1
```>
```

> The 'first' 'tuple' obeys the constraint but the 'second' two don't. Form
> an equivalent of the Cartesian Product of R like so:
```>
```

> RD = ((R <DUP> (A,A1)) <DUP> (C, C1)){A1,C1)
> RC = R <AND> RD
```>
```

> RD has the extension value:
```>
```

> A1 C1
> 1 1
> 2 1
> 2 1
```>

> RC has the extension value:
>     A  C A1 C1
> t1: 1  1  1  1
> t2: 1  1  2  1

> t3: 1  1  2  2
> t4: 2  1  1  1
> t5: 2  1  2  1
> t6: 2  1  2  2
> t7: 2  2  1  1
```
> t8: 2 2 2 1
> t9: 2 2 2 2
```>
```

> (Here I've labeled the tuples for reference.)
```>
```

> Except for the renaming of attributes, every possible pair of original
> tuples now effectively forms a single RC tuple. RC tuples where A = A1
> and C not equal to C1 signify possible contradictions of the constraint.
> However, there is a problem with what follows if we don't eliminate the
> tuples t3, t4 and t7. They are spurious and don't contribute to the
> identification of tuples that contradict the constraint nor ones that obey
> the constraint, so we could ignore them. Tuple t2 is also spurious
> because A <> A1 but it doesn't affect the identification of constraint
> contradictions because C <> C1 in t2. Note that eliminating tuples
> t2,t3,t4 and t7 is the same as including only the tuples where A = A1.
```>
```

> Neither the A-algebra nor this modified version offer any syntax for
> identifying equal tuples or triples. If we are to use this algebra for
> this purpose we need to operate it using only the equality that is
> inferable, namely equality of relation equations and extension values.
```>
```

> Now form RD2 = R{A} <DUP> (A, A1).
> RD2 has the extension value:
> A C
> i1: 1 1
> i2: 2 1
> i3: 2 2
```>
```

> Tuples i1 and i3 contain values such that A = A1. Tuple i2 doesn't.
```>
```

> Let me introduce a shorthand for exclusive OR, ie., <XOR>:
```>
```

> X <XOR> Y = ((<NOT> X) <AND> Y) <OR> (X <AND> (<NOT> Y))
```>
```

> Equality of X and Y can be expressed as <NOT> (X <XOR> Y), as a truth
> table will show. Let RD3 = RD2 <AND> (<NOT> (RD2{A} <XOR> RD2{A1})). RD3
> has the extension value:
```>

>     A A1
> i1: 1  1
> i3: 2  2
```
> i4: 3 3
> i5: 4 4
> ... et cetera, limited only by the range of the type values of A and A1.
```>
```

> Similarly, with the <DUP> operations we can also produce RD4 that has the
> extension value:
```>

>     C C1
> i1: 1  1
> i3: 2  2
```
> i4: 3 3
> i5: 4 4
> ... et cetera, as above.
```>
```

> Now eliminate spurious tuples from RC, giving RX = RC <AND> RD3. RX has
> the extension value:
```>

>     A  C A1 C1
> t1: 1  1  1  1
> t5: 2  1  2  1
> t6: 2  1  2  2
```
> t8: 2 2 2 1
> t9: 2 2 2 2
```>
```

> Now produce RXC = RX <AND> (<NOT> RD4). RXC has the extension value:
```>

>     A  C A1 C1
```
> t6: 2 1 2 2
> t8: 2 2 2 1
```>
```

> RXC has values in its {A,C} projection for tuples that contradict the
> constraint. So, a dbms could apply some equivalent of the above equations
> to determine such tuples or with other equations, determine their relative
> complement, tuples that agree with the constraint.
```>
```

> I suspect that if these equations work for this example, they will work
> for all cases. Although they are long and tedious to follow, I think they
> do have the advantage of relying only on algebraic rules for symbolic
> manipulation, eg., they don't bring "COUNT" into the 'equation', as it
> were, and the manipulations are eminently suitable for digital computers
> to effect.
```>
```

> Perhaps somebody can come up with a proof. In the meantime, I hope this
> will help dispel the claim that there are some 'model' concepts that can't
> be expressed with an algebra or calculus. Those claims seem always to
> involve notions that the algebra has nothing to do with, so I see them as
> being like the claim that the RM is not good enough in the real world
> because SQL has problems in the real world.
>

First of all, I wasn't claiming that inclusion dependencies or functional dependencies couldn't be expressed as equations in the algebra. I simply hadn't seen it done before.

Original text of this message