Re: algebra equations for Reference and FD constraints

From: Brian Selzer <brian_at_selzer-software.com>
Date: Wed, 24 Dec 2008 06:48:30 -0500
Message-ID: <jkp4l.11406$c45.412_at_nlpi065.nbdc.sbc.com>

"paul c" <toledobythesea_at_oohay.ac> wrote in message news:z3f4l.1644$mE3.1114_at_newsfe14.iad...
> Cimode wrote:
> ...

>>>> 1. Database constraints are equations, and this generalization is a
>>>> natural way to encompass them.
>>> That's an interesting take.  I'm assuming that these equations can be
>>> expressed in the algebra.  Supposing that you have relation schemata 
>>> R{A, B,
>>> C} and S{A, D}. How would you express an interrelational constraint, 
>>> such as
>>> the inclusion dependency,
>>>
>>> R[A] IN S[A]
>> Date has clarified this aspect as subtyping.
>> ...
>

> These question and the one below rear up every so often. Because I think
> they are important examples of constraints, I'm replying with a new
> subject line. (Along with my very best wishes of the season to those who
> think newsgroup messages are inherently hierarchical, as well as to all
> other mystics, as well as the small remaining minority!)

> I don't know that Date has compared an inclusion dependency to subtyping
> but regardless, in the A-algebra, it is pretty simple to define an
> inclusion dependency (although I think of it as a Reference constraint, a
> more general case of a foreign key constraint).

> If a dbms produces a result R' that is required to obey the constraint,
> the A-algebra can express the constraint as R'{A} = R'{A} <AND> S{A}.
> There will be no value for A in any R tuple that is not an A value in some
> S tuple. This equation indicates what the dbms must do: apply its
> equivalent of the <AND> operator to the projections of the potential
> result R' and of S and verify that the result equals the projection R'{A}.
> Note that as for pretty much relation constraints, the application doesn't
> have to be verbatim, many optimizations are possible.

> Note that whether the dbms takes some exceptional action if the constraint
> isn't satisfied or merely eliminates offending tuples in the result is not
> something that the algebra dictates. By taking an algebraic difference it
> is possible for the dbms to also identify offending tuples.

> (That wasn't strict A-algebra syntax, since I used braces for projection
> instead of the <REMOVE> operator, but this is only a syntactic
> difference.)

>
>>> as an equation using the algebra?  Or for that matter, how would you 
>>> express
>>> the functional dependency,
>>>
>>> AB --> C
>>>
>>> as an equation using the algebra?
>> Truth tables are easy to set up to validate FD in ra and FOPC.
>> Validating each fact can be easily formalized in ra.
>> ...
>

> I believe FD's can also be expressed algebraically, provided we use a
> slightly different definition of the A-algebra <RENAME> operation, or with
> some equally effective change to another algebra. Following is a
> demonstration of how the algebra could express the constraint and
> therefore allow a dbms to show behaviour that is formally defined. If this
> is more long-winded than necessary I hope somebody will shorten it.

> The reason for adjusting <RENAME> has to do with the inability of
> A-algebra syntax to form a relation R'{a,a1} from R{a}, such that each
> tuple has two triples with the same 'v' value.

> One way to get that ability is to replace the <RENAME> operator with, say,
> a <DUP> operator whose definition varies like so - the line:

>


> Hs = ( Hr minus { <A,T> } ) union { <B,T> } becomes


> Hs =   Hr                   union { <B,T> }

>

> and the two lines:

>


> ts = ( tr minus { <A,T,v> } )


>           union { <B,T,v> } ) } become

>

> ts = ( tr union { <B,T,v> } ) }

> (This might be 'more primitive' than <REMOVE> since it doesn't 'project'
> via 'minus' and <REMOVE> could then become a syntactic 'macro', ie.
> shorthand, but those possibilities don't matter for this answer, so I'll
> keep the <RENAME> definition along with the additional <DUP> definition.)

> The same basic approach can be used for the FD A -> C as for AB -> C, so
> for now I'll stick with the former for easier readability.

> Say R=
> A C
> 1 1
> 2 1
> 2 1

> The 'first' 'tuple' obeys the constraint but the 'second' two don't. Form
> an equivalent of the Cartesian Product of R like so:

> RD = ((R <DUP> (A,A1)) <DUP> (C, C1)){A1,C1)
> RC = R <AND> RD

> RD has the extension value:

> A1 C1
> 1 1
> 2 1
> 2 1

>


> RC has the extension value:

>     A  C A1 C1

> t1: 1  1  1  1

> t2: 1  1  2  1


> t3: 1  1  2  2

> t4: 2  1  1  1

> t5: 2  1  2  1

> t6: 2  1  2  2

> t7: 2  2  1  1

> t8: 2 2 2 1
> t9: 2 2 2 2

> (Here I've labeled the tuples for reference.)

> Except for the renaming of attributes, every possible pair of original
> tuples now effectively forms a single RC tuple. RC tuples where A = A1
> and C not equal to C1 signify possible contradictions of the constraint.
> However, there is a problem with what follows if we don't eliminate the
> tuples t3, t4 and t7. They are spurious and don't contribute to the
> identification of tuples that contradict the constraint nor ones that obey
> the constraint, so we could ignore them. Tuple t2 is also spurious
> because A <> A1 but it doesn't affect the identification of constraint
> contradictions because C <> C1 in t2. Note that eliminating tuples
> t2,t3,t4 and t7 is the same as including only the tuples where A = A1.

> Neither the A-algebra nor this modified version offer any syntax for
> identifying equal tuples or triples. If we are to use this algebra for
> this purpose we need to operate it using only the equality that is
> inferable, namely equality of relation equations and extension values.

> Now form RD2 = R{A} <DUP> (A, A1).
> RD2 has the extension value:
> A C
> i1: 1 1
> i2: 2 1
> i3: 2 2

> Tuples i1 and i3 contain values such that A = A1. Tuple i2 doesn't.

> Let me introduce a shorthand for exclusive OR, ie., <XOR>:

> X <XOR> Y = ((<NOT> X) <AND> Y) <OR> (X <AND> (<NOT> Y))

> Equality of X and Y can be expressed as <NOT> (X <XOR> Y), as a truth
> table will show. Let RD3 = RD2 <AND> (<NOT> (RD2{A} <XOR> RD2{A1})). RD3
> has the extension value:

>


>     A A1

> i1: 1  1

> i3: 2  2

> i4: 3 3
> i5: 4 4
> ... et cetera, limited only by the range of the type values of A and A1.

>


>     C C1

> i1: 1  1

> i3: 2  2

> i4: 3 3
> i5: 4 4
> ... et cetera, as above.

> Now eliminate spurious tuples from RC, giving RX = RC <AND> RD3. RX has
> the extension value:

>


>     A  C A1 C1

> t1: 1  1  1  1

> t5: 2  1  2  1

> t6: 2  1  2  2

> t8: 2 2 2 1
> t9: 2 2 2 2

> Now produce RXC = RX <AND> (<NOT> RD4). RXC has the extension value:

>


>     A  C A1 C1

> t6: 2 1 2 2
> t8: 2 2 2 1

> RXC has values in its {A,C} projection for tuples that contradict the
> constraint. So, a dbms could apply some equivalent of the above equations
> to determine such tuples or with other equations, determine their relative
> complement, tuples that agree with the constraint.

> I suspect that if these equations work for this example, they will work
> for all cases. Although they are long and tedious to follow, I think they
> do have the advantage of relying only on algebraic rules for symbolic
> manipulation, eg., they don't bring "COUNT" into the 'equation', as it
> were, and the manipulations are eminently suitable for digital computers
> to effect.

> Perhaps somebody can come up with a proof. In the meantime, I hope this
> will help dispel the claim that there are some 'model' concepts that can't
> be expressed with an algebra or calculus. Those claims seem always to
> involve notions that the algebra has nothing to do with, so I see them as
> being like the claim that the RM is not good enough in the real world
> because SQL has problems in the real world.
>

First of all, I wasn't claiming that inclusion dependencies or functional dependencies couldn't be expressed as equations in the algebra. I simply hadn't seen it done before.

Second, this does not dispel the claim that there are some 'model' concepts that can't be expressed with the algebra or calculus. In particular, database updates cannot be expressed. To be sure, a value that is to be assigned can, but the update itself--the actual assignment--cannot. Nor should it. Received on Wed Dec 24 2008 - 12:48:30 CET

This message: [ Message body ]
Next message: paul c: "Re: algebra equations for Reference and FD constraints"
Previous message: Cimode: "Re: algebra equations for Reference and FD constraints"
In reply to paul c: "algebra equations for Reference and FD constraints"
Next in thread: paul c: "Re: algebra equations for Reference and FD constraints"
Reply: paul c: "Re: algebra equations for Reference and FD constraints"
Reply: Brian Selzer: "Re: algebra equations for Reference and FD constraints"
Reply: paul c: "Re: algebra equations for Reference and FD constraints"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message