Re: Relation Schemata vs. Relation Variables

From: Brian Selzer <brian_at_selzer-software.com>
Date: Mon, 04 Sep 2006 13:03:42 GMT
Message-ID: <OuVKg.8075$q63.5940_at_newssvr13.news.prodigy.com>


"Jon Heggland" <jon.heggland_at_idi.ntnu.no> wrote in message news:edfcr6$uvq$1_at_orkan.itea.ntnu.no...
> Brian Selzer wrote:
>> "Jon Heggland" <jon.heggland_at_idi.ntnu.no> wrote in message
>> news:edecf7$diu$1_at_orkan.itea.ntnu.no...
>>> You can pair them in any way that makes sense for the constraint at
>>> hand, by specifying how the pairing is to be performed. Tuple identity
>>> is a red herring; it is neither needed nor wanted.
>>
>> This is a contradiction. A transition constraint must either be able to
>> pair the elements of two relations for comparison or be given a set of
>> pairs
>> to compare.
>
> No, it just needs two relation values to compare. There's no
> contradiction. What is the problem?
>

How does it compare them? JOIN? JOIN pairs tuples based on a candidate key. If you're comparing aggregates, then I guess you could get away without pairing tuples.

>> particular change, but without tuple identifiers it's possible for all
>> keys
>> to change at the same time. Do you treat such an occurance as a
>> delete...insert?
>
> It is equivalent to to a delete-insert performed as a multiple
> assignment (i.e. without constraint checking in the middle), yes.
>
>> Just as there may be constraints that apply only to tuples
>> that are about to be removed, there may be constraints that apply only to
>> tuples that are about to be added. How can the system know whether or
>> not a
>> particular transition constraint applies?
>
> That is of course given by how the TC is formulated. With relvar-based
> constraints there isn't any classification into insert, update and
> delete constraints. There is just a condition (usually) involving an old
> relation value, and a new one. It really isn't so complicated.
>
>> What I'm saying is that using a candidate key value beyond a single
>> database state, such as in R JOIN R', implies that that value carries
>> with
>> it additional meaning beyond what is imparted to it by its definition.
>> When
>> you specify a candidate key, you're attaching significance to a
>> particular
>> set of attributes, but the significance of a particular set of attribute
>> values is limited to a single database instance by the definition of a
>> candidate key. Therefore, using a set of attribute values from one state
>> to
>> identify a tuple in another implies that that significance exceeds the
>> bounds of a single database state.
>
> I have a hard time making sense of the above. Are you saying that a
> given tuple may not mean the same thing in successive database states?
>

Not necessarily. That's certainly a possibility. How a tuple is interpreted is up to the user, not the model. On the other hand, the values determined by a candidate key value in one database state can certainly be different than the values determined by the same candidate key value in the next. So, in one sense, the "meaning" of a candidate key value can be different in successive database states. What complicates matters is that keys can change. If you define a constraint using R and R', and the key changes, then a tuple in R has the old key value and a tuple in R' has the new key value, and R JOIN R' will not include the change. Worse, if two tuples are affected by a change and the key values are swapped, then R JOIN R' will contain both, but not as might have been intended.

>> I'm not saying that that is wrong, but
>> the model must be able to support *all* situations, and "key updates" and
>> R
>> JOIN R' just don't mix.
>
> Why not? What is the problem? Earlier, I gave an example that showed how
> relvar-based TCs handle key updates *better* than tuple-based TCs. How
> does your scheme fare in that example? Have you given any concrete
> examples of how your TCs are formulated?
>

That's a sticky one. It's clear from the update statement that salary wasn't targeted by the update, but the overall effect of the update caused employee #1's salary to decrease. That begs the question: should an indirect change (in one sense a side-effect) cause an update to fail? Since additional information wasn't provided in this case, it can be safely assumed that the update in question is just a "name" change, that is, ID with value 0 in the current state refers to the same employee that ID with value 1 in the proposed state does, so

IS_EMPTY(Ru WHERE Salary' < Salary)

suffices. However, if ID also participates in a foreign key to some employee table, such as

S {ID, LAST, FIRST} then whether or not the update were just a "name" change could be verified:

IS_EMPTY(Ru SEMIJOIN Su WHERE ID' <> ID AND Salary' < Salary) AND IS_EMPTY((Ru SEMIMINUS Su) {ID, Salary}

      JOIN ((Ru SEMIMINUS Su) {ID', Salary'} RENAME ID' AS ID) WHERE Salary' < Salary)
AND IS_EMPTY((Rd JOIN EXTEND Ri ADD (ID' AS ID))

      SEMIMINUS Su WHERE Salary' < Salary) AND IS_EMPTY((Rd JOIN EXTEND Ri ADD (ID' AS ID))

      SEMIJOIN Su WHERE ID' <> ID AND Salary' < Salary)

I think the above is a correct formulation. The first IS_EMPTY verifies that Salary hasn't been lowered along with a "name" change. The second verifies that Salary hasn't been lowered as a side effect of switching IDs. The third verifies that Salary hasn't been lowered as the result of an assignment that targets only R. The fourth verifies that Salary hasn't been lowered as a result of an assignment to R along with a "name" change.

The above constraint may appear complicated, but it accomplishes much more than one that only joins R and R'. In your solution it appears that you expect that a value for ID means the same thing in successive database states, that is, it represents the same employee. (At least that's what I gathered from your comments.) Because "key updates" are allowed, that just isn't always the case.

>>>> There most certainly is! If you define a transition constraint that
>>>> states:
>>>> you can't delete a tuple if quantity is greater than 0, then splitting
>>>> such
>>>> an update into delete...insert must necessarily fail if any tuple has
>>>> quantity greater than zero!
>>> Only if you check the constraint *between* performing the delete and the
>>> insert. And that would be a logical mistake.
>>
>> Only if the system knows that the particular modification is an update.
>> With assignment, it can't be determined what kind of modification was
>> intended.
>
> Please. If I perform an assignment, and the DBMS implements it as a
> delete followed by an insert, and checks my TCs in the middle, it is
> clearly faulty. It should check my TCs after the entire statement has
> been performed. This is rather elementary.
>
>>> The transition should be rejected (of course depending on the actual TC
>>> formulation, which you unfortunately don't provide). If you want the
>>> database to know that Lynn left the line and then reentered, you must
>>> tell it so by performing two updates: a DELETE first, and an INSERT
>>> afterwards (or the equivalent relational assignments, of course). The
>>> hypothetical TCs would permit this.
>>>
>>> You do have a small point: that there may be multiple "paths" between
>>> two database states, and some of these paths may violate TCs and others
>>> not. Thus, you may need to tell the database which path corresponds to
>>> reality (or whatever it is you're representing). But as I show above,
>>> that is possible---even simple---without resorting to your transition
>>> sets. Your scheme adds complexity, but no power.
>>
>> It is possible only if you assume that keys cannot change.
>
> I assume no such thing. What, precisely, is the problem with TTM-style
> TCs and relational assignment in your example? Surely you brought it up
> to prove a point?
>
>> In order to
>> reach the level of stability required to define transition constraints
>> based
>> on R and R' that are *always* enforcible, you must in many cases resort
>> to
>> using object identifiers. Take the example of the line at the bank. For
>> POSITION, there's no need for an object identifier, because the value
>> represented is simply a natural number and it's meaning is intrinsic.
>> For
>> PERSON, however, do you use customer number? What if the customer number
>> changes because the user that added the customer keyed in the wrong
>> information and is now correcting a mistake?
>
> Like I said, the database cannot enforce truth, only consistency. The TC
> in this case would say that *customer numbers* may not jump the line,
> and the DBMS would be quite right in rejecting an update of the customer
> number of a person in a line. It cannot know that the two customer
> numbers represent the same person unless you tell it so.
>

But you can tell it so when you issue an update, but that information is lost in the translation into R'. Consider the following translation of an UPDATE taken from CHAP05.pdf on the TTM web site, Page 5.24-5.25:

UPDATE r ( Ai := X, Aj := Y )

is equivalent to

( ( EXTEND r ADD ( X AS Bi, Y AS Bj ) ) { ALL BUT Ai, Aj } )

                                                          RENAME ( Bi AS Bk, 
Bj AS Aj, Bk AS Ai )

If either Ai or Aj is a prime attribute, then before the projection, the information required to identify tuples in both R and R' exists, but that information is lost once the projection is evaluated.

> You seem to be hung up on the idea that each tuple in a relvar
> corresponds to some real-world object or entity, and that while every
> attribute in the tuple may change, still this mystical connection to the
> entity remains. It is perhaps natural---and even convenient, I'll admit,
> if you want the database to handle TCs about "persons" without
> formalising exactly what a "person" is, though I'll have to see exactly
> how your transition set TCs actually look and work before I pass final
> judgement on that---but it is OID thinking.
>

Key updates are needed so that typos can be corrected. It's as simple as that. If your constraint joins R and R', and keys have changed, then some updates may be ignored by the constraint when a key changes along with what is constrained, and some updates that only correct typos may violate the constraint. So unless you use object identifiers, or tuple identifiers, or immutable, system-generated surrogates instead of keys that may at some point need to be updated, R and R' are not sufficient for defining transition constraints.

Each tuple in a relation represets a fact about something in the domain of discourse. If that's what you mean by "corresponds to some real-world object or entity," then, yes, I guess I am hung up on that idea. The reason is simple: values do not change, but things can; therefore, a transition constraint must be concerned with how things can change, not just how two sets of values are different.

>> Because
>> key stability has no bearing on whether or not a constraint can be
>> enforced,
>> immutable, system-generated surrogate keys, object identifiers, and tuple
>> identifiers are relegated into the physical implementation where they
>> belong.
>
> It is a matter of perspective, but I'd say that this schemes *depends*
> on OIDs, since it seems you want a given tuple to represent the same
> (say) person even when every attribute is changed. You avoid putting the
> OIDs explicitly in the logical model, but they must be present in the
> mind of the user.
>

I wouldn't put it that way. My scheme depends on observation instead of key comparison. By specifying a particular set (or sequence?) of modifications, the user targets the tuples that will be affected and then specifies how each will be affected. The difference is that instead of discarding information that has already been provided by the user, that information is available and can be used to determine whether or not to reject the modification.

> Note, though, that my point is not disparaging your scheme---it seems
> workable, though a bit complex. I just believe your criticism of
> relational assignment and relvar-based TCs is unfounded.
> --
> Jon
Received on Mon Sep 04 2006 - 15:03:42 CEST

Original text of this message