Re: Relation Schemata vs. Relation Variables

From: Jon Heggland <jon.heggland_at_idi.ntnu.no>
Date: Fri, 08 Sep 2006 10:38:56 +0200
Message-ID: <edra34$cem$1_at_orkan.itea.ntnu.no>


Brian Selzer wrote:

>> No, it just needs two relation values to compare. There's no
>> contradiction. What is the problem?

>
> How does it compare them? JOIN? JOIN pairs tuples based on a candidate
> key.

It most definitely does not! It pairs tuples based on common attribute names (or a user-specified predicate, if you are talking about theta-join instead of natural join). By the way, I have refrained from noting that your expression "R JOIN R'" is an intersection, and does not make much sense as part of a TC, because I have assumed you only use the expression to refer to relvar-based TCs---but if you're confused about JOIN, perhaps it's best to be explicit about this as well.

>> I have a hard time making sense of the above. Are you saying that a
>> given tuple may not mean the same thing in successive database states?
>>

>
> Not necessarily. That's certainly a possibility. How a tuple is
> interpreted is up to the user, not the model. On the other hand, the values
> determined by a candidate key value in one database state can certainly be
> different than the values determined by the same candidate key value in the
> next. So, in one sense, the "meaning" of a candidate key value can be
> different in successive database states.

So when you say "meaning of a candidate key", you mean more or less the same as "interpretation of the tuple containing the key"? This is still much too fuzzy for me to make any sense of.

> What complicates matters is that
> keys can change. If you define a constraint using R and R', and the key
> changes, then a tuple in R has the old key value and a tuple in R' has the
> new key value, and R JOIN R' will not include the change.

See above about "R JOIN R'". What precisely do you mean by "include the change"? Furthermore, join expressions isn't the only way to make transition constraints. Constraints on insert and delete would probably use semiminus instead, I think.

> Worse, if two
> tuples are affected by a change and the key values are swapped, then R JOIN
> R' will contain both, but not as might have been intended.

It is this "intention" business I am suspicious of. That two database modification with exactly the same result should be treated differently based on the formulation of the update statement(s) may be considered a feature, but also a bug.

>> Earlier, I gave an example that showed how
>> relvar-based TCs handle key updates *better* than tuple-based TCs. How
>> does your scheme fare in that example? Have you given any concrete
>> examples of how your TCs are formulated?

>
> That's a sticky one. It's clear from the update statement that salary
> wasn't targeted by the update, but the overall effect of the update caused
> employee #1's salary to decrease. That begs the question: should an
> indirect change (in one sense a side-effect) cause an update to fail? Since
> additional information wasn't provided in this case, it can be safely
> assumed that the update in question is just a "name" change, that is, ID
> with value 0 in the current state refers to the same employee that ID with
> value 1 in the proposed state does, so
>
> IS_EMPTY(Ru WHERE Salary' < Salary)
>
> suffices.

So you would allow the update. Doesn't that mean that sneaky employees can circumvent the policy of non-decreasing salaries by pretending they're "renaming" when they're not? Or does this scheme require attribute-level access control, i.e. peon users may be allowed to update the Salary attribute, but not the ID attribute? I think I prefer the simpler and more conservative approach, even if it means I may have to temporarily drop a TC in order to correct a key typo.

> However, if ID also participates in a foreign key to some
> employee table, such as
>
> S {ID, LAST, FIRST}
>
> then whether or not the update were just a "name" change could be verified:

Does this "name" change consideration always work? What if you have a relvar D { BID, GID } representing which employee is (at a given time) dancing with which at the annual company ball: There are two keys, { BID } and { GID }. When you update this one, how can you tell the difference between "name" changes and partner changes? Are foreign keys always required?

> IS_EMPTY(Ru SEMIJOIN Su WHERE ID' <> ID AND Salary' < Salary)
> AND IS_EMPTY((Ru SEMIMINUS Su) {ID, Salary}
> JOIN ((Ru SEMIMINUS Su) {ID', Salary'} RENAME ID' AS ID) WHERE Salary'
> < Salary)
> AND IS_EMPTY((Rd JOIN EXTEND Ri ADD (ID' AS ID))
> SEMIMINUS Su WHERE Salary' < Salary)
> AND IS_EMPTY((Rd JOIN EXTEND Ri ADD (ID' AS ID))
> SEMIJOIN Su WHERE ID' <> ID AND Salary' < Salary)
>
> I think the above is a correct formulation. The first IS_EMPTY verifies
> that Salary hasn't been lowered along with a "name" change. The second
> verifies that Salary hasn't been lowered as a side effect of switching IDs.
> The third verifies that Salary hasn't been lowered as the result of an
> assignment that targets only R. The fourth verifies that Salary hasn't been
> lowered as a result of an assignment to R along with a "name" change.
>
> The above constraint may appear complicated, but it accomplishes much more
> than one that only joins R and R'.

What if you have more than one foreign key? Will you consider it a "rename" if at least one FK is updated? Or only if all are? You are going to need one huge TC in that case, it seems.

> In your solution it appears that you
> expect that a value for ID means the same thing in successive database
> states, that is, it represents the same employee. (At least that's what I
> gathered from your comments.)

You could say that. Again, truth/consistency. If you only speak to the database about employee numbers, it cannot know about persons. I think this is safer than expecting a certain tuple to represent the same employee in successive database states, even if every attribute is changed. This requires a tuple id, at least conceptually.

That said, I'll concede that your scheme may have an advantage in the case where you want to change the employee number of an employee in a line without violating the no-jumping TC. But I don't think that's worth the complexity and the pitfalls.

>> Like I said, the database cannot enforce truth, only consistency. The TC
>> in this case would say that *customer numbers* may not jump the line,
>> and the DBMS would be quite right in rejecting an update of the customer
>> number of a person in a line. It cannot know that the two customer
>> numbers represent the same person unless you tell it so.

>
> But you can tell it so when you issue an update, but that information is
> lost in the translation into R'.

I mean "tell it so" in the sense of recording it in the database.

> Each tuple in a relation represets a fact about something in the domain of
> discourse.

SomethingS, I'd say---the tuple { 42 ID, 60000 SALARY } is as much a fact about the salary 60000 as it is about the employee 42. That may be a fine point, but your scheme seems to discriminate between attributes based on whether they are keys or not, or perhaps whether they can be "renamed" or not: Presumably an employee can, but a salary cannot. However, in relational theory, there is no difference between them. I think Jan Hidders' remark that such considerations belong to a "more semantic" level (formalised E/R, if there is such a beast) may be apt.

> If that's what you mean by "corresponds to some real-world
> object or entity," then, yes, I guess I am hung up on that idea. The reason
> is simple: values do not change, but things can; therefore, a transition
> constraint must be concerned with how things can change, not just how two
> sets of values are different.

No, I mean you think in terms of tuple variables.

>> It is a matter of perspective, but I'd say that this schemes *depends*
>> on OIDs, since it seems you want a given tuple to represent the same
>> (say) person even when every attribute is changed. You avoid putting the
>> OIDs explicitly in the logical model, but they must be present in the
>> mind of the user.

>
> I wouldn't put it that way. My scheme depends on observation instead of key
> comparison. By specifying a particular set (or sequence?) of modifications,
> the user targets the tuples that will be affected and then specifies how
> each will be affected. The difference is that instead of discarding
> information that has already been provided by the user, that information is
> available and can be used to determine whether or not to reject the
> modification.

Your scheme depends on the particular formulation of an update having an effect on the enforcement of transition constraints. This means that the users must be aware that *how* they perform an update, not just *what* the new database state shall be, makes a significant difference; and that they somehow must be prevented from formulating updates the "wrong" way. And the optimiser may not transform an update into the most efficient form; at least not until after the constraints have been checked.

I don't think it is very practical.

-- 
Jon
Received on Fri Sep 08 2006 - 10:38:56 CEST

Original text of this message