Date and McGoveran comments on view updating 'problem'

From: paul c <toledobythesea_at_oohay.ac>
Date: Mon, 08 Dec 2008 19:24:38 GMT
Message-ID: <Wve%k.3357$yK5.661_at_edtnps82>



A few years ago, the dbdebunk.com site had an exchange at:

http://www.dbdebunk.com/page/page/1396086.htm

which I found very interesting. I haven't seen that exchange mentioned much here but I think it is likely very important. The three "possibilities" Date mentions for his chosen example seem wrong to me, but McGoveran's comments intrigued me more.

The example Date gives is:

DELETE SSP WHERE S# = S#('S1') AND P# = P#('P1') ; To avoid irrelevant nuances from the SQL world that this syntax might suggest, I find it clearer to cast the statement more algebraically as:

SSP' = SSP MINUS D where SSP' stands for the result and D stands for the relation, whatever it is, that we can deduce from the WHERE clause and MINUS is equivalent to <AND> in the formal definitions that Date and Darwen use in TTM
(making this MINUS not so restrictive as their definition of MINUS).

If the closure that McGoveran mentions is possible, then we would expect the same SSP' result value no matter whether SSP', SSP or D are views or base relations, ie., any of the following combinations:

   SSP' SSP D
1 base base base
2 base base view
3 base view base
4 base view view
5 view base base
6 view base view
7 view view base
8 view view view

(I think this is an example of what Codd called logical data
independence, it is like how in ordinary arithmetic we can express a square in many different ways, eg. 18**2 = 324 = 18 * 18 = (17 * 17) + 17 + 18, etc.)

Some people might say that it's patent that D is not a base relation, because Date's statement as written doesn't name it, but it seems to me that logical independence requires that we must consider the possibility that it could be base if suitable syntax were used. To put it another way, inferring that D is a view is reading more into the statement than what is stated.

Regardless of whether D is a view or base, from what the original syntax states, it would seem that D contains only one tuple, if it contained more than one presumably the statement ought to indicate that somehow. If D is a join view, one might think that its base relation or relations could contain tuples no projections of which appear in the join. If that's so, then I think it's clear that the author of the statement must be intending that multiple relations are to be 'deleted', which seems to deny closure, at least closure that entails a single resulting relation.   So it seems to me that if the statement is considered to be 'legal' we must take it as given that D is a single relation.

If D is base and since it has only one tuple, it must be possible to rewrite D as:

D = A JOIN B Date lays his comment out by talking of one-to-many relationships but I'd rather wonder how many tuples or rows could A and B have? On the face of it, each could have more than one. But the original statement doesn't suggest that at all. The tendency to see things in syntax that aren't there seems endemic. I think McGoveran was indicating that in his various points, ie. as far as a symbol manipulating program is concerned, A and B can each have only one tuple/row, otherwise the dbms must expect the statement to state to the contrary. We can't say whether A and B have the same attributes or 'heading', but we do know
(since SSP is a running example of Date's) that {S#, P#} is a key of
SSP. Like all constraints, it must be applied by the dbms before giving results. We know from Heath's Theorem ( see the first question at http://www.bridgeport.edu/sed/fcourses/cs450/Assignments/Selected_Problems_From_Ch11.pdf) that D must be equal to any of its projections that include {S#, P#} so if A and B include the S# and P# attributes, they must have only one row each, otherwise D would have more than one tuple/row. If they don't include both S# and P#, A or B might have more than one row, but if so, that isn't evident in the original DELETE statement so I would say we should not admit that possibility to the problem. Let's assume from here on that A = D{attributes of S} and B = D{attributes of SP}.

We could also rewrite D as:

D = A UNION B In this case, obviously neither A nor B can have more than one row, but one of them could have none. If one of them has no rows, then either D = A or D = B, which in this example is no different than saying D = D and which doesn't change the problem for me.

In the above ways of rewriting D mechanically, the complement of D (eg., <NOT> D) has every tuple that could be constructed from the values that the domains of the attributes allow, in other words, all but one row, for example the projection of that complement on {S#, P#} will contain the tuple <S1, P2> (this seems important to me). If we open up the TTM algebra a bit and don't require a MINUS operator that depends on equal headings for its operands, we can write SSP MINUS D as SSP <AND>
(<NOT>D). By the algebraic laws we could equally rewrite the original
statement as:

SSP'

= (S <AND> SP) <AND> (<NOT> D)
= (S <AND> SP) <AND> (<NOT> (A <AND> B))
= (S <AND> (<NOT> A)) <AND> (SP <AND> (<NOT> B))

(I've tried to keep this long syntax as simple as possible by omitting
key constraints.)

Now to quote Date from that dbdebunk exchange:

"The rules say: Delete the tuple for S1 from relvar S and delete the tuple for S1 and P1 from relvar SP. So what happens? There are three possibilities:

  1. If there are no other shipments for supplier S1, the overall operation succeeds.
  2. If there are some other shipments for supplier S1 and no other updates are done (i.e., there are no side effects), the overall operation fails on a referential integrity violation.
  3. If there are some other shipments for supplier S1 and deleting supplier S1 from relvar S "cascades" to delete those shipments from relvar SP, the overall operation fails on a violation of The Assignment Principle."

Date seems to be starting with the rule and then examining the practical effects. I would rather start with the (mechanical) algebra, so let me ignore the rule and see what happens: The first of the three possibilities goes away because I've thrown the rule away (ie., I'm not assuming up front that Supplier S1 must be deleted from S). I would like to throw the second away on the grounds of closure, ie., that if closure is possible, there are no logical exceptions, only results in the form of relations (if that's possible, it would be an improvement on arithmetic where division by zero is an exception but this doesn't prevent a dbms from raising exceptions for psychological reasons). As for the third, and as I suggested above, the logical complement of A in my rewriting will include the tuple <S1, P2, ...>, which means when it is <AND>'ed with S, Supplier S1 won't be deleted from S, thus no "other shipments" would be deleted from SP. I don't see that Date's Assignment Principle is violated at all.

In the exchange, I found McGoveran's comment about 'relative complement' intriguing. In this example, I'm guessing it means that the rule of deleting from 'both sides' is wrong, also that characterizing the base relations to be updated according to entity-relationship terms is not very useful. Whereas I think an algebraic exposition must embody whatever concepts are necessary. I would like to be able to prove that there is no way to mechanically rewrite (by 'mechanically', I mean without introducing meaning that isn't explicit) the original statement that gives any other result, but my logical equipment isn't capable or proving that kind of meta-conclusion one way or another. (All I've done in this long-winded post is fasten on one particular re-writing.) But if that could be done, it would give me confidence that updates of joins and probably unions too is logically possible, not only that, but logically entailed. Received on Mon Dec 08 2008 - 20:24:38 CET

Original text of this message