A different definition of MINUS, part 4

From: paul c <toledobythesea_at_oohay.ac>
Date: Fri, 26 Dec 2008 11:37:37 -0800
Message-ID: <Zma5l.8523$fc3.1985_at_newsfe02.iad>

Parts one, two and three have various mis-steps and confusions. Let me start again emphasizing what I think was McGoveran's paramount point - the desire for logical data independence. I interpret the meaning of this in an algebra to be that substituting some non-logical aspect of an implementation for another will not require any logical aspects to be removed, ie., any true statements to be falsified. Base and virtual relvars are non-logical aspects in that there is no notion of them in the algebra or calculus. In Codd's logic, ie. his algebra and calculus, they are no different than colours, for example it does not matter in the D&D A-algebra whether a relation is 'red' or 'green'. This does not mean that an implementation cannot be logical, only that if its results are to be validated logically, it must define all the concepts involved in those results in logical, eg., algebraic, terms.

Suppose R is virtual, A and B are base and R is defined as A JOIN B:

R = A <AND> B is true.

Now, make R base and A and B virtual (logical independence should allow this):

A = R{HA} is true.
B = R{HB} is true.

(HA and HB are the respective headings of A and B.) To have logical
data independence in any implementation, the first equation must remain true. If we need to falsify any other true equation as a consequence of making R base, we don't have logical data independence. Logical independence implies that A cannot have a tuple that is not in R{HA}.

To say that a 'delete' can have indeterminate base results is to permit A to have a tuple that is not in R{HA} which makes one of the equations false. Saying this denies logical independence as I see it. I think one implication of what McGoveran is saying is that certain database values are not logically possible when logical independence is assumed and possibly this has to do with the motivation of those who want to disallow delete through join. Maybe they also are saying that we can have logical independence sometimes but not always. If so, I think this brings into question whether dbms results can ever be validated logically.

Somebody might object that making R base as above is not always possible because a user might have inserted a tuple into an 'A' relvar but not into a 'B' relvar. My answer to that is that it would always be logically possible if all three equations were always obeyed, ie., by logical independence, the values for R, A and B must always be such as they would be if all inserted tuples had been inserted into R.

Part of Codd's early reason for wanting what he called data independence was surely the hierarchical db's of the time, where programs changed frequently in very adhoc ways as a consequence of physical data re-arrangements. But even if today there were no more hierarchical db's, logical independence would still be needed at the very least for optimizations.

Given the above example, let me assume logical independence so that A = R{HA} and B = R{HB} will always be true and a result R' = A' JOIN B' where R', A', B' are result relvars, or 'after' values if you like, is also true. After a successful delete by an implementation, A' must have no tuple that is not in R'{HA}, similarly for B'. However, this does not imply that all deletes must 'delete from both sides'. It only means that when HA = HB, A = B. Much depends on language design choices, such as the definition of 'MINUS'. Also, I have no doubt that the Tutorial D language is indeterminate for delete through join and I think that is a result of the MINUS definition TD uses.

If we want these three equations to be always true as far as MINUS is concerned, it might help if we somehow reflect them in a definition that gives some clue as to how an implementation can proceed. One clue was spelled out by McGoveran in the exchange, treat the view definition as the general case. Another is implied by the second two equations, namely the equality of projections. Equality of relations or projections can be specified by negating the result of logical exclusive-or, eg., in A-algebra terms: <NOT> (((<NOT> A) <AND> B) <OR>
(A <AND> (<NOT> B))). Let me just write this as A <NXOR> B, so R = A
<NXOR> B. Now I can try to express those clues in a different MINUS definition:

R MINUS D is semantically equivalent to:

((R{HR1} <NXOR> R{HR2}) <AND> (<NOT> D)){HR}

where (ie., 'it is required that') HR is the heading of R and
HR1 union HR2 = HR,

 From the definition of <NXOR> it is clear that an implementation's choices do not include deleting a tuple from R{HR1} if it does not also delete that tuple from R{HR2} when HR1=HR2. However, if HR1 <> HR2, an implementation is not prevented by this particular definition from deleting a tuple only from R{HR1}, similarly only from R{HR2}.

I think this definition needs some more work, but I'll take a breath for now. Received on Fri Dec 26 2008 - 20:37:37 CET

Original text of this message