# A different definition of MINUS, Part 3

Date: Wed, 17 Dec 2008 06:09:39 -0800

Message-ID: <OJ72l.852$Us1.502_at_newsfe05.iad>

In the original exchange ( http://www.dbdebunk.com/page/page/1396086.htm

), Date says:

"the operation DELETE SSP WHERE S# = S#('S1') AND P# = P#('P1') ; is inherently unsafe, since we presumably don't know, in general, whether there are any other shipments for supplier S1."

Any time Date says something, I think those of us laymen must pay attention, so let me try and reconcile his comment with my own conclusion. I don't argue with the adverb "inherently" because I came to the same conclusion as far as how I expect most people probably interpret this DELETE statement is concerned. I think I came to this conclusion from a different angle, though, because in Parts One and Two, I tried to be careful to keep any notion of language notions such as assignment or of base or virtual relvars out of the 'algebraic' argument and talk only of projections of results. (One might say I've implicitly included virtual relations in that a projection is virtual as far as tuples are concerned, but my argument is based more on triples than tuples and a projection on some chosen set of attributes loses no triples that those attributes have values for.)

Then he says: "But I never claimed--at least, I don't think I ever claimed!--that the rule for deleting from a join was based purely on logical inference."

In Parts One and Two I tried to argue that if D (the relation implied by the WHERE clause) has ANY triple where S# = 'S1', the A-algebra and the Tutorial D MINUS definition imply that R'{S#} will have no such triple, similarly for R'{P#} where P# = 'P1'. From the equations in Parts One and Two, I think we can also see that a D triple where P# = 'P1', would cause an R'{S#} triple where S# = 'S2' to be 'lost' in the result as well as any R'{P#} triple where P# = 'P2' and therefore lost from the join R' = A <AND> B, and similarly from SSP = S <AND> SP. I think this has nothing to do with any design principle or any implementation rule that says to 'delete from both sides'. If what I've argued so far makes sense, I think the 'rule' "delete from both sides" is purely a logical inference. It is a direct consequence of the MINUS definition applied with the A-algebra. (I know that it can be dangerous to disagree with Date, but that's where my approach leads me, apologies to him if I've made a mis-step.)

Whether people see this consequence as "unsafe" is psychological as far as I'm concerned. In other words, my conclusion is that there is an algebraic interpretation of this DELETE statement that leads logically to results that most people probably don't expect. In this case, the language keyword "AND" is present and yet the triples that are omitted from the result involve a notion of "OR".

To my mind, the issue here is really a language issue, not a model issue. I think it all depends on the definition we choose for MINUS and when we choose a definition we need to consider what will result when the algebraic operands in the definition don't have the same headings. Let me try a different definition that tries to reflect the intention of deleting only those tuples that have both triples standing for S# = 'S1' and P# = 'P1'. To keep it as simple as possible, I'll consider only the situation where the D header is a subset of the R (view) header. What I think we want in D is a header that is equal to R's header and includes only the tuples that have both of these triples.

If any relation 'r' is given an additional attribute 'a' that has all possible values from the domain or type of 'a', we can say that every tuple in 'r' is implied by the projection r+{Hr} of r{Hr,a}. If we imagine some relation 'X' that has the 'a' attribute, we can express all possible 'a' values algebraically as X <OR> <NOT> X. Let me apply this to D by augmenting it with all possible triples containing the HR attributes that aren't in D and call it D+:

D+ = ((R <OR> <NOT> R) <AND> D)

The complement of D+ will have tuples that have both triples standing for, say, S# = 'S2' and P# = 'P1', or, say, for both of S# = 'S1' and P# = 'P2' and for both of S# = 'S2' and P# = 'P2' and so on. If such pairs of triples are in tuples of R, they will remain in the result R', so my definition might be:

R MINUS D is semantically equivalent to the A expression R <AND> <NOT> D+, ie., R <AND> <NOT> ((R <OR> <NOT> R) <AND> D)

Putting that condition in the form of an equation:

R' = R <AND> <NOT> D+

- R <AND> <NOT> ((R <OR> <NOT> R) <AND> D)

Now let me substitute A <AND> B for R:

R' = A <AND> B <AND> <NOT> (( (A <AND> B) <OR> <NOT> (A <AND> B) )

<AND> D )

The parentheses here are important, but otherwise I'm still assuming the precedence and associativity I assumed in Part One. Still we can distribute the long expression on the right that stands for the complement of D+ over A and B:

R' = (A <AND> <NOT> (( (A <AND> B) <OR> <NOT> (A <AND> B) ) <AND> D ))

<AND> (B <AND> <NOT> (( (A <AND> B) <OR> <NOT> (A <AND> B) ) <AND> D ))

R' = (A <AND> <NOT> (( (A <AND> B) <OR> <NOT> (A <AND> B) ) <AND> D )) {HA}

<AND> (B <AND> <NOT> (( (A <AND> B) <OR> <NOT> (A <AND> B) ) <AND> D)) {HA}

so that the first line can be re-written as

A'{HA} = (A <AND> <NOT> (( (A <AND> B) <OR> <NOT> (A <AND> B) ) <AND> D )) {HA}

- (A <AND> <NOT> D+) {HA} and similarly for the third line.

This equation is hard to test manually, but I can try a couple of simple examples.

Assume attributes named a,b,c,d having the domain {1,2} and relations A and B with these values:

A:

a c

1 1

1 2

2 1

B:

a b d

1 1 2

1 2 1

A <AND> B: /* ie. the value of R */

a b c d

1 1 1 1

1 1 1 2

1 2 2 1

D:

a b

1 1

(A <AND> B) <OR> <NOT> (A <AND> B)

a b c d

1 1 1 1

1 1 1 2

1 1 2 1

1 1 2 2

1 2 1 1

1 2 1 2

1 2 2 1

1 2 2 2

2 1 1 1

2 1 1 2

2 1 2 1

2 1 2 2

2 2 1 1

2 2 1 2

2 2 2 1

2 2 2 2

D+:

a b c d

1 1 1 1

1 1 1 2

1 1 2 1

1 1 2 2

<NOT> D+:

a b c d

1 2 1 1

1 2 1 2

1 2 2 1

1 2 2 2

2 1 1 1

2 1 1 2

2 1 2 1

2 1 2 2

2 2 1 1

2 2 1 2

2 2 2 1

2 2 2 2

R':

a b c d

1 2 2 1

Note that only the tuples where a = 1 AND b = 1 have been 'deleted', not the one where a = 1 OR b = 1, this gives a similar view result to the TD example DELETE statement's result.

R'{HA} = (A <AND> <NOT> D+) {HA}

a c

1 2

2 1

Note that NO tuples in A have been 'dropped', which is not similar to the example DELETE statement's result. I think this might make sense to most people on the casual grounds that A never had any tuple with both triples standing for a = 1 AND b = 1.

R'{HB} = (B <AND> <NOT> D+) {HB}

a b d

1 2 1

Note that only one tuple in B has been dropped, the one that has the same two triples that D has.

If I were to use this different definition of MINUS, if only for the case where HD is a proper subset of HA union HB, I'd want some kind of formal exposition that allowed me to predict all possible results, including ones for empty heading. As for cases where HD overlaps with HA and HB in other ways or doesn't overlap at all, that would be even more desireable. Not to mention an implementation that allows easier validation of expectations compared to results.

In the meantime, let me show one other simpler example involving a different sort of heading overlap, the degenerate Cartesian Product, which I think might be important as far as the Principle of Interchangeability ('POI') is concerned. Suppose relation X has attribute x with domain (1,2} and Y has attribute x with domain {1,2}, eg.,

X:

x

1

2

Y:

y

1

R = X <AND> Y (Cartesian Product):

x y

1 1

2 1

D:

x y

1 1

D+ = (R <OR> <NOT> R) <AND> D:

x y

1 1

<NOT> D+:

x y

1 2

2 1

2 2

R'{HR} = R{HR} <AND> <NOT> D+:

x y

2 1

In the result for R', the tuple that has both of the D triples, ie. x = 1 and y = 1, has been dropped, which seems compatible with the POI.

X'{HX} = (X <AND> <NOT> D+) {HX}:

x

1

2

But the X' result has dropped no tuples.

Y'{HY} = (Y <AND> <NOT> D+) {HY}:

y

1

If we produce R' from X' <AND> Y', R' = R!

Neither has the Y' result dropped any tuples and we have a situation where R = X <AND> Y, but R' is NOT equal to X' <AND> Y', in other words DELETE R WHERE X = 1 AND Y = 1 does nothing! I think the casual explanation for this result is sort of patent - neither X nor Y can ever have a tuple with two triples. Somewhere Date mentions to the effect that the POI means there should be "no arbitrary or unnecessary" differences between base and virtual relvars. From a purely (I hope) algebraic standpoint, it looks to me that the different MINUS definition must imply a difference. Personally, I'd say it is a necessary difference, not an arbitrary one.

This is why I think McGoveran might be right if he is saying that there are times when a user must be aware of virtual relvars in order to appreciate dbms results. Received on Wed Dec 17 2008 - 15:09:39 CET