A different definition of MINUS, Part 1

From: paul c <toledobythesea_at_oohay.ac>
Date: Mon, 15 Dec 2008 16:35:16 -0800
Message-ID: <nIC1l.5519$%z5.3741_at_newsfe09.iad>



I'm still on the same view-updating theme, but I thought I'd start a new subject so as not to sully D&D's stuff with some of my asides under the other thread. I'm usually guilty, on purpose, of aside-overload, because some of the smarter people here often clarify my misunderstandings, which is valuable to me. For one thing, though, that thread is now full of stuff that I don't think is relevant to view updating. For another, when I said the Tutorial D MINUS definition might not be "good enough", what I really meant is that it doesn't go far enough for the usual dbms where a table or relvar has a fixed heading. Obviously D&D already knew this, because they mention "same heading" in their definition.

As interesting as theory is to me, I'm not much of a theorist (I want to know just enough theory to understand a dbms's results, or maybe just enough more to see a better way to make a dbms). Rather than me writing about this, it would be better if the big guns either solved it or proved it can't be solved, but I guess they have other things on their plates. I'm impatient, though, because I'm with McGoveran that this issue is crucial to the RM; otherwise we're effectively saying that some results are logical and some are arbitrary.

This is a pretty long post and I wish I were clever enough to make it shorter. For ease of reference for anybody who cares to wade through it, I'll divide it into parts one and two, even if that seems a little pompous.



PART ONE

Here's the D&D TTM definition:

(quote - page 92, TTM 2nd edition)

Difference: The Tutorial D <minus>

   R1 MINUS R2
(where R1 and R2 have the same heading) is semantically equivalent to
the A expression

   R1 <AND> ( <NOT> R2)

(end quote)
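To keep this definition concrete in what follows, here's a toy Python model of the same-heading case. The representation and the names (`tup`, `minus`) are mine, not D&D's: a tuple is a frozenset of attribute/value pairs, a relation is a set of tuples, and for finite relations with a common heading, R1 <AND> (<NOT> R2) collapses to plain set difference.

```python
# Toy model: a tuple is a frozenset of (attribute, value) pairs,
# a relation is a set of such tuples.  Names are mine, not Tutorial D's.

def tup(**kv):
    """Build a tuple-value from keyword attribute/value pairs."""
    return frozenset(kv.items())

def minus(r1, r2):
    """R1 MINUS R2 for relations with the same heading: with finite
    relations, R1 <AND> (<NOT> R2) reduces to set difference."""
    return r1 - r2

r1 = {tup(S='S1', P='P1'), tup(S='S1', P='P2'), tup(S='S2', P='P1')}
r2 = {tup(S='S1', P='P1')}
print(len(minus(r1, r2)))  # 2: only the S1/P1 tuple is gone
```

Note that <NOT> by itself ranges over an (unbounded) universe of tuples with R2's heading; the set difference is only a finite stand-in for the conjunction as a whole.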

In the original dbdebunk exchange, McGoveran said: "...Thus, we must solve the general case of update semantics for which updating "base" relations is the special case (and not the reverse) ...".

I'm no expert in formal predicates, and I find it easier to see a way to an initial implementation with an algebra to follow, so I tried to think about McGoveran's advice in a purely algebraic way. Another thing about the A-algebra specifically: since I first saw it, it has always seemed remarkable to me that an implementation such as Tutorial D might have this view updating problem while there is no such problem within the algebra itself. E.g., a dbms that implemented only the algebra might mean a db that grows like topsy, storing all its results, and it might not be especially convenient for users to interpret, but still, how could the implementation have the problem and the algebra not have it?

Anyway, here's what I imagine I mean by 'purely algebraic':

  1. The A-algebra says nothing about constraints other than the ones in the formal definition that have to do with relation structure (such as two attributes with the same name in the same relation not having the same type). Where other constraints are concerned, it is merely the vehicle for applying the ones that users desire. So, for the purpose of determining results before user constraints are applied (e.g., '<AND>ed'), an algebraic approach can ignore the key and foreign-key constraints mentioned in the exchange.
  2. Date suggests the Assignment Principle is essential, but I would say it is essential only if we use a language that has an assignment operator. The S relation that underlies the SSP relvar mentions supplier tuples that either don't participate in the view or aren't suggested by the WHERE clause. I interpret this to mean that if an S-tuple has triples that aren't all in an SSP tuple, that S-tuple must remain in the result, i.e., those tuples are outside of the 'scope' that McGoveran mentions. Below, I try to find an algebraic way to satisfy this. I think the Assignment Principle doesn't mean that an algebra needs an assignment operator; rather, it is really a definition, in terms of an algebra, of what an assignment operator means if a language has one.
  3. As far as the 'Principle of Interchangeability' (users without definition privileges would not be able to tell base relations from views) is concerned, I don't yet see how the A-algebra by itself can enforce it, so it is up for grabs. Closure seems far more important, just as it is in Boolean Algebra. From an algebraic point of view, this also means that there can't be any exceptions except for ones that the formal definition of the algebra mentions. Users might prefer to be warned when certain kinds of results are implied by the algebra, but that is a choice for the language or dbms designer, not something the algebra must cater to. So, for starters, I'd remove that "same heading" proviso in the TD MINUS definition.
  4. A semantic definition seems important, akin to the one for TD MINUS, i.e., one that avoids "implementation rules" and expresses a possible naive implementation directly in terms of the algebra.
  5. The distributive and associative properties of the algebra are also important for implementation as they allow a dbms to optimize in various situations.
  6. Projection must be involved (given the A-algebra) if the usual shorthands that people find familiar, e.g., WHERE clauses, and the usual fixed headings of most db's are to be supported, whereas the A-algebra itself cares about neither shorthands nor fixed headings.

First, let me examine a simpler version of TD's MINUS that removes the "same heading" proviso:

R MINUS D is semantically equivalent to R <AND> <NOT> D.

Whatever the equivalent of an implemented language's WHERE clause this definition might be, users will surely want the leverage, such as SQL's WHERE clause has, to remove multiple tuples by specifying a proper subset of all of a relation's attributes. This means that D needn't specify all of R's attributes. In the exchange, Fabian Pascal implied that the example WHERE clause is not "sufficiently expressive". I'm not arguing that the clause is sufficient for all purposes, just that it is enough to apply the A-algebra. My approach, then, is to take the clause as given and decide what it can definitely mean from the information available to an algebra.

With respect to the MINUS definition, the WHERE clause refers only to a tuple in D (and not a tuple in R, since R may not have the same attributes as D, as is the case for the SSP relvar). Date goes beyond SQL syntax because not only the attribute names are specified, but also the types, as in S# = S#('S1'), so as far as types are concerned, this example doesn't need to refer to a 'catalog'. As far as the algebraic term D is concerned, all we are given are attribute names, types and values and the D&D 'conjoin', '<AND>', and enough is specified to form triples, tuples and a partial predicate. No relvar or relation name is given, but we can still proceed because the A-algebra allows us to form relations from tuples. Still, I think we can't deal with the definitional problem unless we think about tuples and their triples. As far as the algebraic equation is concerned, the significance of the D-tuple is that it is in D and not specifically stated to be in R.
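To sketch what the simplified MINUS could mean when D mentions only a proper subset of R's attributes, here is one possible reading in the same toy set-based style (the helper names are mine, and attribute names are shortened to S and P for S# and P#): keep exactly those R-tuples whose projection onto D's heading is not a tuple of D.

```python
def tup(**kv):
    return frozenset(kv.items())

def project(t, attrs):
    """Project a single tuple onto a subset of its attributes."""
    return frozenset((a, v) for a, v in t if a in attrs)

def heading(rel):
    """The set of attribute names appearing in a (non-empty) relation."""
    return {a for t in rel for a, _ in t}

def minus_general(r, d):
    """One reading of R <AND> <NOT> D when D's heading is a subset of R's:
    keep the R-tuples whose projection onto D's heading is not in D."""
    hd = heading(d)
    return {t for t in r if project(t, hd) not in d}

ssp = {tup(S='S1', P='P1'), tup(S='S1', P='P2'), tup(S='S2', P='P1')}
d = {tup(S='S1')}              # a one-tuple D mentioning only the S attribute
print(minus_general(ssp, d))  # only the S2 tuple survives
```

With this reading, a single D-tuple over a proper subset of R's attributes removes multiple R-tuples at once, which is exactly the WHERE-clause leverage described above.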

To avoid maddening levels of parentheses in what follows, let me give highest precedence to <NOT>, next-highest to <AND> and <OR>, and lowest to '=', and also assume left-to-right associativity.

For my purposes, it seems easier and clearer to resort to an algebraic equation. I'm trying to omit extraneous notions; perhaps this is the same as what Codd called 'essentiality'. I'm not sure if that is what he had in mind, but I am coming to suspect that the usual arguments against such 'deletes' stem from mingling physical and logical notions, e.g., that a base must be physical. For one thing, an equation avoids introducing any notion of assignment to the algebra, and it also avoids relvars, so there's no notion of 'before' and 'after', making it easier, at least for me, to talk about results, i.e.,

R' = R MINUS D
   = R{HR} <AND> <NOT> D{HD}

where R' is equivalent to the relation that the MINUS operation produces, and HR, HD are the headings of R and D. The projections R{HR} and D{HD} are equal to R and D respectively. Though they may seem redundant, they are necessary if I am to see this algebraic equation in the context of the typical dbms that assumes fixed relvar or relation headings.
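The remark that R{HR} and D{HD} equal R and D (projection onto a relation's full heading is a no-op) is easy to check in the toy set-based model (names are mine):

```python
def tup(**kv):
    return frozenset(kv.items())

def project_rel(rel, attrs):
    """Project every tuple of a relation onto the named attributes."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

r = {tup(S='S1', P='P1'), tup(S='S2', P='P1')}
assert project_rel(r, {'S', 'P'}) == r                      # R{HR} is just R
assert project_rel(r, {'S'}) == {tup(S='S1'), tup(S='S2')}  # a proper projection shrinks
```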

In this post, I'm concerned only with 'delete through join', not yet 'insert through union' or 'insert through projection'.

If R{HR} is a join, say equal to 'A <AND> B', Heath's Theorem can be used to rewrite 'R{HR}', provided the rewriting involves overlapping headings. Where HSR is some subset of R's heading, i.e.:

R{HR} = R{HSR} <AND> R{HR}

(This is because HSR -> HSR is a trivial dependency. Maybe somebody will point out that I don't need Heath's theorem for this rewrite; it is just a way that comes to mind which I believe can be shown with only the assumptions in the A-algebra definitions, not any notions beyond that.)
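The rewrite R{HR} = R{HSR} <AND> R{HR} can be illustrated in the toy model (helper names mine): joining a finite relation with any projection of itself gives the relation back, which is the algebraic content of the trivial dependency.

```python
def tup(**kv):
    return frozenset(kv.items())

def project_rel(rel, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def and_(r1, r2):
    """<AND> on finite relations: natural join, merging every pair of
    tuples that agree on their common attributes."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

r = {tup(S='S1', P='P1', QTY=300), tup(S='S2', P='P1', QTY=400)}
for hsr in ({'S'}, {'P'}, {'S', 'P'}):
    assert and_(project_rel(r, hsr), r) == r   # R{HSR} <AND> R == R
```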

By definition of <AND>, we know that there is an HSR that equals HA, where HA is A's heading, and there is no non-equal HSR we care about for this example, so we can rewrite the R' equation as:

R' = R{HA} <AND> R{HR} <AND> <NOT> D

If we introduce HB as the heading of B, we can also write:

R' = R{HB} <AND> R{HR} <AND> <NOT> D

I think this points the way to a result that avoids mingling base and virtual concepts, because the R' equation that mentions the HA heading doesn't mention the HB heading and vice-versa. If we apply this interpretation to the longer equation, we can ask "what value for R{HA} always makes R' true?", similarly for R{HB}, ie., if A' and B' are partial resulting base relations, we don't need to talk about both of A' and B' in the same breath, we can evaluate them separately. Further, we can ignore any question of base or virtual when evaluating them (again, because the A-algebra doesn't admit assignment).

Let me apply Heath to R{HR} and D, such that R{HR} = R{HR} <AND> D{HD}, where HD is the heading of D, e.g.,

R' = R{HA} <AND> R{HR} <AND> <NOT> D
   = R{HA} <AND> R{HR} <AND> D{HD} <AND> <NOT> (D{HD})

Further, let me apply Heath to the rightmost D{HD} term, where HD = {S#, P#}, e.g.,

D{HD} = D{HD} <AND> D{S#}, so

R' = R{HA} <AND> R{HR} <AND> D{HD} <AND> <NOT> (D{HD} <AND> D{S#})

Since the heading of R' is HR, if HR includes S#, we could also write

R'{S#} <AND> R'{HR} = R{HA} <AND> R{HR} <AND> D{HD} <AND> <NOT> (D{HD} <AND> D{S#})

and, removing some of the redundant projections,

R'{S#} <AND> R'{HR} = R{HA} <AND> R <AND> D <AND> <NOT> (D <AND> D{S#})

Let "value of S# = 'Sx'" stand for the value 'Sx' in some triple that corresponds with a heading of {S# S#}. (In this case, 'S#' is not only a heading but also an attribute name.) Let me now ask: whenever there is an 'Sx' in D{S#}, must there not be an 'Sx' in R'{S#}? If 'Sx' is in D{S#}, it must be in D. If it is in R, it must also be in R{HA}. But it will not be in the rightmost term (the complement), so it will not be in the result and it will not be in R'{S#}.

So, substituting 'S1' for 'Sx', this definition of MINUS doesn't have the S# value 'S1' in any of its resulting triples; if followed, the definition would require that we always remove the S# value 'S1' from the result, and similarly for the P# value 'P1'. So if the TD MINUS definition is followed, we must always remove tuples with such triples from the result. (I think we could say the same if 'S#' were not an attribute but any subset of HR.) With a language assignment, the analogy would be 'delete from both sides'.
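To see this 'delete from both sides' behaviour concretely, here's a sketch in the toy model (my own helpers and invented sample data, not Date's actual tables): every tuple carrying the deleted S# value or the deleted P# value disappears from the result.

```python
def tup(**kv):
    return frozenset(kv.items())

def delete_both_sides(ssp, s_val, p_val):
    """The behaviour derived above from the TD definition: drop every tuple
    that carries either the deleted S# value or the deleted P# value."""
    return {t for t in ssp if ('S', s_val) not in t and ('P', p_val) not in t}

ssp = {tup(S='S1', P='P1'), tup(S='S1', P='P2'),
       tup(S='S2', P='P1'), tup(S='S2', P='P2')}
result = delete_both_sides(ssp, 'S1', 'P1')
print(result)   # only the S2/P2 tuple survives
```

Three of the four tuples vanish even though the WHERE clause named only the single (S1, P1) combination.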

With the Principle of Interchangeability, R'{S#} could be base or virtual but in either case, the analogy would remain, so the POI is not violated.

Now, I want to ask another question. Given the values for S and SP in Date's example, does this equation have two solutions (i.e., two solutions as far as the value 'S1' is concerned)? No, it doesn't: if 'S1' is in R'{S#}, it is in R'{HR}, and vice versa. The analogy for a language assignment is that there is only one solution to the 'update' of S. From an algebraic standpoint, we don't need to consider the removal of S-tuples as having anything to do with the removal of SP-tuples, so why should a language that implements assignment introduce such a consideration? The language's dbms can handle 'updates' to S in isolation from 'updates' to SP.

As far as the value 'S1' is concerned, I think the Assignment Principle isn't violated, because if 'v' doesn't include the 'S1' value, the resulting V doesn't include that value. But the WHERE clause makes no mention of a tuple (<S# S# 'S2'>,<P# P# 'P1'>), and if there were such a tuple in R{HD}, the P# value 'P1' would not be in R'{HR}, which would violate the Assignment Principle. This is why I think the TD MINUS definition must be inaccurate. Plus, from a practical point of view, it produces an SSP result that seems very mangled, given the original 'delete' statement. In Part Two, I'll try to explain a different definition.


Received on Tue Dec 16 2008 - 01:35:16 CET
