Re: The Practical Benefits of the Relational Model

From: Jonathan Leffler <jleffler_at_earthlink.net>
Date: Thu, 17 Oct 2002 05:38:05 GMT
Message-ID: <3DAE4CAC.90204_at_earthlink.net>


Lauri Pietarinen wrote:

>>Out of interest, how does D4 insert into this view?
>>
>> (A UNION B) MINUS (A INTERSECT B)
>>
>>It seems to me to be entirely arbitrary to insert into A rather than B, or
>>vis-versa, and only slightly less arbitrary to not allow any inserts at
>>all.

I managed to confuse someone with a passing reference to Venn Diagrams once upon a couple of weeks ago - I intended to get around to explaining myself (since I *did* do a very bad job the first time).

If I read Date's article/chapter on 'Updating Union, Intersection and Difference Views' correctly, then where a value is inserted depends on the table predicates of both the view as a whole and the underlying tables. In the example under discussion, the view V has predicate PV and the tables A and B have predicates PA and PB respectively.

In the general case, if you have a view U = Y UNION Z, with predicate PU = PY OR PZ, and you attempt to insert a row R into V, then R must satisfy PU, and if R satisfies PY, then R is inserted into Y, and if R satisfies PZ, then R is inserted into Z (so R might be inserted into either table or both tables, and the order of insertion is undefined).   Similarly, if you have M = Y MINUS Z, then R must satisfy PM = PY AND NOT PZ, and R is inserted into Y. And, if you have I = Y INTERSECT Z, then R must satisfy PI = PY AND PZ, and R is inserted into both Y and Z. In the case of both views I and U, no error is raised if inserting into X also places the value in Y or vice versa (which can only happen (I think) if X or Y is a view based on the other, or if X and Y are both views based on some other common table (or view)).

With view V = (A UNION B) MINUS (A INTERSECT B), we can picture the view using a Venn diagram such as the one that follows (I hope you're using a constant width font to view this).

  +-------------------+
  |                   |
  | +--------+        |
  | |AAAAAAAA|        |
  | |AAAAA+--+------+ |
  | |AAAAA+ *|BBBBBB+ |
  | |AAAAA+--+------+ |
  | |AAAAAAAA|        |
  | +--------+        |
  +-------------------+


If you can imagine that the table A is represented by the larger of the inner boxes, and the table B is represented by the smaller, then (A INTERSECT B) is marked with the star, and (A UNION B) is marked with A, B, or star, and the view V is marked with either A or B but not the star. Therefore, the contents of V is only those rows which are either in A but not in B or in B but not in A. This is the Venn diagram I had in mind - elementary set theory and all that. [Paul - apologies for expecting you to guess that this is what I meant.]

Now, for the DBMS to be able to determine where to insert data in the view, it has to know the table predicates for A and B. Further, for a row to be insertable via V, it has to satisfy ((PA AND NOT PB) OR (PB AND NOT PA)). In the examples produced earlier, both A and B simply accepted any value in the finite subset of integers allowed by the DBMS, so the meaning of A and B had complete overlap as far as the DBMS was concerned (see the article Lauri referenced below). Therefore (ignoring nulls - and possibly even if they exist), there is no value that you can present that satisfies PV because any value that satisfies PA also satisfies PB and hence (PA AND NOT PB) is equivalent to (PA AND NOT PA) and no value satisfies that condition. Hence, there is no way to insert any data into the view. So to answer the 'double indented' question: if the tables A and B are not constrained other than having columns defined on the same domains, then no values can be inserted via the view. So, under the condition PA = PB, V is a non-updatable view.

If, on the other hand, PA(x) was (x mod 2 = 0) and PB(y) was (y mod 3 = 0) - so A contains multiples of 2 and B contains multiples of 3 - then values such as 1 and 5 cannot be inserted because they do not satisfy PV (they satisfy neither PA nor PB), values such as 0 and 6 cannot be inserted because they do not satisfy PV (because they satisfy both PA and PB), values such as 2 and 4 can be inserted into V and are actually inserted into A because they satisfy PA and not PB, and values such a 3 and 9 can be inserted into V and are actually inserted into B because they satisfy PB and not PA. And, under the condition that PA != PB, V is an updatable view.

Now, in the unconstrained versions of A and B, any (finite integer) value (in range) can be inserted into A or B or both without violating the table constraint. But that leaves me puzzled about the view constraint. It seems to mean that you can end up with values that legitimately appear in V but that cannot be inserted into V, doesn't it? For example, if A contains { 1, 2, 3, 4 } and B contains { 3, 4, 5, 6 }, then V contains { 1, 2, 5, 6 } but none of those values could be inserted into V.

Should this be allowed, or should the DBMS deduce that because the view is based on A and B, then any row that is inserted into the base table should also be insertible by views based on the base tables? This leads to problems, because you couldn't create mutually exclusive views based on the tables - whereas that is desirable for all sorts of reasons such as database partitioning or security (preventing user A from looking at data belonging to user B and vice versa).

So, I'm in a quandary - what am I missing?

> The latest article in www.dbdebunk.com (By Date and McGovern)
> might shed light to that problem.
> 
> They state that it is a bad idea to have tables whose
> meanings overlap and clearly that is the case with A and B.
> 
> On the other hand C says that their meaning does not
> overlap so hence the contradiction and the refusal of
> D4 to insert anything anywhere.

[I'm not sure what C is referring to in this paragraph]

According to my understanding of the theory, I agree that D4 should reject any attempt to insert any value into view V assuming that there is no constraint known to the DBMS (D4) that distinguishes between the values that can be inserted into A and the values that can be inserted into B.

-- 
Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler_at_earthlink.net, jleffler_at_us.ibm.com
Guardian of DBD::Informix 1.00.PC2 -- http://dbi.perl.org/
Received on Thu Oct 17 2002 - 07:38:05 CEST

Original text of this message