Re: Coul constraint *enforcement be done at compile time vs runtime ?

From: Brian Selzer <>
Date: Sat, 20 Jun 2009 15:30:12 -0400
Message-ID: <9Na%l.208$>

"Cimode" <> wrote in message
> On 16 juin, 14:47, "Brian Selzer" <> wrote:
> > "Cimode" <> wrote in message
> >
> >
> >
> > > As the subject says...
> >
> > > The traditional direct image system approach consists of having
> > > relations constraints implemented at the time when the relation is
> > > updated meaning during UPDATE/DELETE/INSERT operations. As a part of
> > > a research effort to understand better what may be a relational system
> > > implementation, I have discovered that it is actually possible to
> > > perform constraint enforcement at constraint declaration time.
> >
> > Can you be more specific? Are you saying that /all/ constraints can be
> > enforced at compile time? When a user submits an illogical update, for
> > example, a delete that targets a tuple that is referenced in an
> > inclusion
> > dependency, what do you call the process of determining that it is an
> > illogical update, if not constraint enforcement? Moreover, while domain
> > definitions partition the set of all particulars in the Universe into a
> > set
> > of domains and while state constraints shape the Universe into a set of
> > possible worlds, transition constraints specify which possible worlds
> > are
> > directly accessible from the distinguished actual world, which is
> > represented by the current database value (the redundancy in the terms,
> > 'the' and 'current'.and in the terms 'database' and 'value' being
> > intentional for emphasis). As a consequence, since the enforcement of
> > transition constraints depend on the current database value, how can one
> > determine at constraint declaration time whether a particular update
> > violates a transition constraint?
> I realize I have not been explicit enough by what I mean by
> *enforcement*. I apologize but it is difficult for me to summarize a
> decade effort of design into a single thread.


> To answer your question, let me use a significant example that will
> underline the difference in approach between the way a direct image
> system handles an INSERT constraint check and the way a system
> implementing independence between the logical and physical layer would
> work. I won't get into too much detail since that would require very
> lengthy explanations onto how the relation representation scheme works
> but I will attempt to provide some explanations to clarify my point.

OK, but my questions apply to all implementations, not just direct image implementations.

> In a typical direct image system such as ORACLE, DB2 or SQL Server,
> the typical INSERT constraint checking is handled the following
> fashion (there may be some variations but the general algorhythmics
> remains the same):


> Considering the 3 following relations SCHOOL, STUDENT where one SCHOOL
> may have 0 or more STUDENTS. Let's suppose I want to INSERT 3 new
> STUDENTS while checking that they do belong to a SCHOOL that belongs
> to SCHOOL body. The traditional direct image system works in the
> following fashion

> --> A transaction context is created for the INSERT when the
> instruction is submitted to the engine.


> --> The *to be inserted* new STUDENT tuples in are matched against
> SCHOOL to enforce constraint checking.

True, but wouldn't the constraint check have to be done regardless of the implementation.

> The operation involves an *in
> between* (before COMMIT) issue/execution of SELECT statements that
> check the physical existence of matching SCHOOLS in SCHOOL relation.
> --> If SOME or ALL tuples are matched, a COMMIT OK flag is issued to
> allow the transaction to continue. Oppositely, when SOME or ALL of
> the tuples are NOT matched in SCHOOL, a COMMIT KO is issued to
> indicate to the system that there would be a constraint violation if
> the current operation would actually be completed. The COMMIT KO, an
> internal data sublanguage instruction, then usually ends up with a
> ROLLBACK external language instruction.

And this is exactly what an inclusion dependency requires. Is the SCHOOL that is referenced in the insert already defined? This is a logical requirement: I don't see how the type of implementation can alter the need for this question to be answered whenever an insert of a student of a school is submitted.

> By the above mechanism, a traditional direct image system, the system
> analyzes the *intent* at updating the body of relation and expresses
> internally whether or not that intent would actually result in a
> constraint violation. Among the many drawbacks that come from such
> approach:
> --> The necessary *analysis of intent* requires multiple SELECT
> statements to be issued. These SELECT statement will occur everytime
> a set of tuples are to update on a specific relation and are in fact
> orthogonal overhead to the INSERT transaction context. For instance,
> that even if one issues an INSERT statement for a similar value to be
> inserted, SELECT statements on that value will occur as many times as
> the INSERT statement is issued.
> --> By binding the analysis of intent solely to the transactional
> context of a single INSERT transaction, the system will actually lock
> any other transactions from updating the content of both the STUDENT
> and the SCHOOL transaction. STUDENT will be locked for allowing the
> INSERT and SCHOOL will be locked to allow the SELECT statement (there
> are wayss around that but they require additional ressources to be
> consumed; I am thinking about ORACLE Read consistency and SQL Server
> SNAPSHOT mode). All this comes at a great cost as far as concurrency
> is concerned and resoource consumption.

Yes, locking is a necessary evil whenever an update is submitted. Even with SNAPSHOT isolation, locks are applied when performing updates. There are only three alternatives to locking that I can think of:

(1) poll each connection to see if it has an update to submit; (2) pass a token amongst the connections and agree that only the connection with the token can submit an update;
(3) allow each connection to just stuff information in, but devise a mechanism for detecting collisions and reverting the database to the state it was in at the start of the first of the colliding updates.

All of these alternatives have their ups and downs. Even by implementing some kind of priority queuing mechanism, alternatives (1) and (2) would incur overhead for connections that don't need anthing done. Alternative (3) may appear useful, but it introduces other problems that are beyond the scope of this post.

SNAPSHOT isolation limits the need for locking to just those transactions that submit updates, and that is in my opinion the least intrusive of the concurrency control mechanisms. I don't see how any system, whether direct image or not, would benefit more by implementing any of the alternatives.

> There's in fact a fundamental logical problem hidding behind the above
> physical problem. The non distinguishability between the two
> contexts, the context of the operation and the context of analysis of
> intent is a direct consequence onto how the internal relation is to be
> operated. In enforcing constraints and performing a UNION, direct
> image systems *internally* assume the following elementary relational
> premises:


> --> Premice 1: An user INSERT/UNION user operation between STUDENTS
> and NEW_STUDENTS is in fact 2 relational internal operations PROJECTION
> (SELECT for analysis of intent)-UNION(COMMIT INSERT). That is a
> direct consequence of the fact that when operated the two relations
> are not of the same type so the analysis of intent imposes a necessary
> *conversion* process to allow the external UNION to be performed.
> --> Premice2: The UNION operation occurs according to the following
> algorhthmics:

> <--> NEW_STUDENTS = EMPTY no matter what!!!

> Premice 1 is a consequence of the fact that NEW_STUDENTS internal
> representation has no mechanism that would represent it as STUDENT to
> allow the analysis of intent to be implicit. Premice 2 OTOH is a
> direct consequence of flawed relational theory.
> In fact, a more reasonnable approach is to state that

I don't understand what you're driving at here. The projection and join (which for 3 rows would probably translate into a three index look-ups, one for each SCHOOL value) required to enforce the constraint, do not alter operands of the join, just the result, which should be empty. STUDENTS and NEW_STUDENTS are of the same type, and therefore should remain UNION compatible regardless of the evaluation of relational expressions required to enforce constraints.

> and STUDENTS2 are relations sharing the same header. As a
> consequence, STUDENTS2 in fact equates to STUDENTS *if and only if*
> NEW_STUDENTS is empty OR if NEW_STUDENTS would provoque a constraint
> violation. This is in fact a fundamental difference from traditional
> relational approach with important consequences:


> --> The argument that indicate that a relational operation output such
> as a UNION *always* ends up with one of the relations involved in the
> operation is a fallacy. That fact is a direct consequence of
> underestimating the relation *naming* process to the profit of headers
> and bodies.

I agree that a UNION does not necessarily end up with one of the relations involved in the UNION, but can you be a little more clear here? What is this *naming* process and how can underestimating it profit headers and bodies?

> --> A truly relational system can *only* perform operations between
> relations of the same type. A system that does not guarantee
> representations can not be perform relational operations neither it
> can do analysis of intent efficiently.

I don't understand what you mean here by operations. Can't a join can occur between relations with even completely different headings?

> --> A UNION operation between two relation always produces a third
> relation.


> From the above premices, I can present how a non direct image system
> actually performs the UNION constraint checking :

I can't really comment further until I have some of the above questions answered.


> --> To operate relations, a relational operation engine handler must
> have an *isotyping* mechanism (same type) that would necessarily
> represent the relation variables as being of the same type. The
> isotyping mechanism is directly bound to the operation.
> --> The isotyping mechanism would allow to internally solve

> STUDENTS UNION NEW_STUDENTS = STUDENTS2 and consider a constraint
> violation *if and only if* STUDENTS2 = STUDENTS. I can not present
> how my engine does it in detail, because it would impose an entire
> book to write, but what I can say is that the relation representation
> scheme used allows a relation level projection *not* a tuple level.
> The representation scheme allows for instance to equate two un-ary
> relations without resorting to elementary tuple based projections.
> --> Since the isotyping mechanism is bound to the operation, and
> inherently allows the analysis of intent, each time a relational
> operation is compiled, the analysis of intent is performed, leaving
> only the UNION operation to be performed at run time

> What I can say is that this engine is currently working (still fixing
> issues though especially with virtual relation types) but it performs
> basic relational operations according to the above principles. The
> advantages are numerous:

> --> A system that operates relations not values
> --> A more economic system since no overhead is done in constraint
> checking. Only the demanded operation is actually executed.
> --> The system implements the same mechanism between any pair of N-ary
> relations. Since these relations can share the same header it greatly
> simplifies for instance dupplicate checking.
> --> I am still working on the system language compiler and fixing some
> issues and hope to have a prototype in the next 2 years...

> I hope this clarified...
Received on Sat Jun 20 2009 - 21:30:12 CEST

Original text of this message