Re: Concurrency in an RDB

From: David <davidbl_at_iinet.net.au>
Date: 23 Dec 2006 19:54:06 -0800
Message-ID: <1166932446.030108.319750_at_42g2000cwt.googlegroups.com>


Marshall wrote:
> On Dec 23, 5:08 pm, "David" <davi..._at_iinet.net.au> wrote:
> > Marshall wrote:
> >
> > > I'm not sure if this is a trivial rephrasing of what you just
> > > said or an actual disagreement, but no, I would not agree
> > > that they "shouldn't" be enforced on every update.
> > > However I would agree that there may be practical limitations
> > > on so doing, such as the amount of time necessary to check
> > > the constraint. Any constraint that can practically be checked
> > > on every update should be.
> >
> > You suggest that performance is the only issue at stake. In examples
> > like the above, a verification failure often points to an error in
> > *previously* committed changes. Software development is a good example
> > of a "non-monotonic" process. Sometimes you need to commit a change
> > that will temporarily break the integrity of the system.
> >
> > Now you could argue that the user should be forced to make all the
> > changes necessary for the DB to atomically change from one valid state
> > to the next. However in some domains that could lead to long running
> > transactions that take hours, days or even months to complete.
>
> How is that not exactly and precisely a performance argument?

Technically it is. However, I expect software developers would find the idea that edits are transactional at a high level (and may take hours or days) quite ridiculous. For example, they would wonder what happens if the power fails. I was arguing that in the mind of the user, software development is indeed non-monotonic.

> I will admit that I have seen examples of commits that break
> the system. For example, Perforce doesn't allow one to both
> move and modify a Java source file in a single transaction.
> Since Java source code specifies the package that it's in,
> and since the package must correspond to a directory, this
> is necessarily a compile-breaking checkin. But that's a
> limitation of the tool, not of the process; if Perforce
> simply supported move-and-modify it would become a non-issue.
>
> And in fact over time I've become more and more
> aware of just how much pain in the software world is
> caused by dbms implementations that don't realize they're
> dbmss, and hence fail to get right things that were solved
> problems decades ago. This is so prevalent that I've
> called it Spight's Law, which in essence says you have
> a dbms whether you realize it or not. (And if you don't
> realize it you have a crappy, broken one.)

Agreed.

> > > > I have the impression (please correct me if I'm wrong) that your
> > > > assumption that a DB should always be in a valid state is coloured by
> > > > the (relational) problems that you have expertise in.
> >
> > > Certainly this is always true for everyone. However I am having
> > > a hard time seeing the value of your approach given how much
> > > less it lets us count on the dbms.
> > Is it really a problem? A workflow can easily force a user to run the
> > verification as part of the process of using the DB within a system.
> >
> > As an example, good software companies have a release engineering
> > workflow that ensures that the release has passed various unit tests,
> > regression tests etc before it can be released. It goes without saying
> > that it must compile successfully.
>
> I suppose it could be considered a matter of perspective. If the
> system requires checkins to compile, then we can say we have
> a database of valid source code. If the system allows checkins
> which don't compile, then we ought to say we have a database
> of text files that may or may not be valid in some particular
> language. By and large I've seen that people are happy with
> the latter.

An edit by a user on a local working set is quite a different thing to a "check-in" to a repository.

I've been assuming you don't like the idea of a repository that supports branching and merging. I also assume you believe there is a requirement to lock all objects before they are *edited*, in the manner of strict 2PL. I was actually going to pursue the question of what happens when two users deadlock because they edit objects in different orders; the sketch below shows the cycle. This could be unfortunate if they've taken hours to perform their high-level transactional edits in order to maintain strong integrity constraints.
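
To make the deadlock concrete, here is a minimal Python sketch. It assumes a toy lock-manager view in which each transaction simply records which transaction it is waiting for; the transaction and object names are invented for illustration.

def has_cycle(wait_for):
    """Return True if the wait-for graph contains a cycle (i.e. a deadlock)."""
    visited, on_stack = set(), set()

    def visit(node):
        if node in on_stack:
            return True          # back edge: this cycle is the deadlock
        if node in visited:
            return False
        visited.add(node)
        on_stack.add(node)
        if any(visit(nxt) for nxt in wait_for.get(node, [])):
            return True
        on_stack.discard(node)
        return False

    return any(visit(n) for n in list(wait_for))

# User A holds a lock on obj1 and waits for obj2 (held by B);
# user B holds a lock on obj2 and waits for obj1 (held by A).
wait_for = {"A": ["B"], "B": ["A"]}
print(has_cycle(wait_for))   # True: one transaction must be aborted

Under strict 2PL the lock manager can only resolve this by aborting one of the transactions, throwing away hours of that user's work.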

> In my experience, constraints hold if and only if they are
> centrally enforced. Those constraints that everyone knows
> of and which are supposed to be enforced in application
> code are a distant memory, victims of broken windows
> syndrome.

Your statement is too strong. For example, I have yet to see a software release that doesn't compile, even from companies with poor release engineering.

> > > I'm also unclear on how much
> > > I have to give up in the way of integrity enforcement. I'm having
> > > a hard time building a mental model for that. Your intent only to
> > > speak at a high level somewhat exacerbates this difficulty.
> >
> > > Hmm, I just had an interesting idea. Perhaps the issues your
> > > idea raises could be dealt with as a "quality of service" issue.
> > > Where one needs strict durability, one could so specify externally
> > > to the application.
> >
> > > This is a bit tricky because of the question of guarantees of
> > > desirable properties. One area I'm interested in is
> > > static analysis, and that's entirely dependent on building
> > > theorems from known properties of a system. Weakening
> > > those properties might render some analysis techniques
> > > unsound.
> >
> > Examples of that would be relevant to this discussion.
>
> How ironic that now you are asking me to be specific.

It is ironic.

The scientific method is characterised by making hypotheses that are powerful because they are predictive and easy to falsify. Typically such a claim is difficult (or even impossible) to prove true.

My proposal is like that. I make a bold (perhaps you would say foolhardy) claim that all mutative work can be fine-grained and avoid CPU-intensive work. This is powerful and useful if it is true. I can think of lots of examples where the claim holds. I can't think of a realistic one (even for an RDB) where it is false, but I lack expertise in RM. Only Sampo has made an effort in this regard.

My claim has been met with some hostility and derision. I have been told that the onus is on me to prove my claims. This exposes ignorance of the scientific method.

The only decent counter-argument was provided by Bob with his link to the paper "The Dangers of Replication and a Solution" by J. Gray et al.

As it turns out, on page 179 there is some discussion of convergence when transactions commute, "so that the database ends up in the same state no matter what transaction execution order is chosen". This points to assumptions made by the paper that are avoided by using OT.
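
To show what I mean, here is a toy Python sketch of the OT idea. It is not the paper's algorithm, and the (position, character) encoding and the fixed tie-breaking priority are my own simplifications: two concurrent insertions into the same string are transformed against each other so that both replicas converge without either edit being discarded.

def apply_insert(text, op):
    pos, ch = op
    return text[:pos] + ch + text[pos:]

def transform(op, other, op_has_priority):
    """Shift op's position to account for `other` having already been applied.
    Equal positions are resolved by a fixed priority so both sites agree."""
    pos, ch = op
    other_pos, _ = other
    if pos > other_pos or (pos == other_pos and not op_has_priority):
        return (pos + 1, ch)
    return op

base = "abcd"
op_a = (1, "X")   # user A inserts "X" at position 1
op_b = (3, "Y")   # user B concurrently inserts "Y" at position 3

site_a = apply_insert(apply_insert(base, op_a), transform(op_b, op_a, False))
site_b = apply_insert(apply_insert(base, op_b), transform(op_a, op_b, True))
print(site_a, site_b, site_a == site_b)   # aXbcYd aXbcYd True

The point is that convergence doesn't require the operations to commute as issued; the transform makes them commute.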

> Hrm. Well, my understanding of the limits of your proposal
> remains sketchy. However if I understand you if transaction
> A and transaction B each issue an update in an incompatible
> way, the system picks one and discards the other, and the
> code that issued the update is not notified of this in a timely
> manner.

This is basically right, except that nothing is notified at all. Conflicts are *always* dealt with automatically and silently.

The trick is to make the integrity constraints weak so that large edits don't need to be discarded. Furthermore, by testing the strong integrity constraints with read-only queries, users get the opportunity to see the conflicts for what they are (in all their glory). Correction is a manual process without any need to throw away previous work. A sketch of such a verification pass follows.
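
The schema and the referential rule in this Python sketch are invented purely for illustration. The idea is that only weak structural checks are enforced on each write, while the strong constraint is checked by a read-only pass whose report the user corrects manually.

# Hypothetical example: employees may reference a manager, and the "strong"
# constraint is that every manager_id refers to an existing employee.
employees = [
    {"id": 1, "name": "Ann",   "manager_id": None},
    {"id": 2, "name": "Bob",   "manager_id": 4},   # dangling reference
    {"id": 3, "name": "Carol", "manager_id": 1},
]

def verify_managers(rows):
    """Read-only check: report rows whose manager_id references no employee."""
    ids = {row["id"] for row in rows}
    return [row for row in rows
            if row["manager_id"] is not None and row["manager_id"] not in ids]

for row in verify_managers(employees):
    print("employee %d references missing manager %d"
          % (row["id"], row["manager_id"]))

Nothing here blocks the commit of Bob's row; the violation is simply reported so that someone can repair it later.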

> Is that correct? Can we characterize precisely what
> kinds of constraints are still possible, and which are lost?

The mathematics of OT is still being worked out. AFAIK no one has looked into what it means for RM.

> I didn't see that that was clarified; I apologize if I missed it.
> (I recall the phrase "complex constraints" but I don't recall
> a specific definition of what that was.)

It would be a constraint that implies significant amounts of editing may have to be discarded. You can't even constrain the maximum size of a text document!

An assignable field will tend to throw away user edits, so assignment is only permissible for simple fields. I wouldn't normally assume assignment operations on string fields; it is better to support character insertions and deletions on strings, as the sketch below illustrates.
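
A toy contrast in Python, with an invented document field: under assignment semantics one user's write silently clobbers the other's, whereas position-based insertions can both be applied (OT would compute the position shift for B automatically).

base = "The quick fox"

# Assignment semantics: the field takes whichever whole value arrives last,
# so user A's change is silently discarded.
write_a = "The quick brown fox"      # user A added "brown "
write_b = "The quick fox jumps"      # user B added " jumps"
merged_by_assignment = write_b       # last writer wins

# Insertion semantics: each edit is a (position, fragment) pair, so both survive.
def insert(text, pos, fragment):
    return text[:pos] + fragment + text[pos:]

step1 = insert(base, 10, "brown ")            # apply A's insertion
step2 = insert(step1, len(step1), " jumps")   # apply B's insertion, shifted
                                              # past A's six new characters
print(merged_by_assignment)   # The quick fox jumps
print(step2)                  # The quick brown fox jumps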

> Suppose we have a constraint that a graph will not
> contain loops, and software that traverses the loop,
> and an update that temporarily allows a loop. Static
> analysis will tell us that the graph traversal logic doesn't
> need to check for duplicate nodes, because there are
> no loops, so this code is omitted. However if it
> encounters a loop, the program will diverge where
> we thought we could count on it to converge.

I need more information to comment. What is the application?

> My strong feeling is that software needs to move towards
> *more* correctness, not less, and that the way to get there
> is runtime constraints and static analysis. I don't see that
> anything else has a chance of working. (Unit test sure ain't
> gonna get us there.) And I can't help but feel that any
> software engineer who gives up correctness for some
> performance has sold his birthright for a mess of pottage.

I agree, but I think my approach will give stronger guarantees, not weaker!

If the DB tries to enforce strong integrity constraints then you quickly have to make compromises: there are serious performance issues, users will lock each other out, and editors will be complex and difficult to develop. This self-limits the strength of the integrity constraints.

Cheers,
David
