Re: Concurrency in an RDB

From: David <davidbl_at_iinet.net.au>
Date: 20 Dec 2006 19:57:00 -0800
Message-ID: <1166673420.441351.294830_at_i12g2000cwa.googlegroups.com>


Marshall wrote:
> On Dec 20, 5:14 pm, "David" <davi..._at_iinet.net.au> wrote:
> > Marshall wrote:
> >
> > > You do know this *is* a theory group, right? Do you actually
> > > have any theory? Any papers? Any math? Computational
> > > models? Equations? Examples, even?
> >
> > Yes, I have all that, including formal proofs of correctness. I have
> > no intention of describing the detail in this NG even though I expect
> > it could interest you. In any case it is irrelevant to the original
> > post.
>
> Okay.
>
> However I should tell you that I have a formal proof that your
> approach is invalid. I have shown it to several university
> professors, (I can't say who) and they all agree it is sound.
> However I am unable to reveal the proof, for reasons that
> I prefer not to specify.
>
> See what I did there?

Lie? :)

I'm sympathetic to your objection.

> > > > It is certainly possible for the system to impose your constraint, but
> > > > in that case you will find that it arbitrarily throws away records so
> > > > that only one remains. Both sites will agree on the record that was
> > > > kept. The symmetry is broken using a total ordering on site
> > > > identifiers.
> >
> > > By "a total ordering on site identifiers" I take it you mean
> > > that every peer gets an id, and if updates from id1 and
> > > id2 conflict, then id1 always wins. Terrific, as long as
> > > I can have id1, ha ha.
>
> > You trivially misrepresent.
>
> You complain that I misrepresent but you don't correct the
> supposed misrepresentation.

Yes, I did - in the sentence that followed "You trivially misrepresent".

> If I didn't read your mind correctly
> in expanding the phrase "total ordering on site identifiers" then
> what did you mean? But I recall that you have "no intention
> of describing the detail[s]."

You worked that out correctly. The use of site identifiers to automatically resolve conflicts without a central server is described in the literature.
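
To make that concrete, here is a minimal sketch in Python (the names merge_conflicting and site_id are invented for this post; it is not the actual algorithm, just the symmetry-breaking rule you described):

# Hypothetical sketch: deterministic conflict resolution using a total
# ordering on site identifiers. Both replicas run the same merge and
# therefore agree on which record survives, with no central server.

def merge_conflicting(local, remote):
    """Merge two replicas' versions of a keyed record set.

    Each record carries the 'site_id' of the site that produced it.
    On conflict the record from the lower site_id wins; any total
    order would do, as long as every site applies the same rule.
    """
    merged = dict(local)
    for key, remote_rec in remote.items():
        local_rec = merged.get(key)
        if local_rec is None or remote_rec["site_id"] < local_rec["site_id"]:
            merged[key] = remote_rec
    return merged

# Two sites independently produce a conflicting record for the same key.
site_a = {"row42": {"site_id": 1, "value": "edited at site A"}}
site_b = {"row42": {"site_id": 2, "value": "edited at site B"}}

# Merging in either direction keeps the same record (site_id 1),
# so both replicas converge.
assert merge_conflicting(site_a, site_b) == merge_conflicting(site_b, site_a)

Because every site applies the same rule, the replicas converge on the same surviving record without any coordination.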

> > I made it quite clear that integrity
> > constraints that force user edits to be discarded after merging should
> > be avoided.
>
> I see: your approach is to limit what constraints are allowed.
> Do you see why this might not be appealing for a general
> purpose data management solution?

Yes. This is why I'm interested in people's opinions. The proposal is for complex validation (which may be CPU intensive) to be *calculated* in a shared read mode, independently of fine-grained mutative transactions. Whether the data is in a valid state becomes a property of the data itself. Users run the data validation queries to help them massage the data (perhaps interactively, or perhaps with explicit branching and merging) towards a valid state, without needing to lock each other out or lose edits when they merge.
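
As a rough illustration (the schema and query below are invented for this post, not part of the proposal itself), a complex constraint can be expressed as a read-only validation query whose result set lists the violations, rather than as a constraint the DBMS enforces on every write:

# Hypothetical sketch: a "weak" constraint is enforced by the schema,
# while a complex "strong" constraint is merely *reported* by a
# read-only validation query that users can run at any time.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE module (
        name  TEXT PRIMARY KEY,          -- weak constraint: enforced
        body  TEXT NOT NULL
    );
    CREATE TABLE import_dep (
        importer TEXT NOT NULL,
        imported TEXT NOT NULL
    );
""")

# Edits persist even though the data is not (yet) in a valid state.
conn.execute("INSERT INTO module VALUES ('main', 'import util ...')")
conn.execute("INSERT INTO import_dep VALUES ('main', 'util')")  # 'util' missing

# Strong constraint, computed rather than enforced: every imported
# module must exist. The query returns the violations for the user
# to fix, instead of rejecting the transaction.
violations = conn.execute("""
    SELECT d.importer, d.imported
    FROM import_dep d
    LEFT JOIN module m ON m.name = d.imported
    WHERE m.name IS NULL
""").fetchall()

print(violations)   # [('main', 'util')] -> data is valid once this is empty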

A realistic example is a DB of source code that allows for edits to persist even though the source code doesn't compile successfully. The DB can represent a work in progress as well as a stable release. The strong integrity constraint of successful compilation is not enforced at all times by the DBMS.

The conventional approach would instead be a GUI with modal dialogs that lets a user edit source code (across multiple files at once) in a transactional manner, so that the code atomically changes from one compilable state to the next. I don't find this appealing.

Paradoxically, I think my approach is actually *better* suited to data with very complex integrity constraints!

> > > Okay, I think you're in the wrong newsgroup. This is a database
> > > theory newsgroup. Those with an application framework du jour
> > > are directed to comp.object. OT is OT here.
> >
> > The approach I have described addresses the goals of a DBMS, such as
> > atomicity, integrity and multi-user support. It seems relevant to this
> > NG.
>
> Addresses them ... how exactly? You address atomicity and integrity by
> forbidding or severely restricting them, basically. If I understand you
> correctly (entirely possible I don't, since you're so spartan on
> specifics)
> then client code can't even count on successful updates persisting.

There still needs to be a proper DB with atomicity etc. to protect the data with respect to the weak integrity constraints.

It's true that durability is less important with my proposal, because there is no need for multi-phase commit. Nevertheless, data loss is not desirable.

> > I think your problem is that I haven't provided the mathematical
> > detail to justify my claims. I agree that my descriptions are more
> > about the repercussions of OT for a DBMS rather than revealing how it
> > works in sufficient detail to be persuasive.
>
> Well, a number of us have said that those repercussions are severe
> enough to rule out this approach for general purpose use.
>
> I'm unclear as to your goals for this thread.

I want feedback on the general approach, particularly examples of situations where the approach is inappropriate.

> You've made it
> clear that you don't want to get into details; okay. If you wanted
> to provide us with a high-level description of some ideas you have,
> you've done that.

I don't see how the precise details of OT will help with the discussions in this thread. There are more than a dozen papers on OT (even if flawed in various ways) that provide ample background for the topics I wanted to discuss. I see little point in discussing such a complex topic with people who haven't studied it in depth.

> If it was to get us to agree that the limitations
> you're proposing aren't too bad, we've declined to do that.

I don't care whether you agree. In fact I would prefer a devil's advocate who picks holes in my approach.

> What else would you like to discuss?

Nothing more.

Cheers,
David