Re: Newbie question about db normalization theory: redundant keys OK?

From: David BL <davidbl_at_iinet.net.au>
Date: Sun, 23 Dec 2007 21:59:19 -0800 (PST)
Message-ID: <f649852f-46c4-43d6-8140-736158cc9083_at_e10g2000prf.googlegroups.com>


On Dec 24, 12:17 pm, paul c <toledobythe..._at_ooyah.ac> wrote:
> David BL wrote:
>
> ...
>
> > Operational Transform (OT) tries to do exactly what you're alluding
> > to. I have been investigating it for years now and can say that the
> > mathematics is complex and very interesting. The basic idea is to
> > think of changes to data structures as operations that can be
> > generated and executed by different processes independently (and
> > concurrently) on the same state without any locking, and the
> > operations can be transformed as required to achieve the same logical
> > effect when they are post-applied in different orders by the
> > processes. This holds the promise of allowing for replicated
> > databases which support local edits without being exposed to network
> > latency or outages, and for asynchronous merging of changes in a
> > similar fashion to CVS. Unfortunately, and not surprisingly there is
> > a limited extent to which OT can preserve the original logical
> > intention of the operations due to conflicts.
> > ...
>
> I didn't have replication in mind, but I will try to peruse whatever I
> can find about "OT". However, until simpler approaches have been
> studied more thoroughly, I'll stick to my suspicion that the recognized
> theorists all started out on the wrong foot.

The literature on OT focuses on real-time collaborative editing of text. IMO the published solutions tend to be incorrect or impractical. I'm not aware of anyone else researching the use of OT for databases.

Wikipedia provides a reasonable overview

    http://en.wikipedia.org/wiki/Operational_transformation
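
To give a flavour of what "transform" means in the text-editing setting, here is a minimal sketch, in Python and entirely my own illustration (not taken from any particular paper), of an inclusion transform for two concurrent single-character inserts:

    # Minimal OT sketch: transform op_a so that it can be applied
    # *after* op_b while still achieving op_a's original intention.
    # An operation is ("ins", position, character).

    def transform_ins(op_a, op_b):
        kind_a, pos_a, ch_a = op_a
        kind_b, pos_b, ch_b = op_b
        assert kind_a == kind_b == "ins"
        # If op_b inserted before op_a's position, op_a's intended
        # position has shifted one place to the right.  Ties are
        # broken arbitrarily here (real systems use site ids).
        if pos_b < pos_a or (pos_b == pos_a and ch_b < ch_a):
            return (kind_a, pos_a + 1, ch_a)
        return op_a

    def apply(doc, op):
        _, pos, ch = op
        return doc[:pos] + ch + doc[pos:]

    # Two sites edit the same state concurrently, with no locking.
    doc = "abc"
    op1 = ("ins", 1, "X")    # site 1 inserts X at position 1
    op2 = ("ins", 2, "Y")    # site 2 inserts Y at position 2

    # Each site applies its own op first, then the other site's op
    # transformed against it.  Both converge on the same document.
    site1 = apply(apply(doc, op1), transform_ins(op2, op1))
    site2 = apply(apply(doc, op2), transform_ins(op1, op2))
    assert site1 == site2 == "aXbYc"

The hard part (and where most published algorithms go wrong) is making this work for deletes, for more than two sites, and for operations that arrive in arbitrary orders.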

> My attitude about concurrency is likely much simpler in principle, if
> not in practice. Basically, an "ideal" rdbms would ignore concurrency!
> All a "transaction" can "know" is its data, aka the hand it's been
> dealt and this is sufficient to eliminate what some other transactions
> might be occupied with. If it regurgitates the queries the designer
> decides are pertinent to the transaction, at the "time" of update, along
> with the expected query results, and those match, then the rdbms could
> apply the transaction's expected updates, in perfect safety.

Isn't that just optimistic concurrency control? But what happens when the results don't match?
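
What I understand you to be describing looks like the validation phase of optimistic concurrency control. A toy sketch of that shape, over an in-memory key/value store (all names and structure are my own invention, purely illustrative):

    class ConflictError(Exception):
        pass

    def commit(db, reads_seen, updates):
        """reads_seen: {key: value the transaction observed earlier}
           updates:    {key: new value to write}"""
        # Re-run the "pertinent queries" at the time of update and
        # compare against the results the transaction expected.
        for key, expected in reads_seen.items():
            if db.get(key) != expected:
                raise ConflictError(key)   # someone changed data we relied on
        # Every pertinent read still holds, so applying is safe.
        db.update(updates)

    db = {"balance": 100}
    commit(db, {"balance": 100}, {"balance": 80})    # succeeds
    try:
        commit(db, {"balance": 100}, {"balance": 60})
    except ConflictError:
        pass   # balance is now 80, so this transaction is rejected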

> If a concurrent mechanism in a particular dbms allows a transaction to
> behave according to data that is not "known" to its own app code,
> obviously no other "transaction" should be allowed to see such data and
> no update should be permitted in the first place. I vaguely recall that
> Oracle was one of the first to have such a mechanism but the
> implementation cost seemed excessive to me, ie., at least for
> high-volume oltp-type transactions, the apps could have coded the same
> things much cheaper, given a suitable "server".

I don't think I follow you. Are you saying that an application must not perform an update following a dirty read? The problem is not detecting the conflict, but deciding what to do about it. Depending on the application, rejection is not always practical, even if you can live with application code that explicitly (or even implicitly) sits in a retry loop (a sketch of such a loop follows the list below).

I can only see two ways to avoid rejection:

  1. Locking - to serialise transactions.
  2. Compensation (eg using OT).

Both have their pros and cons.
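
For completeness, the retry loop I mentioned looks roughly like this (reusing commit() and ConflictError from the sketch above; all names are mine). Retrying hides the rejection from the user, but can starve under heavy contention:

    def run_with_retries(db, read_keys, make_updates, max_attempts=5):
        for _ in range(max_attempts):
            reads = {k: db.get(k) for k in read_keys}   # read phase
            updates = make_updates(reads)               # business logic
            try:
                commit(db, reads, updates)              # validate + write
                return True
            except ConflictError:
                continue                                # lost the race; try again
        return False                                    # caller must handle rejection

    # e.g. debit 10 from whatever balance is current at commit time:
    run_with_retries(db, ["balance"],
                     lambda r: {"balance": r["balance"] - 10})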

Is the comment about Oracle referring to Multi-Version Concurrency Control (MVCC)? Although very useful, MVCC only addresses the important case of long-running read-only queries.

> (BTW, I think your deleted comment is unnecessary and irrelevant as far
> as the topic is concerned. Bob B has shown himself to be quite capable
> of understanding such physical issues as replication which are in fact
> much more elementary than his usual themes, which he is also quite
> capable of dealing of.

My comment didn't concern Bob's intelligence or knowledge (which are both impressive).

> Also, when it comes to synchronization, it is
> amazing to me how hard it seems to be for experts to see how simple and
> massive a problem it can be, eg., involving 2-phase commit on a large
> scale, but I suspect not for an layman outsider once the jargon has been
> explained.

I've never thought much of multiphase commit protocols. Instead, I'm a fan of persistent message queues. I was actually intending to post to cdt to see if anyone could provide an example that mandates a multiphase commit protocol (i.e. where no solution exists using persistent message queues).
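
The idea is that the local update and the outgoing message commit atomically in the *local* database, and a separate relay delivers queued messages later, so no distributed commit is needed. A rough sketch, using sqlite purely for illustration (table and column names are made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
        CREATE TABLE outbox  (msg_id  INTEGER PRIMARY KEY AUTOINCREMENT,
                              payload TEXT,
                              sent    INTEGER DEFAULT 0);
        INSERT INTO account VALUES (1, 100);
    """)

    def debit_and_notify(conn, account_id, amount):
        # One *local* transaction: the debit and the queued message
        # either both commit or neither does, so no multiphase commit
        # with the downstream system is required.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, account_id))
            conn.execute(
                "INSERT INTO outbox (payload) VALUES (?)",
                ("debited account %d by %d" % (account_id, amount),))

    def relay(conn, deliver):
        # Runs independently of the transaction above and simply
        # retries until the remote side has acknowledged each message
        # (at-least-once delivery).
        rows = conn.execute(
            "SELECT msg_id, payload FROM outbox WHERE sent = 0").fetchall()
        for msg_id, payload in rows:
            deliver(payload)
            with conn:
                conn.execute(
                    "UPDATE outbox SET sent = 1 WHERE msg_id = ?", (msg_id,))

    debit_and_notify(conn, 1, 30)
    relay(conn, deliver=print)   # stand-in for sending to the remote DB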

> Another thing concurrency theorists don't seem to have
> considered much is the possibility of re-combining different relations
> from different db's to accomplish a limited number of desired effects,
> tactics as simple as storing details in one location and summaries in
> another - if the two don't balance then the collective relations are
> rejected, perhaps days later. But that would require them to work at
> the logical, not the physical level, which most of them would find
> impossible, counter to their training and other biases.)

I think storing asynchronous summaries in a different location is rather straightforward, because the summaries DB wouldn't be an independent source of updates. A message queue can be used to bring the summaries DB up to date. A persistent message sequence number (stored in each DB) tells you whether the summary DB is synchronised with the details DB; it could also allow a thread to block until the summaries have been synchronised, preferably at the start of a transaction before it has locked any resources.
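
The sequence-number check itself is trivial. A toy sketch (all names are mine; the dicts stand in for the two databases):

    import time

    details_db   = {"last_seq": 42}      # stamped by the details DB on each change batch
    summaries_db = {"applied_seq": 40}   # last batch applied by the summaries DB

    def summaries_in_sync():
        return summaries_db["applied_seq"] >= details_db["last_seq"]

    def wait_until_synchronised(poll_seconds=0.1):
        # Intended to be called at the start of a transaction, before
        # any resources have been locked, so the wait itself cannot
        # block other work.
        while not summaries_in_sync():
            time.sleep(poll_seconds)
            # In reality the queue consumer advances applied_seq in
            # the background; we simulate that catch-up here.
            summaries_db["applied_seq"] += 1

    wait_until_synchronised()
    assert summaries_in_sync()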
