Re: ACID et al

From: paul c <toledobythesea_at_oohay.ac>
Date: Mon, 05 Dec 2005 21:11:43 GMT
Message-ID: <j22lf.47846$ki.22240_at_pd7tw2no>


vc wrote:
>
> ...
>
> Ah, ok, so if there is no *concurrent* access to a data item, then
> there is no need to worry about locking and such.
>

Yes. If I may delve further into 'Consistency': in the locking implementation that I was most familiar with (and that was intimately, ranging from internals to customers), we used the strict 2-phase protocol that Gray popularized in the 1980's (2-phase locking, that is, as opposed to 2-phase commit, although we used 2PC for connected db's). You have the obvious kinds of locks, share and exclusive, plus modifications such as 'intent to share', 'intent to share exclusive' and so forth, all kept in a conceptual hierarchy, which on the surface seemed attractive when a modifiable 'dictionary' or 'catalog' was involved. What the term '2-phase' meant was that no lock could ever be acquired in a transaction after any lock had been released. This in fact was our definition of a transaction. The number of locks would grow throughout a transaction, and when there was nothing left to do, all locks would be released 'simultaneously', which meant the transaction was over; in our scheme, while you might have lots of database reading going on, if there were no locks there were no transactions as such.
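
To make the 2-phase rule concrete, here is a rough sketch in Python of what I mean (not our actual code, just an illustration; the lock_manager object and its acquire/release calls are hypothetical):

class TwoPhaseViolation(Exception):
    pass

class Transaction:
    def __init__(self, lock_manager):
        self.lock_manager = lock_manager   # hypothetical lock manager with acquire/release
        self.held = set()
        self.shrinking = False             # becomes True once any lock is released

    def acquire(self, resource, mode):
        if self.shrinking:
            # the growing phase is over; acquiring now would break the 2-phase rule
            raise TwoPhaseViolation("cannot acquire a lock after releasing one")
        self.lock_manager.acquire(self, resource, mode)
        self.held.add(resource)

    def commit(self):
        # strict 2PL: every lock is held to end of transaction and released together,
        # which is also the moment the transaction ceases to exist
        self.shrinking = True
        for resource in self.held:
            self.lock_manager.release(self, resource)
        self.held.clear()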

(Of course, there were other kinds of locks outside the purview of the lock manager component, for example ones that kept the logging and checkpoint functions from interfering with page images that multiple query tasks were looking at, and vice versa. But I don't think of those as database locks; rather, they are locks that a particular implementation choice, e.g. multi-tasking, imposes on itself - they're like operating system locks. So I'm talking about pure logical or predicate locks.)

Even though I might be using different words from what some people are used to, the notion is familiar to lots of SQL programmers - I think it is called 'isolation level' or some such.
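
For instance, the correspondence is roughly this (a hedged sketch only; the connection object and '?' placeholder style are generic DB-API conventions and the SET TRANSACTION statement varies a bit by engine):

def read_balance(conn, acct):
    cur = conn.cursor()
    # SERIALIZABLE behaves like strict 2PL: share locks taken by the SELECT are
    # held to end of transaction. READ COMMITTED releases read locks early,
    # i.e. it relaxes the "no acquire after a release" rule for reads.
    cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
    cur.execute("SELECT balance FROM accounts WHERE acct = ?", (acct,))
    return cur.fetchone()[0]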

Here's my point. It always seemed to me that a fairly large bit of psychological mumbo-jumbo was involved in the strict 2PL protocol. This was the price one paid for having a db engine with built-in 'completely safe' support. What I mean by 'mumbo-jumbo' is this: The general-purpose lock manager had an underlying assumption that application tables were designed for particular apps. In fact, I think all general-purpose lock managers must make the same assumption. But we know that assumption isn't true in practice. For one thing, different apps share the same tables. For another, lots of times, people put the wrong attributes in some of their table definitions.

Here's a case: Bob goes to the ATM and retrieves his account balance before deciding how much money to withdraw. Assuming it isn't one of those disconnected ATMs and assuming that some kind of escrow lock isn't being applied, Bob is probably assuming at the moment he is looking at the ATM's screen that there is no way his wife Sally across town can prevent him from cleaning out the account. If the central bank computer is truly involved (I gather they aren't always), it only needs to know a few things before it authorizes the dispensing of the cash - the account number, the fact that Bob put in the right password, the account balance and whether the ATM has enough cash on hand.

However, in the central computer, plenty of extraneous things may be locked while Bob puts on his reading glasses so he can read the latest greatest improved ATM interface screen; for all we know, some branch data may be locked too. Contrast this with the same transaction done by a teller in a branch who is looking at Bob's driver's license and at a PC screen that has a lot more info on it. If Bob's bank is as nosy as mine, it probably shows the teller his dog Fido's name. Now the teller decides to accede to Bob's withdrawal and presses a 'submit' button. Based on what? In most apps that I've seen, it was never made clear to the developers what the conditions were for an update to be eligible or legal. So if the development shop was highly structured, the developers would simply be told what tables they were to use, and those were the ones that the transaction would lock. When the shop wasn't like this, as often as not, individual developers would decide what tables or rows would get locked. They would do this indirectly by deciding what data they would show the user. In effect, Fido's name became part of the business rules for a withdrawal transaction. It might indeed be a business rule that the teller must confirm Fido's name before authorizing the funds. But I'd say if that's so, the application should know about it.

We used to lock rows based on the values of 'primary keys'. We were about to ship a version that gave much finer granularity, basically letting a wide range of WHERE clause conditions be the lock predicates, which Gray had also written about. This didn't ship because the main developer went mad (I'm not kidding, literally barking mad) and I lost sway over one of the implementations (the mainframe one - the people who took over had heard that lock managers were resource pigs, didn't believe that ours wasn't, and decided to put MVS SMF monitoring *inside* the lock manager, which certainly fulfilled its own prophecy - that was about the time this particular product started to lose its way).
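
To give a feel for what 'WHERE clause conditions as lock predicates' means, here is a toy sketch (my own simplification for illustration, nothing like a real implementation, which has to reason about arbitrary predicates): two predicate locks conflict when some row could satisfy both conditions and at least one of the locks is exclusive. Here the predicates are reduced to ranges over a single column.

from dataclasses import dataclass

@dataclass
class PredicateLock:
    column: str
    low: float
    high: float        # the lock covers rows with low <= column <= high
    exclusive: bool

def conflicts(a: PredicateLock, b: PredicateLock) -> bool:
    if a.column != b.column:
        return False               # (real predicate analysis is much harder than this)
    if not (a.exclusive or b.exclusive):
        return False               # two share locks never conflict
    # ranges overlap => some row could satisfy both predicates
    return a.low <= b.high and b.low <= a.high

# e.g. a reader of "balance BETWEEN 0 AND 100" blocks a writer of "balance = 50"
reader = PredicateLock("balance", 0, 100, exclusive=False)
writer = PredicateLock("balance", 50, 50, exclusive=True)
assert conflicts(reader, writer)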

But even had we shipped, we would have been just playing the game, being pawns in the application-system mismatch and the law of diminishing returns. What I really wanted all along was, to use RM terminology, to attach dynamic constraints to every update. And it is the 'client' that has more potential to know exactly what those constraints should be, e.g. the account balance or the branch limit as opposed to Fido's name. It can know this if the app tells it; or it can infer that the 'fields' that are displayed (as opposed to the rows that were queried) should determine what values for those fields get 'asserted' in a dynamic update constraint; or it could make some other installation-defined choice, such as the really blunt one of sending all the rows that were queried back to the server along with the updates and expecting the server to reject the updates if any of the rows had changed while the teller scratched her nose. Lots of variations are possible, such as time limits and so on. For me, the crucial thing to avoid is ever letting the app programmer invent some generally unsafe protocol, such as releasing a lock before obtaining another one, and allowing it to propagate through to other apps that may not be so safe when that protocol is used. An application model that doesn't allow locks to persist across display interface activities seems the easiest way to guarantee this, along with some default redundant 'dynamic constraint' checking for update messages, as well as a way for dba's or da's to set up apps with high-performance / high-volume expectations for more streamlined constraint checking, such as 'account_balance > debit_amount'.
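
As a concrete (and much simplified) sketch of the dynamic-constraint idea: the client submits the condition it relied on along with the update, and the server applies both atomically, instead of anyone holding locks while the teller scratches her nose. Table and column names here are made up for the example; the '?' placeholder style is generic DB-API.

def withdraw(conn, account_no, debit_amount):
    cur = conn.cursor()
    # The WHERE clause carries the dynamic constraint: the debit only happens
    # if 'balance >= debit_amount' still holds at update time, no matter what
    # Sally did across town in the meantime.
    cur.execute(
        "UPDATE accounts SET balance = balance - ? "
        "WHERE account_no = ? AND balance >= ?",
        (debit_amount, account_no, debit_amount),
    )
    conn.commit()
    # rowcount 0 means the constraint failed, so the app re-reads and asks the
    # user again, rather than relying on a lock held across the screen dialogue.
    return cur.rowcount == 1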

Why the heck should we make a server 'bigger' so that it can remember everything every one of thousands of users did ten minutes ago? One might ask, if the client is given such leeway, how does the server know to trust the client, i.e. the client's code? I believe the www people have fair solutions to this problem in the most common environments (e.g. code can't originate locally as far as the client is concerned). Stricter environments have other solutions ranging from real locks and keys to thin clients.

These are just my attempts at explaining why I'm leery of lock managers!

thanks,
p
