Re: how to build a database from scratch

From: paul c <toledobythesea_at_oohay.ac>
Date: Fri, 08 Dec 2006 13:41:01 GMT
Message-ID: <NXdeh.444867$1T2.95427_at_pd7urf2no>


Sampo Syreeni wrote:
> On 2006-12-07, paul c wrote:
>
>> Physical pages were fed to logical access tasks that performed queries 
>> and updates solely to memory. Logical tasks would wait for physical 
>> pages if they were not in cache.
>
> Sure, but this doesn't explain how the problem with parallel updates to
> a B-tree is solved. For example, if you lock the whole tree and the
> logical task which acquired the lock blocks waiting for physical I/O,
> suddenly all of the other tasks are excluded from the tree (not to
> mention anything that is transitively dependent on them because of page
> sharing and 2PL ordering), no new I/O goes in flight, and the system
> takes a nasty hit in exposed disk latency. Suddenly it's not at all
> "fast". If you try to solve the problem by asynch prefetching the blocks
> and only locking the pages when they're physically present, you'll have
> to guarantee that two processes fetching the same page will eventually
> synchronize on the same copy, which means that you're just moving the
> problem and not solving it, while at the same time you're possibly tying
> a lot of stuff like the disk cache layout with your chosen locking
> scheme. The design also cannot handle transactions with working sets
> larger than the physical memory, and it's quite possible that even after
> that problem is fixed, the system will not regain its former stability
> under memory pressure. Et cetera.
> ...

This is getting further and further away from the original question (or at least from the single answer "use a btree" given to it), which I understood to be: does a btree algorithm need to change to allow concurrency? The answer is no, as long as concurrency is handled by tasks or processes independent of the btree ones, e.g., a logical lock manager (one that locks values, not blocks) plus a rule that a transaction's updates are flushed together.
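
To make that concrete, here is a toy sketch in Python of what I mean by a value-level lock manager (every name in it is my own invention, not from any real product). Locks are taken on key values in a fixed order, never on btree pages, so the btree code itself needs no changes:

    import threading

    class LogicalLockManager:
        # Locks are taken on key values, never on btree pages, so the
        # btree code stays single-threaded in outlook.
        def __init__(self):
            self._guard = threading.Lock()
            self._locks = {}                  # key value -> its lock

        def _lock_for(self, key):
            with self._guard:
                return self._locks.setdefault(key, threading.Lock())

        def acquire_all(self, keys):
            # Fixed (sorted) acquisition order prevents deadlock between
            # transactions whose key sets overlap.
            for key in sorted(keys):
                self._lock_for(key).acquire()

        def release_all(self, keys):
            for key in keys:
                self._lock_for(key).release()

    mgr = LogicalLockManager()

    def run_transaction(keys, do_updates):
        mgr.acquire_all(keys)                 # growing phase (strict 2PL)
        try:
            do_updates()                      # in-memory changes only
        finally:
            mgr.release_all(keys)             # all locks dropped at commit,
                                              # when updates are flushed together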

>> Query tasks obeyed a strict 2PL protocol and multiple table updates 
>> were ordered to prevent deadlock.
>
> 2PL does make life easy, but you can't use it unless you know precisely
> what data is going to be touched by a transaction when it begins. When
> you have this information available, it's nice and it should be
> exploitable, but in a typical environment which also allows e.g.
> operational ad hoc reporting, that data is not known beforehand.
> ...

Apart from user caution, the obvious answer to ad hoc processing on a big scale is a second db image. If it is maintained by replaying physical logs it doesn't double the update overhead, and it might be only several minutes behind the first image in terms of freshness.
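
Roughly, the replica only copies page images out of the physical log, so the primary's update path does no extra logical work. A minimal sketch, assuming a made-up record format of (page number, page image); a real log would carry LSNs, checksums and so on:

    import io, struct

    REC = struct.Struct(">QI")                # page number, image length

    def ship(log, page_no, image):
        log.write(REC.pack(page_no, len(image)) + image)

    def replay(log, replica_pages):
        # Apply whole-page records to the replica's page store.  No
        # logical re-execution, so the primary pays no extra update cost.
        while True:
            hdr = log.read(REC.size)
            if len(hdr) < REC.size:
                break                         # caught up; poll again later
            page_no, n = REC.unpack(hdr)
            replica_pages[page_no] = log.read(n)

    log = io.BytesIO()                        # stand-in for the shipped log file
    ship(log, 42, b"page image bytes")
    log.seek(0)
    replica = {}
    replay(log, replica)                      # replica now holds page 42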

>> There was a small minority of applications that weren't amenable to a 
>> simpler system. Simpler systems should be the goal, they are almost 
>> always stronger, safer and more economical.
>
> Yes, but when you have to handle a certain, relatively common OLTP
> workload, your architecture simply chokes and cannot be amended in an
> easy and principled manner. There is also nothing in your argument that
> isn't already considered by the existing literature: when you know your
> workload a priori, it's well known that specialized architecture can
> lead to simpler and higher performing systems, DBMS's built specifically
> for short write transactions with small working sets already utilize
> techniques similar to what you're advocating, special purposing as a
> general principle is accepted by many, and software architecture has by
> no means been neglected by DBMS designers of the past.
> ...

A decent designer will know the workload a priori.

> It's just that we were talking about the problems that those same DBMS
> designers have bumped into, and considered worth solving. You on the
> other hand are not solving them, but simply assuming them away.
>

>> (BTW, I don't understand why some DBMS people are so interested in 
>> parallel processors, they are mainly the hardware manufacturers' 
>> reaction to the increasing gap between processor and memory speeds. 
>> Not many real applications need parallel hardware.)
>
> Technically parallelism is necessary at least for latency hiding
> (memory, network and disk), avoidance of diminishing marginal returns in
> high speed serial operation (both CPU and I/O), redundancy and
> availability on OTC hardware which is designed for price at the cost of
> failing often, sharp and fast, and algorithmic cost amortization over
> large numbers of small, independent transactions. Application-wise,
> you'll need it for scalability to high throughputs in broad scale OLTP,
> for geographically dispersed operation, and for multiuser sharing. All
> of these are real, pressing issues in the breadwinner DBMS apps, like
> financial services, reservation and planning, and O&M.
>
> You should also note that the interest in parallelizability is not
> limited to explicitly parallel hardware. In a sense you can equate the
> study of parallelism with studying various kinds and levels of
> dependency between decomposed computations and computational substrates.
> This is why your average DBMS running on a single processor will already
> benefit greatly from the insight that peripherals operate in an
> inherently parallel fashion, and that this built-in parallelism can be
> exploited for higher performance. The same goes e.g. for treating set
> oriented operations as parallel processes which can be optimized under a
> common parallel framework.

Fair enough, but I see no reason why a btree algorithm needs to recognize latency.
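
That is, the waiting can live entirely in the layer that hands pages to the btree. A toy sketch (node layout and all names are my own invention): the buffer pool parks the calling logical task until the physical task installs the page, and the search routine stays plain serial code:

    import threading

    class BufferPool:
        # Latency lives here, not in the btree: get_page() parks the
        # calling logical task until the physical I/O task installs
        # the page.
        def __init__(self):
            self._cache = {}
            self._ready = threading.Condition()

        def get_page(self, page_no):
            with self._ready:
                while page_no not in self._cache:
                    self._ready.wait()
                return self._cache[page_no]

        def install(self, page_no, page):     # called by the physical task
            with self._ready:
                self._cache[page_no] = page
                self._ready.notify_all()

    def btree_search(pool, page_no, key):
        # Plain serial search; nothing here knows about I/O or waiting.
        node = pool.get_page(page_no)         # may block; this code doesn't care
        if node["leaf"]:
            return node["values"].get(key)
        for sep, child in zip(node["keys"], node["children"]):
            if key < sep:
                return btree_search(pool, child, key)
        return btree_search(pool, node["children"][-1], key)

    pool = BufferPool()
    pool.install(1, {"leaf": False, "keys": [10], "children": [2, 3]})
    pool.install(2, {"leaf": True, "values": {5: "five"}})
    pool.install(3, {"leaf": True, "values": {20: "twenty"}})
    print(btree_search(pool, 1, 20))          # -> "twenty"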

p