Re: Kernels, Contexts, Threads, and Extensible Database Architecture

From: Mladen Gogala <gogala.mladen_at_gmail.com>
Date: Sat, 21 Apr 2012 06:44:28 +0000 (UTC)
Message-ID: <pan.2012.04.21.06.44.28_at_gmail.com>



On Fri, 20 Apr 2012 18:03:31 -0700, knorth wrote:

> Dr. Dobb's Journal article discusses Oracle Database architecture:
>
> Kernels, Contexts, Threads, and Extensible Database Architecture
> http://www.drdobbs.com/database/232900522

It's an interesting article. I will refrain from nitpicking about context switches and processor modes; after all, the first great strides in my career were made on a CPU that had four regular modes and an "interrupt stack". The modes were user, supervisor, executive and kernel, and the CPU was the NVAX. Context switches, and how to avoid them, were a kind of science back then.
Your article contains an interesting summary of some points from Andrew Tanenbaum's "Modern Operating Systems", a classic textbook that is still used at various universities.
That doesn't have much to do with databases per se, but it is an interesting introduction and there is a very strong historical link. As far as databases go, your article is certainly interesting, with one missed point: open source databases, and that is what we're talking about here, are nowhere near the commercial ones in terms of reliability, sheer processing efficiency and speed, capacity and versatility. NewSQL has yet to withstand the test of time, and the prospects do not look good. SQL is modeled after the mathematical theory called "naive set theory" (except for the abomination called "ANSI Join Syntax", which is an attempt to introduce brain damage into SQL) and is also well understood by accountants, which is vitally important. Two or three years ago there was a movement claiming that NoSQL would take over the world, in a fashion similar to Pinky and the Brain. A brief summary of that fad can be found on YouTube under the title "MongoDB is Web Scale". There is no reason to conclude that the fate of NewSQL will be any different: it will probably carve out a niche among Python enthusiasts, Ruby enthusiasts and script kiddies.
Relational databases owe their overwhelming popularity to the rock-solid logical foundation of the SQL language itself. Contrary to popular belief, set theory, with all of its rigor, including joins, projections and set operations, is very well suited to the real world and is perfect for operating on tables. SQL is still king of the hill, and while that is so, there is very little chance that anything will replace the traditional RDBMS.
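
To illustrate the claim that joins, projections and set operations are plain set theory, here is a minimal sketch that models tables as Python sets of tuples. The tables (emp, dept), their columns and the helper functions are hypothetical. Real SQL uses bag (multiset) semantics and NULLs, and the sketch deliberately ignores both.

```python
# Tables modeled as sets of tuples (hypothetical data).
# Real SQL uses bag semantics and three-valued logic with NULLs;
# this sketch ignores both to show the underlying set-theoretic idea.

emp = {  # (emp_name, dept_id)
    ("alice", 10),
    ("bob", 20),
    ("carol", 10),
}
dept = {  # (dept_id, dept_name)
    (10, "accounting"),
    (20, "research"),
}


def project(rel, *cols):
    """Projection: keep only the given column positions; duplicates collapse."""
    return {tuple(row[c] for c in cols) for row in rel}


def select(rel, pred):
    """Selection (restriction): keep only rows satisfying the predicate."""
    return {row for row in rel if pred(row)}


def join(r, s, r_col, s_col):
    """Equi-join: the subset of the Cartesian product where the keys match."""
    return {a + b for a in r for b in s if a[r_col] == b[s_col]}


# SELECT e.emp_name, d.dept_name FROM emp e, dept d WHERE e.dept_id = d.dept_id
# Joined rows look like (emp_name, dept_id, dept_id, dept_name).
names = project(join(emp, dept, 1, 0), 0, 3)

# UNION / INTERSECT / EXCEPT correspond to |, & and - on sets.
dept10 = project(select(emp, lambda r: r[1] == 10), 0)
dept20 = project(select(emp, lambda r: r[1] == 20), 0)
everyone = dept10 | dept20

if __name__ == "__main__":
    print(sorted(names))
    print(sorted(everyone))
```

The join is written in the old comma-and-WHERE style in the comment, the same style the post prefers over the ANSI join syntax. Here it is simply a filtered Cartesian product.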

To make things worse for the newcomers, an RDBMS is not only a SQL interpreter; it's also a resource manager. Resource managers were used on computers like the IBM 360 to avoid context switches, which were extremely expensive at the time, and that connects us to the beginning of your article. That was the ruling architecture on IBM mainframes for several decades. Unrelated to that, IBM also had a VMware of sorts, called VM/CMS, long before VMware conquered the world. Transaction managers like IMS and CICS were used to manage transactions, another business requirement, modeled mostly after banking transactions. Some of the biggest and earliest customers were banks, which explains why companies went out of their way to adapt their software to the banking business. Transaction managers are very complex and hard to build in a scalable way. The rules of the game, known as the ACID rules and also motivated and supported by the banking industry, do not help. Atomicity, consistency, isolation and durability, together with strict isolation levels such as repeatable read, tend to have a devastating effect on performance, especially in new databases.
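
To make the atomicity requirement concrete, here is a minimal sketch using Python's built-in sqlite3 module. It models a hypothetical funds transfer in which a failure halfway through rolls back the part that already succeeded, so money is neither created nor destroyed. The accounts table, the CHECK constraint and the make_db/transfer/balances helpers are illustrative assumptions. SQLite is used only because it ships with Python, not as a model of IMS, CICS or any commercial RDBMS.

```python
import sqlite3


def make_db():
    """Create a hypothetical in-memory accounts table with two rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one atomic unit of work."""
    # The connection's context manager commits if the block succeeds
    # and rolls back if an exception escapes it.
    with conn:
        # Credit first, so that a failing debit must undo an already-applied change.
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        # Overdrawing violates the CHECK constraint and raises IntegrityError.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


if __name__ == "__main__":
    conn = make_db()
    transfer(conn, "A", "B", 30)       # succeeds: A=70, B=80
    try:
        transfer(conn, "B", "A", 500)  # would overdraw B, so the whole unit is rolled back
    except sqlite3.IntegrityError:
        pass
    print(balances(conn))              # the credit to A was undone as well
```

The second transfer shows why atomicity is expensive to provide. The engine must be able to undo the credit that was already applied, so it has to keep enough state for rollback around every change.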

No Lego-toy architecture can help there. Plugins are nice, as long as the database is ACID compliant and can handle the volume. Also, the old RDBMSes are very well instrumented. That means that tuning applications is a very precise procedure, with numerous tools that help people figure out where the time is spent and improve performance. Various profilers, tracing tools, wait event interfaces and the like are available. Typically, open source databases have very few such tools, and if they have morons on the steering committee, as in the following link, obscurity is all but guaranteed:

http://tinyurl.com/68gu822 (The author is very influential in the Postgres community. That is probably one of the main reasons for the small size of that community)

Producing a rock-solid and usable RDBMS is a very expensive and labor-intensive undertaking, and open source projects are simply not up to the task. Partitioning, row-level locking, hot backup, in-line upgrade, usable procedural extensions, clustering and high availability are very expensive to develop. The next Richter scale 10 earthquake in the Oracle world will not be caused by an open source database or by the advent of NewSQL or NoSQL; it will be caused by the old nemesis:

http://tinyurl.com/c7vmuqq

Look at those prices! If DB2 v10 turns out as good as rumor has it, and if prices remain this low, there will be trouble in paradise. Big trouble. There is also a whole slew of DB2 books on Amazon, and DB2 is now free for personal use. It looks like IBM is finally getting serious about DB2 on Linux. IBM is the company that can rekindle competition in the RDBMS arena, not Percona, EnterpriseDB or 2ndQuadrant. NewSQL, NoSQL or some other "simplified" version of SQL will not break the lock that SQL vendors have on the market. It's just a fad for kids like this:

http://www.youtube.com/watch?v=oL-A4JYwgH4

It's the little league of the IT community, and it will always remain the little league. I would compare NewSQL and plugin-oriented databases to Kim Kardashian: pretty, present in the media, but not really important.

-- 
http://mgogala.byethost5.com
Received on Sat Apr 21 2012 - 01:44:28 CDT
