Re: Relational Model and Search Engines?

From: Nick Landsberg <hukolau_at_NOSPAM.att.net>
Date: Mon, 03 May 2004 16:32:53 GMT
Message-ID: <Vuulc.14869$Ut1.446229_at_bgtnsc05-news.ops.worldnet.att.net>


Anne & Lynn Wheeler wrote:

> Leandro Guimarães Faria Corsetti Dutra <leandro_at_dutra.fastmail.fm> writes:
>

>>	Main memory is just a hype word for big cache, little
>>reliability.  It is just an implementation trick that's orthogonal to
>>the database model.

>
>
> I've seen some numbers for main memory databases being up 10-100 times
> faster than fully cached RDBMS. the issue is that these main memory
> databases have gone (back) to direct pointers ... but instead of being
> physical disk pointers ... there are direct memory addresses;
> ... while the RDBMS ... even with the database completely
> cached in memory ... is still threading thru indexes.

My experience with these has been about a 10x speedup even when threading through the indices. 100x would be stretching the point, e.g. running a benchmark on a finely tuned in-memory DBMS against an untuned disk-based DBMS, where disk I/Os eat up a considerable amount of clock time.
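The pointer-vs-index distinction can be shown with a toy sketch (Python standing in for the real engines; a dict plays the role of the fully cached index, a list slot the role of a swizzled direct memory pointer). The absolute timings here mean nothing and won't reproduce the 10x figure; the point is only that the direct path skips the index traversal entirely.

```python
import time

N = 1_000_000
# One "row" per id; in this toy, a row's position in the list *is* its id,
# so a direct reference needs no lookup structure at all.
rows = [{"id": i, "val": i * 2} for i in range(N)]

# Fully cached disk-style engine: every fetch still threads through an
# index (a hash here; a real RDBMS would walk a B-tree).
index = {row["id"]: row for row in rows}

key = 123_456
t0 = time.perf_counter()
for _ in range(100_000):
    via_index = index[key]["val"]   # index probe, then dereference
t1 = time.perf_counter()
for _ in range(100_000):
    via_pointer = rows[key]["val"]  # direct dereference only
t2 = time.perf_counter()

assert via_index == via_pointer == 246_912
print(f"index path: {t1 - t0:.3f}s, direct path: {t2 - t1:.3f}s")
```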

>
> there was possibly an intermediate step in the 90s when some object
> database played games with direct pointers that were swizzled into
> direct memory pointers when things were resident in memory.
>
> the issue for reliability more has to do with the transaction and
> commit boundaries and how they handle updates (being main memory
> doesn't preclude various journalling of changes to disk). somewhat to
> paraphrase, reliability/integrity can be an implementation trick that
> is orthogonal to the database model. the main memory can be
> initialized at start up from a combination of files on disk (say large
> scale striped array) and a journal. The issue of journal recovery time
> then is a performance issue ... but if you can perform updates 100
> times faster, then possibly you can recover a journal a 100 times
> faster.
>

Good points!

I agree with all but your last point (and that's a quibble).

The in-memory system we use initializes from one of two checkpoint files (stored on a large-scale striped array) plus journal logs. It is an "implementation detail" which is, as you say, orthogonal to the database model. The way this implementation works, *all* the data has to be read into memory before the DBMS is operational. (This can be likened to "priming the cache" in a disk-based system.)
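The startup sequence can be sketched roughly as follows. The file formats here (a JSON checkpoint, a line-delimited JSON journal) are my assumptions for illustration, not the actual product's formats; the shape of the logic, load a checkpoint wholesale, then replay the journal, is what the paragraph above describes.

```python
import json
import os
import tempfile

def recover(checkpoint_paths, journal_path):
    """Toy sketch: load the newer of two checkpoint files, replay the journal."""
    # Pick the most recently written checkpoint; the other is the fallback.
    ckpt = max(checkpoint_paths, key=os.path.getmtime)
    with open(ckpt) as f:
        table = json.load(f)  # *all* data is read before going live
    # Replay journalled updates made after the checkpoint was cut.
    with open(journal_path) as f:
        for line in f:
            op = json.loads(line)
            if op["op"] == "put":
                table[op["key"]] = op["value"]
            elif op["op"] == "del":
                table.pop(op["key"], None)
    return table  # only now is the system operational

# Minimal demonstration with synthetic files.
d = tempfile.mkdtemp()
ck1, ck2 = os.path.join(d, "ckpt1"), os.path.join(d, "ckpt2")
jrnl = os.path.join(d, "journal")
with open(ck1, "w") as f:
    json.dump({"a": 1}, f)
with open(ck2, "w") as f:
    json.dump({"a": 1, "b": 2}, f)
os.utime(ck1, (1, 1))  # make ck2 unambiguously the newer checkpoint
os.utime(ck2, (2, 2))
with open(jrnl, "w") as f:
    f.write('{"op": "put", "key": "c", "value": 3}\n')
    f.write('{"op": "del", "key": "a"}\n')
print(recover([ck1, ck2], jrnl))  # {'b': 2, 'c': 3}
```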

For a 30 GB database, this still takes on the order of 10+ minutes to initialize even if there are no journal files to read (graceful shutdown as opposed to a crash). It is limited by the throughput of the array: even with an array, there is a physical limit on how fast you can get the data off the disks (1-2 ms clock time per logical disk read, measured on a large array).
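The 10+ minute figure is consistent with the quoted per-read latency. A back-of-envelope check (the 64 KB logical read size is my assumption; the post only gives the 1-2 ms figure):

```python
db_size_bytes = 30 * 1024**3  # 30 GB database (from the post)
block_bytes = 64 * 1024       # assumed logical read size; not stated in the post
ms_per_read = 1.5             # midpoint of the quoted 1-2 ms per logical read

reads = db_size_bytes / block_bytes
minutes = reads * ms_per_read / 1000 / 60
print(f"~{minutes:.0f} minutes to load")  # ~12, in line with "10+ minutes"
```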

Since a disk-based DBMS can be "operational" without "priming the cache", it could be considered operational in much less time on a graceful restart: there is no need to read checkpoint files into memory first.

The limiting condition, in our case, is the need to replicate to another site for reliability purposes. To guarantee serializability, the replication process on the receiving end is single-threaded. This limits our update rate to 8,500 TPS. (Tested in the lab, but it required a dedicated 100 Mbps LAN.)
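For what it's worth, at that rate the dedicated link gives each replicated transaction a wire budget of roughly 1.5 KB. The arithmetic (this only shows the per-transaction budget implied by the numbers above; per the post, the binding constraint was the single-threaded apply, not the link):

```python
link_bps = 100_000_000  # dedicated 100 Mbps LAN (from the post)
tps = 8_500             # measured update rate (from the post)

bytes_per_txn = (link_bps / 8) / tps
print(f"~{bytes_per_txn:.0f} bytes per transaction")  # ~1471
```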

-- 
"It is impossible to make anything foolproof
because fools are so ingenious"
  - A. Bloch
