Re: Relational Model and Search Engines?

From: Anne & Lynn Wheeler <lynn_at_garlic.com>
Date: Mon, 03 May 2004 13:32:50 -0600
Message-ID: <ufzahi9wd.fsf_at_mail.comcast.net>


Nick Landsberg <hukolau_at_NOSPAM.att.net> writes:
> For a 30 GB database, this is still on the order
> of 10+ minutes to initialize even if there were
> no journal files to read (graceful shutdown as
> opposed to crash.) This is limited by the throughput
> of the array because, even with an array, there
> is a physical limit on how fast you can get the
> data off the disks. (1-2 ms. clock-time per logical disk
> read - measured on a large array).

lets say i initialize from some checkpointed image ... then recovery is replaying the journal from the checkpoint's position to the current entry. the size of the journal is proportional to the update rate since the last checkpoint (and could be independent of database size; for some databases it could be trivial over an extended period of time).
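A rough sketch of that proportionality, in Python. The update rate, record size, and checkpoint interval below are made-up illustrative numbers, not measurements, and the replay estimate assumes replay is limited only by sequential read rate, which may not hold in practice. Note that database size does not appear anywhere.

```python
def journal_bytes(updates_per_sec, bytes_per_record, secs_since_checkpoint):
    """Journal volume accumulated since the last checkpoint."""
    return updates_per_sec * bytes_per_record * secs_since_checkpoint


def replay_seconds(journal_size_bytes, read_bytes_per_sec):
    """Time to read the journal back, assuming purely sequential reads."""
    return journal_size_bytes / read_bytes_per_sec


# Hypothetical workload: 100 updates/sec, 200-byte journal records,
# one hour since the last checkpoint, 30 mbyte/sec sequential read.
size = journal_bytes(100, 200, 3600)
print(f"journal: {size / 10**6:.0f} mbytes, "
      f"replay: {replay_seconds(size, 30 * 10**6):.1f} s")
```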

lets say you build a striped array that, when read sequentially, saturates a 30mbyte/sec i/o interface. 20+ years ago, i demonstrated sequential recovery sequences for single disk recovery that would read 15 tracks in 15 revolutions (effectively achieving very near disk media transfer rate at 3mbyte/sec; in this situation the i/o bus rate and the disk transfer rate were the same) ... and be able to do multiple in parallel on different channels. so say a single striped array at 30mbytes/sec recovers 30gbytes in approx. 1000 seconds or 17 minutes. spreading it across two such i/o interfaces would cut it to about 8.5 minutes, and spreading it across four such i/o interfaces cuts it to a little over four minutes.
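The arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes decimal units (1 gbyte = 10^9 bytes), that every interface is fully saturated, and that the load divides evenly across interfaces. The function name is just for illustration.

```python
GB = 10**9
MB = 10**6


def recovery_seconds(image_bytes, bytes_per_sec_per_interface, interfaces=1):
    """Time to read the whole image sequentially across N saturated interfaces."""
    return image_bytes / (bytes_per_sec_per_interface * interfaces)


for n in (1, 2, 4):
    secs = recovery_seconds(30 * GB, 30 * MB, n)
    print(f"{n} interface(s): {secs:.0f} s = {secs / 60:.1f} min")
# 1 interface(s): 1000 s = 16.7 min
# 2 interface(s): 500 s = 8.3 min
# 4 interface(s): 250 s = 4.2 min
```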

The problem now is that you are getting into some operating system restart times. Given that you are doing 30gbyte image recovery ... it is too bad that you can't go a little further and do like the laptop suspend operations that write all of memory to a protected disk location for instant restart. With a trivial amount more I/O, you could checkpoint all of physical memory for "instant on" system recovery.

Backup image can be done like some of the hot disk database backups. Since you aren't otherwise doing a lot of disk i/o ... and you probably have at least ten times the disk space in order to get enough disk arms, you could checkpoint versions to disk with journal cursors for fuzzy image states. Frequency could possibly be dictated by the trade-off between the overhead of doing more frequent checkpoints vis-a-vis having to process more journal records ... as well as projected MTBF. Five-nines availability allows about 5 minutes downtime per year. At four minutes recovery ... that says you get one outage per year. For really high availability ... you go to replicated operations. Recovery of a failed node then is slightly more complicated since it has to recover the memory image, the journal, and then the journal entries done by the other processor.
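The availability budget works out as follows. This is a sketch only: it assumes a 365.25-day year and that every outage costs the full recovery time and nothing more (no detection delay, no operating system restart), which is optimistic.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_budget_minutes(nines):
    """Allowed downtime per year for an availability of N nines."""
    return MINUTES_PER_YEAR * 10 ** (-nines)


def outages_per_year(nines, recovery_minutes):
    """How many full recoveries fit in the yearly downtime budget."""
    return downtime_budget_minutes(nines) / recovery_minutes


budget = downtime_budget_minutes(5)
print(f"five-nines budget: {budget:.2f} min/year")
print(f"outages affordable at ~4.2 min each: {outages_per_year(5, 4.2):.2f}")
```

With a recovery of a little over four minutes, the budget covers only one complete outage per year, which is the point made above.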

With replicated systems, there is then some question of whether you can get by with two 30mbyte/sec transfer arrays per system for 8min system recovery time .... since the other system would mask the downtime. Each system would have two 30mbyte/sec transfer array configurations rather than a single system having four 30mbyte/sec transfer arrays.

-- 
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/
Received on Mon May 03 2004 - 21:32:50 CEST
