Re: Relational Model and Search Engines?

From: Anne & Lynn Wheeler <lynn_at_garlic.com>
Date: Mon, 03 May 2004 18:37:53 -0600
Message-ID: <ubrl5hvry.fsf_at_mail.comcast.net>


Leandro Guimarães Faria Corsetti Dutra <leandro_at_dutra.fastmail.fm> writes:
> So, it is basically for throwaway data. I was hoping for
> something more interesting.

the loss of the last 10 transactions isn't any different than other DBMS ... if it hasn't been written to the log/journal as part of commit ... and the system goes down ... then the transactions haven't been done.
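
as a minimal sketch of that rule (the TinyDB class and its method names are made up for illustration, and a python list stands in for the durable log on disk), a transaction only counts once its record is in the journal; anything still in flight at the crash is simply gone:

```python
class TinyDB:
    """Toy store: a transaction counts only once its record is in the journal."""

    def __init__(self):
        self.journal = []   # stands in for the durable log/journal on disk
        self.records = {}   # home record location (treated as volatile here)

    def commit(self, key, value):
        # Append to the journal first, then apply to the home location.
        self.journal.append((key, value))
        self.records[key] = value

    def crash_and_recover(self):
        # Everything volatile is lost; rebuild from the journal alone.
        self.records = {}
        for key, value in self.journal:
            self.records[key] = value


db = TinyDB()
db.commit("acct-1", 100)
in_flight = ("acct-2", 50)   # received but never committed before the crash
db.crash_and_recover()
```

after recovery db.records holds only the committed "acct-1" entry, whether the home location is on disk or in memory.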

i would guess that almost any high-end system in existence would likely have ten or more received transactions that are actively in the process of execution & uncommitted at any point in time ... and are conceivably lost if the system crashes at any particular moment.

these days it isn't actually quite that bad ... there are frequently other components in the infrastructure that will redrive transactions that time-out w/o acking (i.e. indicating complete & committed).
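
a rough sketch of the redrive idea, with hypothetical names throughout (redrive, make_flaky_server) and a simulated server instead of a real network timeout. it assumes a request that timed out was never committed, so resending it is safe:

```python
def redrive(send, request, max_attempts=3):
    """Resend `request` until `send` returns an ack or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(request), attempt
        except TimeoutError:
            continue   # no ack in time: drive the transaction again
    raise TimeoutError(f"no ack after {max_attempts} attempts")


def make_flaky_server(failures):
    """Hypothetical server that times out `failures` times, then acks."""
    state = {"left": failures}

    def send(request):
        if state["left"] > 0:
            state["left"] -= 1
            raise TimeoutError("no ack")
        return f"committed:{request}"

    return send


result, attempts = redrive(make_flaky_server(2), "txn-42")
```

here the first two sends time out and the third is acked, so the caller never sees the failures.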

the log/journal/commit scenario isn't any different for disk-based DBMS than it is for real-memory DBMS.

all the DBMS tend to have a backup, they all tend to have home-record location (except for some versioning DBMS), and they all tend to have commits that involve recovery with log/journal.

the difference between disk-based and memory-based DBMS is that disk-based DBMS have the home record location on disk and memory-based DBMS have the home record location in memory (totally eliminating the caching process).

in the hot-backup scenario ... disk-based DBMS can do hot-backup with slightly fuzzy image corrected with appropriate journal entries. this typically involves doing disk-to-disk copy while the system is running live.

the memory-based DBMS similarly can do hot-backup ... except it is directly from memory to disk ... and may have some of the same fuzzy copy issues that a disk-based hot-backup has ... and is also made consistent with journal entries.
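
the fuzzy-copy-plus-journal idea can be sketched as below. this is an illustration under assumptions, not any particular product's method: the function names, the dict standing in for the live records, and the writes_during_copy argument (which simulates updates arriving mid-copy) are all invented for the example.

```python
def fuzzy_backup(records, journal, writes_during_copy):
    """Copy records one key at a time while updates keep arriving."""
    start = len(journal)   # journal position when the copy starts
    image = {}
    for i, key in enumerate(sorted(records)):
        image[key] = records[key]
        # Simulate live updates landing in the middle of the copy.
        for k, v in writes_during_copy.get(i, []):
            journal.append((k, v))
            records[k] = v
    return image, start


def restore(image, journal, start):
    """Make the fuzzy image consistent by replaying later journal entries."""
    recovered = dict(image)
    for key, value in journal[start:]:
        recovered[key] = value
    return recovered


records = {"a": 1, "b": 2, "c": 3}
journal = []
# After key "b" (index 1) is copied, updates to "a" and "c" arrive.
image, start = fuzzy_backup(records, journal, {1: [("a", 10), ("c", 30)]})
recovered = restore(image, journal, start)
```

the image ends up with the old "a" and the new "c", i.e. a state that never existed; replaying the journal entries written since the copy started brings it back in line with the live records.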

disk-based DBMS may expect to only infrequently resort to recovery from a hot-backup ... however the memory-based DBMS is expecting to do recovery from its hot-backup every time the system is rebooted and/or the dbms restarted.

since 1) the memory-based DBMS expects to make more frequent use of its hot-backup (vis-a-vis a disk-based DBMS), 2) its disks are otherwise idle, and 3) hot-backup doesn't contend with transactions accessing the disks ... it is likely to perform more frequent hot-backups.

some of the technology issues are that memories have gotten large enuf to contain what used to be large databases ... and raid/striping disk technology has gotten fast enuf that hot-backup and recovery are reasonably containable processes.

the memory size increases also represent other types of paradigm shift opportunities. a long-standing, extremely large, widely used, query-only database implementation has been around for 40-some years (there are updates that happen every couple weeks, but those changes are batch processed).

about ten years ago, i looked at it and realized that there was a paradigm shift available. the core information could now fit in real memory and the query results could be calculated in real time. over the previous 40 years, the core information couldn't be contained in real storage ... so the process was to precalculate all possible answers and store them in a disk database. The resulting database of all possible answers was extremely large and couldn't fit in even today's real memory. However, I realized that with technology changes, it was now possible to load the core information into real memory and it would take less CPU time to calculate the answer in real time than the CPU time to read the disk record with the precalculated answer from a (disk) database (or even look it up in a cache).

As it happens, they couldn't practically precalculate all possible answers; they had to leave out a large number of cases ... however, it was actually possible to calculate in real time the answer for every possible query (given the core data was available in memory). Not being able to answer every possible query was a long-recognized limitation of the original solution.
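
To make the contrast concrete, here is a toy sketch. The actual application and its data aren't described in this post, so everything here is a stand-in: the CORE link table, the cheapest-route query, and the function names are hypothetical, chosen only to show "calculate from in-memory core data" next to "look up a precalculated answer".

```python
import heapq

# Hypothetical core data: direct links with a cost, small enough for memory.
CORE = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 6},
    "D": {},
}


def cheapest(src, dst):
    """Compute the answer on demand from the in-memory core data (Dijkstra)."""
    best = {src: 0}
    heap = [(0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            return cost
        if cost > best.get(node, float("inf")):
            continue   # stale heap entry, a cheaper path was already found
        for nxt, step in CORE[node].items():
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return None   # no route


# The older approach: precalculate answers ahead of time (here for only
# some of the possible queries) and look them up at query time.
PRECALCULATED = {
    ("A", "B"): cheapest("A", "B"),
    ("A", "C"): cheapest("A", "C"),
}


def lookup(src, dst):
    return PRECALCULATED.get((src, dst))   # None if the case was left out
```

With this data, lookup("A", "D") returns None because that case was never precalculated, while cheapest("A", "D") works it out from the core data on the spot.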

I used a fast, sequential memory load ... which for this particular operation could be optimized so that startup/recovery could be done in a few seconds. Answers that didn't come back within a very short prescribed time were automagically redriven ... masking failure/recovery scenarios.

-- 
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/
Received on Tue May 04 2004 - 02:37:53 CEST
