Re: implementing a database log

From: Brian Selzer <>
Date: Thu, 24 Apr 2008 10:11:11 -0400
Message-ID: <4y0Qj.10698$>

"David BL" <> wrote in message

> On Apr 22, 11:39 pm, "Brian Selzer" <> wrote:

>> "David BL" <> wrote in message
>> > On Apr 22, 6:58 am, "Brian Selzer" <> wrote:
>> >> "Christoph Rupp" <> wrote in message
>> >>
>> >> > Brian,
>> >> > On Apr 21, 11:00 pm, "Brian Selzer" <>
>> >> > wrote:
>> >> >> Why not go with #4:
>> >> >> 4. a physical log based on modified rows. Whenever a row is
>> >> >> modified, added or removed, it is logged. Then you could also
>> >> >> implement row versioning--just add a row version field to the
>> >> >> physical rows. I believe that this is what snapshot isolation is
>> >> >> built on.
>> >> > It's not an SQL database, I don't even have the notion of "rows",
>> >> > but basically I think your #4 is the same as my #1 or #2.
>> >> No, it isn't. #1 requires the logging of additional records that
>> >> may not have been affected by an update. #2 doesn't log the entire
>> >> changed record, but only bits and pieces. I would think that
>> >> limiting the units of change to individual records--entire
>> >> records--would simplify the process of marking and isolating units
>> >> of work while at the same time guaranteeing consistency.
>> > I don't think an atomic unit of work is always associated with a
>> > change to an individual record. Are you suggesting transactions to
>> > define arbitrarily large units of work aren't needed?
>> No, that's not what I'm suggesting. What I'm suggesting is that the
>> atomic unit of work should be a /set/ of /records/--either the old
>> records in the case of a before image or the new records in the case
>> of an after image.
> Ok, but that sounds like the system snapshots an entire table in the
> before/after images.
> For efficiency one would expect to store only the set of records that
> have been added and the set that have been removed by a given
> transaction.  It is easy to see how we get the inverse, which is
> required for rollback of an uncommitted transaction during recovery.
> However, these logical operations aren't idempotent (at least for bags
> of records).  How does recovery deal with non-idempotent redo()/
> undo() changes in the log?

Why would there be bags of records? At the physical level, each record has a specific offset in some file, and that offset would uniquely identify it. Why would you strip off that identification? Consequently, there wouldn't be bags of records, only sets of records.
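A tiny sketch of that point (the byte strings and offsets below are invented for illustration): once each record is paired with its unique file offset, duplicate record values no longer collapse into a bag--the pairs form a set.

```python
# Assumption for illustration: records are byte strings and each lives
# at a distinct file offset. Pairing value with offset makes duplicates
# distinct, so the log deals in sets of records, never bags.
raw = [b"alice", b"alice", b"bob"]      # duplicate record values
offsets = [0, 32, 64]                   # hypothetical file offsets
records = set(zip(offsets, raw))        # (offset, record) pairs
assert len(records) == 3                # nothing collapsed: it's a set
assert len(set(raw)) == 2               # the bare values alone would
```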

If you start out with a set of records and you know what is to be inserted, updated, and deleted, you can compute the resulting set of records. If you start out with the resulting set of records, and you know what was inserted, updated and deleted in order to arrive at that result, then you can compute the original set of records. For simplicity, even if it isn't necessarily the most efficient, what is updated could be implemented as a set of ordered pairs of records, tying each original record to its replacement. So the log would consist of a sequence of triples (D, U, I) separated by transaction markers where D is a set of records that were deleted, U is a set of pairs of records that were updated, and I is a set of records that were inserted.
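The triples described above can be sketched directly with Python sets. This is a minimal model, not any real DBMS's log format: state is a set of (offset, record) pairs, D and I are sets of such pairs, and U is a set of (old, new) pairs. Applying a triple computes the resulting set; inverting it (swap D and I, reverse each update pair) computes the original set back, exactly as the paragraph claims.

```python
# Minimal sketch of the (D, U, I) log-triple idea; names invented here.

def apply_triple(state, triple):
    """Roll a database state (a set of records) forward one transaction."""
    D, U, I = triple
    olds = {old for old, new in U}
    news = {new for old, new in U}
    return (state - D - olds) | news | I

def invert_triple(triple):
    """Undo entry: swap inserts with deletes, reverse each update pair."""
    D, U, I = triple
    return (I, {(new, old) for old, new in U}, D)

# Applying a triple and then its inverse restores the original set.
before = {(0, b"a"), (32, b"b")}
t = ({(0, b"a")},                        # D: deleted
     {((32, b"b"), (32, b"b2"))},        # U: updated (old -> new)
     {(64, b"c")})                       # I: inserted
after = apply_triple(before, t)
assert after == {(32, b"b2"), (64, b"c")}
assert apply_triple(after, invert_triple(t)) == before
```

Keeping updates as ordered pairs (rather than folding them into D and I) is what makes the inverse trivial to compute from the log entry alone.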

Now, provided that the log is written before the database--that is, (1) write the triple to the log, (2) write the database, (3) write the transaction marker in the log--it should be possible to determine whether or not what was written to the log actually made it into the database, and thus it should be possible to roll back any uncommitted transaction.
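The recovery rule implied by that ordering can be sketched as a single scan of the log (the tuple-based log representation below is invented for illustration): a triple followed by its transaction marker is committed; a trailing triple with no marker may or may not have reached the database, so it is the one to roll back with its inverse.

```python
# Hedged sketch of recovery under write-ahead ordering. Log entries are
# modeled as ("triple", payload) and ("commit",) tuples -- an invented
# representation, not any real log format.

def recover(log):
    """Return (committed triples, uncommitted trailing triple or None)."""
    committed, pending = [], None
    for entry in log:
        if entry[0] == "triple":
            pending = entry[1]          # seen in log, commit unknown yet
        elif entry[0] == "commit":
            committed.append(pending)   # marker present: it committed
            pending = None
    return committed, pending           # pending != None: roll it back

log = [("triple", "T1"), ("commit",), ("triple", "T2")]
done, undo_me = recover(log)
assert done == ["T1"] and undo_me == "T2"
```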
