Re: How keep 8i standby no-more-than 30 minutes behind?

From: Howard J. Rogers <dba_at_hjrdba.com>
Date: Wed, 27 Mar 2002 15:09:44 +1100
Message-ID: <a7rgoa$mme$1@lust.ihug.co.nz>


The "subtlety" was documented in the 8i training course material, so I'd be (a bit) surprised if there's literally nothing on it in the doco.... but I haven't checked.

In 8.0 and before, _timeout meant 'start counting the specified number of seconds as soon as a new checkpoint is issued, and then fire off another one (and start counting again)'. But that was pretty meaningless. At lunch time, when all Users are off down the pub, every 10 minutes (or whatever you'd set _timeout to), you'd get a fresh checkpoint even though no-one was doing anything on the database. At 9.00am, on the other hand, with all your Users fresh and raring to go, you could produce a million new transactions in those same 10 minutes (well, maybe not a million, but you get the idea). The point is, "10 minutes" doesn't equate to a fixed amount of redo to be checkpointed. So you might checkpoint a lot or a little. Which meant that Instance Recovery might have to recover a lot or a little, and you couldn't tell which beforehand.

In 8i, the parameter means, 'where was the end of the redo stream within this log 10 minutes ago?'. Which makes Instance Recovery far more predictable, right? Nah. It has the same effect. You end up checkpointing 10 minutes' worth of transactions, and that still means you checkpoint practically nothing at noon and lots at 9am. So the parameter is still pretty pointless as a way of limiting Instance Recovery time (which is why 9i doesn't use it as part of its new FAST_START_MTTR_TARGET parameter calculations).

A more significant effect of the change was felt by _INTERVAL. In the old days, it meant 'fire off a checkpoint when (let's say) 1000 redo blocks have been written (ie, 512K on most systems)'. Fine... that means you can guarantee there will never be more than 512K of redo that needs to be recovered during Instance Recovery. So you now have a hard-ish limit on the *maximum* amount of time an Instance Recovery can take. But how long it actually takes is still unpredictable under that limit, because the instance might crash just after a checkpoint, or just before the next one. So you might need to read 2 redo blocks to recover, or 999. Within that limit, it's still variable. The new 8i way of looking at it is: 'where was my redo 1000 redo blocks ago? That will be the point to which I will now checkpoint'. Which leaves 1000 blocks uncheckpointed. And if you generate another 100 blocks of redo, the target for the checkpoint advances by 100. In theory, the number of redo blocks required to be read for Instance Recovery will therefore always be 1000. Which means you now know exactly how much redo will need to be replayed, so you've got half the information you need to predict Instance Recovery time pretty well (the other half is the number of data blocks we have to fetch off disk... ie, the number of dirty buffers, which can be capped by fast_start_io_target).
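
If you want to watch those limits in action on your own 8i instance, a query along these lines against v$instance_recovery shows the redo-block target each parameter is imposing (just a sketch; the column names are as I remember them, so check your own data dictionary first):

  SELECT actual_redo_blks,               -- redo blocks that would need recovery right now
         target_redo_blks,               -- the target being enforced (smallest of the limits below)
         log_chkpt_interval_redo_blks,   -- limit imposed by log_checkpoint_interval
         log_chkpt_timeout_redo_blks,    -- limit imposed by log_checkpoint_timeout
         log_file_size_redo_blks         -- limit imposed by the smallest online log
  FROM   v$instance_recovery;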

All of which is frightfully interesting, but none of which has the least part to play in addressing your concern about making sure the standby database doesn't lag the production system by more than 30 minutes. For the simple reason that standby databases are only brought up to date by the creation (and application) of a new archive log. And the creation of new archives on the production system has absolutely nothing to do with the frequency of your parameter-induced checkpoints. The only thing that will generate a new archive is a log switch. So the real answer to your actual query is: you size your online redo logs such that they switch from one to another, on average, about every half hour. But that's only an average, which means sometimes, when the users aren't doing much, you're only going to switch every hour. And when they're busy, you'll switch every 10 minutes. And there's nothing, but nothing, you can do to impose a hard and fast and guaranteed clock-time limit on that behaviour.
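
To see what your switch rate actually looks like before you resize anything, something like this against v$log_history gives you switches per hour over the last week (just a sketch, adjust the date arithmetic to taste):

  SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
         COUNT(*)                               AS switches
  FROM   v$log_history
  WHERE  first_time > SYSDATE - 7               -- the last week's worth
  GROUP  BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
  ORDER  BY 1;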

Unless, of course, you create a job (dbms_job is useful here, but you could do it from outside the database with cron) which issues an 'alter system switch logfile' every half hour. That way, regardless of whether the online log is practically empty (quiet time) or full (busy), the logs will switch and an archive will be produced, which can be shipped to the standby.
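
For what it's worth, here's a minimal dbms_job sketch of that half-hourly switch (purely illustrative: the 30-minute interval is yours to change, the submitting user needs ALTER SYSTEM granted directly rather than through a role, and job_queue_processes has to be non-zero or the job will just sit there):

  DECLARE
    l_job BINARY_INTEGER;
  BEGIN
    DBMS_JOB.SUBMIT(
      job       => l_job,
      what      => 'BEGIN EXECUTE IMMEDIATE ''ALTER SYSTEM SWITCH LOGFILE''; END;',
      next_date => SYSDATE,
      interval  => 'SYSDATE + 30/1440'   -- re-queue itself every half hour
    );
    COMMIT;   -- the job isn't visible to the job queue until you commit
  END;
  /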

Of course, there is another answer: upgrade to 9i. There, you can configure the standby to lag the database by a specific number of minutes. But that's designed to *prevent* the application of shipped redo, to allow user errors to be corrected without the expense of an incomplete recovery. And that's not what you're after.
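
(For the record, the 9i lag is just a DELAY attribute on the archive destination, something along these lines in the init.ora, where the service name 'standby1' is purely illustrative and DELAY is in minutes:

  log_archive_dest_2 = 'SERVICE=standby1 DELAY=30'

But as I say, that delays the *application* of redo; it doesn't guarantee how quickly the redo arrives.)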

Sorry for the long reply.

Regards
HJR

--
------------------------------------------
Resources for Oracle : www.hjrdba.com
============================


"Guy D. Kuhn" <guy.d.kuhn_at_saic.com> wrote in message
news:3CA13F97.6DE59EE2_at_saic.com...

> I found on MetaLink that the interpretation of log_checkpoint_timeout and
> log_checkpoint_interval was changed for 8i. The documentation,
> apparently, was not changed. The new interpretation (.. lag the tail
> of the log...) is a subtlety that eludes me. The bottom line is, how do
> I ensure my standby is not more than 30 minutes behind the primary?
> Setting log_checkpoint_timeout does cause a checkpoint at
> no-greater-than 30 minutes.
>
Received on Tue Mar 26 2002 - 22:09:44 CST
