Re: Checkpoint# vs SCN

From: Yass Khogaly <ykhogaly_at_us.oracle.com>
Date: Mon, 9 Aug 1999 09:06:32 -0800
Message-ID: <7omuct$6h6$1@inet16.us.oracle.com>

WHAT IS A CHECKPOINT?

  A Checkpoint is a database event which synchronizes the data blocks in memory
  with the Datafiles on disk. A checkpoint has two purposes: (1) to establish
  data consistency, and (2) enable faster database recovery. How is recovery
  faster? Because all database changes up to the checkpoint have been recorded
  in the Datafiles, making it unnecessary to apply redo log entries prior to the
  checkpoint.

During a checkpoint the following occurs:

The database writer (DBWR) writes all modified database blocks in the buffer cache back to Datafiles,
Log writer (LGWR) updates both the controlfile and the Datafiles to indicate when the last checkpoint occurred (SCN)

  If the optional background process CKPT, the checkpoint process, is enabled,
  then CKPT performs the operations of LGWR above. The advantages of enabling
  CKPT are discussed below.

A checkpoint occurs when Oracle performs a log switch from one group to another, when the number of operating system blocks specified by LOG_CHECKPOINT_INTERVAL have been written to the redo log, when the time specified by LOG_CHECKPOINT_TIMEOUT has expired, or when a checkpoint has been
forced by the DBA.

CHECKPOINTS AND PERFORMANCE

Checkpoints present a tuning dilemma for the Database Administrator. Frequent
checkpoints will enable faster recovery, but can cause performance degradation. How then should the DBA address this?

  Depending on the number of Datafiles in a database, a checkpoint can be a   highly resource intensive operation, since all datafile headers are frozen   during the checkpoint. There is a performance trade-off regarding frequency
  of checkpoints. More frequent checkpoints enable faster database recovery   after a crash. This is why some customer sites which have a very low   tolerance for unscheduled system downtime will often choose this option.   However, the performance degradation of frequent checkpoints may not justify
  this philosophy in many cases. Let's assume the database is up and running 95%
  of the time, and unavailable 5% of the time from infrequent instance crashes
  or hardware failures requiring database recovery. For most customer sites, it
  makes more sense to tune for the 95% case rather than the rare 5% downtime.

This bulletin assumes that performance is your number one priority and so recommendations are made accordingly.

Therefore, your goal is to minimize the frequency of checkpoints through tuning.

Tuning checkpoints involves four key initialization parameters

CHECKPOINT_PROCESS
LOG_CHECKPOINT_INTERVAL
LOG_CHECKPOINT_TIMEOUT
LOG_CHECKPOINTS_TO_ALERT

These parameters are discussed in detail below.

Recommendations are also given for handling "checkpoint not complete" messages
found in the alert log, which indicate a need to tune redo logs and checkpoints.

CHECKPOINT_PROCESS

  The CHECKPOINT_PROCESS init.ora parameter determines whether or not the   optional CKPT background process will be started to perform LGWRs tasks during
  checkpoint operations of updating the datafile headers. LGWR is then free to
  perform its' primary function flushing the redo log buffer to the online redo
  logs.

  The CKPT process can improve performance significantly and decrease the amount
  of time users have to wait for a checkpoint operation to complete. The   overhead associated with starting another background process is not   significant when compared to the performance benefit to be gained by enabling
  CKPT, therefore, Oracle recommends always enabling the checkpoint process   (CKPT).   IMPORTANT NOTES on CHECKPOINT_PROCESS ORACLE7 and ORACLE8:

The CKPT process is optional in lower versions of Oracle7, but is mandatory in
Oracle8.

In versions 7.0 - 7.3.2, the CKPT is an optional background process which is
enabled by setting CHECKPOINT_PROCESS=TRUE in init.ora.

In versions 7.3.3 and 7.3.4, the CKPT process will be started automatically
regardless of the CHECKPOINT_PROCESS setting if either of the following conditions exist:

a large value for DB_FILES (50 or higher)
a large value for DB_BLOCK_BUFFERS (10,000 or higher)

In version 8.0.3 and higher, the CKPT process is always enabled. Attempting
to set CHECKPOINT_PROCESS in the init.ora will give the following error:

LM-101 "unknown parameter name checkpoint_process"

LOG_CHECKPOINT_INTERVAL

  The LOG_CHECKPOINT_INTERVAL init.ora parameter controls how often a checkpoint
  operation will be performed based upon the number of operating system blocks
  that have been written to the redo log. If this value is larger than the size
  of the redo log, then the checkpoint will only occur when Oracle performs a
  log switch from one group to another, which is preferred.

  On most UNIX systems the operating system block size is 512 bytes. This means
  that setting LOG_CHECKPOINT_INTERVAL to a value of 10,000 (the default   setting), causes a checkpoint to occur after 5,120,000 (5M) bytes are written
  to the redo log. If the size of your redo log is 20M, you are taking 4   checkpoints for each log.

  LOG_CHECKPOINT_INTERVAL influences when a checkpoint occurs, which means   careful attention should be given to the setting of this parameter, keeping it
  updated as the size of the redo log files is changed. The checkpoint   frequency is one of the factors which impacts the time required for the   database to recover from an unexpected failure. Longer intervals between   checkpoints mean that if the system crashes, more time will be needed for the
  database to recover. Shorter checkpoint intervals mean that the database will
  recover more quickly, at the expense of increased resource utilization during
  the checkpoint operation.

  This parameter also impacts the time required to complete a database recovery
  operation during the roll forward phase of recovery. The actual recovery time
  is dependent upon this time, and other factors, such as the type of failure
  (instance or system crash, media failure, etc.), and the number of archived
  redo logs which need to be applied.

LOG_CHECKPOINT_TIMEOUT

  The LOG_CHECKPOINT_TIMEOUT init.ora parameter controls how often a checkpoint
  will be performed based on the number of seconds that have passed since the
  last checkpoint. Checkpoint frequency impacts the time required for the   database to recover from an unexpected failure. Longer intervals between   checkpoints mean that more time will be required during database recovery.

  Oracle recommends using LOG_CHECKPOINT_INTERVAL to control the checkpoint   interval rather than LOG_CHECKPOINT_TIMEOUT, which will initiate a checkpoint
  every "n" seconds, regardless of the transaction frequency. This can cause
  unnecessary checkpoints in cases where transaction volumes vary. Unnecessary
  checkpoints must be avoided whenever possible for optimal performance.

  It is a misconception that setting LOG_CHECKPOINT_TIMEOUT to a given value   will initiate a log switch at that interval, enabling a recovery window used
  for a stand-by database configuration. Log switches cause a checkpoint, but a
  checkpoint does not cause a log switch. The only way to cause a log switch is
  manually with ALTER SYSTEM SWITCH LOGFILE or resizing the redo logs to cause
  more frequent switches. This is controlled by operating system blocks, not a
  timed interval.

Sizing of the online redo logs is critical for performance and recovery.

See additional sections below on redo logs and checkpoints.

LOG_CHECKPOINTS_TO_ALERT:

  The LOG_CHECKPOINTS_TO_ALERT init.ora parameter, when set to a value of TRUE,
  allows you to log checkpoint start and stop times in the alert log. This is
  very helpful in determining if checkpoints are occurring at the optimal   frequency and gives a chronological view of checkpoints and other database   activities occurring in the background.

REDO LOGS AND CHECKPOINT TUNING:

  A checkpoint occurs at every log switch. If a previous checkpoint is already
  in progress, the checkpoint forced by the log switch will override the current
  checkpoint.

  This necessitates well-sized redo logs to avoid unnecessary checkpoints as a
  result of frequent log switches. The alert log is a valuable tool for   monitoring the rate that log switches occur, and subsequently, checkpoints   occur. Oracle recommends sizing the online redo logs such that switches occur
  no more than once per hour. The following is an example of quick log switches
  from the alert log:

  Fri May 16 17:15:43 1997
  Thread 1 advanced to log sequence 1272     Current log# 3 seq# 1272 mem# 0: /prod1/oradata/logs/redologs03.log   Thread 1 advanced to log sequence 1273     Current log# 1 seq# 1273 mem# 0: /prod1/oradata/logs/redologs01.log   Fri May 16 17:17:25 1997
  Thread 1 advanced to log sequence 1274     Current log# 2 seq# 1274 mem# 0: /prod1/oradata/logs/redologs02.log   Thread 1 advanced to log sequence 1275     Current log# 3 seq# 1275 mem# 0: /prod1/oradata/logs/redologs03.log   Fri May 16 17:20:51 1997
  Thread 1 advanced to log sequence 1276     Current log# 1 seq# 1276 mem# 0: /prod1/oradata/logs/redologs01.log

If redo logs switch every 3 minutes, you will see performance degradation. This indicates the redo logs are not sized large enough to efficiently handle
the transaction load.

CHECKPOINT NOT COMPLETE

At times you may see "checkpoint not complete" messages in the alert log as
follows:

Thread 1 advanced to log sequence 248 Current log# 2 seq# 248 mem# 0: /prod1/oradata/logs/redologs02.log Thread 1 cannot allocate new log, sequence 249 Checkpoint not complete

  These messages indicate that Oracle is ready to recycle the redo logs but the
  checkpoint has not been complete on the redo log Oracle needs to accomplish
  this.

  You should also check the bstat/estat report for the statistics   BACKGROUND_CHECKPOINTS_COMPLETED and BACKGROUND_CHECKPOINTS_STARTED. These
  two numbers should never differ by more than 1. If there is a large   difference in these two statistics it indicates that checkpoints are starting
  but not completing, just as "checkpoint not complete" messages in the alert
  log indicate. The number of checkpoints completed and started as indicated by
  these statistics should be weighed against the duration of the bstat/estat   report. Keep in mind the goal of only one log switch per hour, which ideally
  should equate to one checkpoint per hour as well.

The way to resolve incomplete checkpoints is through tuning checkpoints and
logs:

Give the checkpoint process more time to cycle through the logs
- add more redo log groups
- increase the size of the redo logs
Reduce the frequency of checkpoints
- increase LOG_CHECKPOINT_INTERVAL
- increase size of online redo logs
Improve the efficiency of checkpoints
- enable the CKPT process with CHECKPOINT_PROCESS=TRUE

SUMMARY:

  In summary, checkpoint processing and the associated tuning required is a key
  factor in overall database performance. As an ongoing task, the Database   Administrator must periodically monitor the efficiency of checkpoints to avoid
  bottlenecks around this processing during peak volume activity periods.

"The Views expressed here are my own and not necessarily those of Oracle Corporation"

Juan Carlos <jc_va_at_hotmail.com> wrote in message news:7ojp5b$qgf$1_at_bgtnsc02.worldnet.att.net... > I am studying for the Backup/Recovery OCP test. Oracle checks the > checkpoint number of all redo, control, and datafiles for consistency. But

> where is this stored, i.e. what v$ view/column?  In v$datafile and
> v$datafile_header I see CHECKPOINT_CHANGE#, but this seems to be more of a
> SCN versus a checkpoint number.
>
> Is Oracle perhaps a little loose in their terminology, and the SCN at the
> last checkpoint is recorded, not a checkpoint number itself, is checked

for

> consistency across datafiles?  A query shows it is consistent.
>
> Thanks for your help.  I am about 2 weeks away from the test....
>
>

Received on Mon Aug 09 1999 - 12:06:32 CDT