RE: checkpoint incomplete issue

From: Cary Millsap <cary.millsap_at_hotsos.com>
Date: Fri, 19 Nov 2004 16:19:51 -0600
Message-ID: <007801c4ce85$e3ce0b00$6400a8c0@CVMLAP02>

One thing I'm surprised nobody's mentioned is that excessive = checkpointing
can be caused by excessive writing. On occasion, we see the application = that
does something like

	update t set col=3D'X'
	where {predicates that don't include "and col<>'X'"}

Get it? If there were a bunch of "col=3D'X'" rows to begin with, you're telling Oracle to change the existing 'X' values to new 'X' values. Of course, this can be brutal on DBWR, LGWR, ARCH, processes that are = trying to
use CPU or acquire latches, undo management, etc.

I hate to see people trying to solve problems like this by randomly = hacking
parameters, adding files, or buying hardware, when the best solution = would
be to make the application not do stuff that it doesn't need to be doing = in
the first place.

Cary Millsap
Hotsos Enterprises, Ltd.
http://www.hotsos.com
* Nullius in verba *

Upcoming events:

- Performance Diagnosis 101: 1/4 Calgary
- SQL Optimization 101: 11/8 Dallas, 12/13 Atlanta
- Hotsos Symposium 2005: March 6-10 Dallas
- Visit www.hotsos.com for schedule details...

-----Original Message-----
From: oracle-l-bounce_at_freelists.org =
[mailto:oracle-l-bounce_at_freelists.org]
On Behalf Of Jeremiah Wilton
Sent: Friday, November 19, 2004 3:14 PM
To: oracle-l_at_freelists.org
Subject: RE: checkpoint incomplete issue

On Fri, 19 Nov 2004, Freeman Robert - IL wrote:

> Hmmmm... maybe I was wrong, or the behavor has changed at some point =
and
> time (7 vs 8??).... looking at my 9i and 10g alert logs, it appears =
that
the
> checkpoint not complete messages only appear if it can't switch =
logfiles.
>
> Maybe I'm just getting to old and senile.

Well that may be the case with me as well. I wrote the article in 1932 based on Oracle 7.3.2 on a hand-cranked granite computer. But I don't think the fundamental rule has changed. "Cannot allocate new log - checkpoint not complete".

You can have 10 logs, and switch through them all without completing a checkpoint (if using LOG_CHECKOINT_INTERVAL it will happily start over with a new checkpoint at each log switch even if the previous one is not complete), until you get to the last log, and they are all status ACTIVE, at which point it runs out of logs that are INACTIVE and has nowhere to switch to. Then you get CNC.

This problem, once rife, is a real rarity these days.

Often someone has decided to use an immature filesystem and it serializes on file access (defeating multiple DB writers).

Another fave is the old software mirroring across arrays. That software mirroring software (like VxVM) goes into full resync mode if the machine crashes or the mirror gets otherwise out of sync. This can suck up huge quantities of available IO on a running production system.

Finally, most people who think they are using async IO are not, due to incorrect filesystem, OS kernel or volume manager configuration. The only way to verify is to use a system call tracer to see what flags the DB writer is using to open the file. There are numerous platform and filesystem-specific recipes on MetaLink for getting async to work and making sure it is really on.

--
Jeremiah Wilton
Independent Oracle Professional
Oracle Certified Master
Disaster Recovery - Seminars - Technical Interviews
http://www.speakeasy.net/~jwilton
--
http://www.freelists.org/webpage/oracle-l

--
http://www.freelists.org/webpage/oracle-l

Received on Fri Nov 19 2004 - 16:27:56 CST