Re: very (I repeat) slow DBWR

From: Andrew Babb <andrewb_at_mail.com>
Date: Thu, 06 May 1999 08:13:25 +0800
Message-ID: <3730DEA4.D5A181AB@mail.com>

Gerard,

I am sure you already have the Check Point Process configured, but just in case you do not, CHECKPOINT_PROCESS=TRUE, so that you take some of the checkpoint processing away from the Log Writer.

Also, have you checked to see how many cloned blocks are in the buffer cache, try running catparr.sql (even if you have no cluster) and this will create the v$bh view (I think, it is based upon x$bh), and provides insight into the number of data blocks that are being cloned. There is also an UNDERSCORE parameter, in 7.3 that limits the number of cloned blocks that Oracle supports in the buffer cache (by block number), so instead of having maybe 40 cloned blocks (HP default), all relating to the same data block, you can reduce this to 5. I can't remember how this effects the checkpointing exercise, but we did discover this to be a problem for us.

The other comment, is 50Mb Redo Logs, switching every couple of minutes is a little bit small. The last Mission Critical system that I worked on, we ended up with 160Mb Redo Logs, with more importantly a 20 - 30 minute switch time during peak hours. To improve our recoverability, we ran checkpoints, via the checkpoint_interval every 10Mb's or every 10 minutes via checkpoint_timeout. This resulted in the recovery taking less than about 2 minutes (required since we had SLA's of 10 minutes total downtime), but also meant that the lazy database writers were always keeping the buffer cache upto date and clean.

Just a couple of comments.

"Gerard H. Pille" wrote:

> My excuses for the format of this reply, I don't succeed in setting a fixed
> font, you should read it with Courier.
>
> I can't find the reason for the bad performance in the utlbstat/utlestat
> myself.
>
> It's an 8 processor machine RS/6000 J50
>
> version:
>
> Oracle7 Server Release 7.3.4.3.0 - Production
> PL/SQL Release 2.3.4.3.0 - Production
> CORE Version 3.5.4.0.0 - Production
> TNS for IBM/AIX RISC System/6000: Version 2.3.4.0.0 - Production
> NLSRTL Version 3.2.4.0.0 - Production
>
> Latches:
>
> LATCH_NAME GETS MISSES HIT_RATIO SLEEPS
> SLEEPS/MISS
> ------------------ ----------- ----------- ----------- -----------
> -----------
> cache buffers chai 11019219 0 1 0
> 0
> cache buffers lru 113078 7 1 0
> 0
> dml lock allocatio 4147 0 1 0
> 0
> enqueue hash chain 35256 0 1 0
> 0
> enqueues 64709 0 1 0
> 0
> ktm global data 12 0 1 0
> 0
> latch wait list 54 0 1 0
> 0
> library cache 64590 44 .999 18
> 409
> library cache load 260 0 1 0
> 0
> list of block allo 7101 0 1 0
> 0
> loader state objec 83 0 1 0
> 0
> messages 157629 108 .999 2
> 019
> modify parameter v 311 0 1 0
> 0
> process allocation 1 0 1 0
> 0
> redo allocation 298484 120 1 1
> 008
> redo copy 1172 1157 .013 1530
> 1.322
> row cache objects 42639 0 1 0
> 0
> session allocation 2736 1 1 2
> 2
> session idle bit 3992 0 1 0
> 0
> session switching 522 0 1 0
> 0
> shared pool 12042 15 .999 9
> 6
> sort extent pool 414 0 1 0
> 0
> system commit numb 229645 0 1 0
> 0
> transaction alloca 3897 0 1 0
> 0
> undo global data 70373 0 1 0
> 0
>
> LATCH_NAME NOWAIT_GETS NOWAIT_MISSES NOWAIT_HIT_RATIO
> ------------------ ---------------- ---------------- ----------------
> cache buffers chai 269124 0 1
> cache buffers lru 386570 8 9
> process allocation 1 0 1
> redo copy 140787 2534 2535
>
> file statistics:
>
> TABLE_SPACE FILE_NAME
> READS BLKS_READ READ_TIME WRITES BLKS_WRT WRITE_TIME MEGABYTES
> ------------------------------
> ------------------------------------------------
> ---------- ---------- ---------- ---------- ---------- ----------
> ----------
> ORADATA /misb/oradata02MIS.dbf
> 0 0 0 11368 11368 122441112
> 1074
> ORADATA /misb/oradata03MIS.dbf
> 15 15 21 4003 4003 38897899
> 944
>
> --
> Kind Regards,
>
> Gerard
>
> Piotr Kolodziej <pkol_at_otago.gda.pl> wrote in article
> <7gp1v4$5h3$1_at_korweta.task.gda.pl>...
> > Gerard H. Pille wrote in message
> <01be96cf$41038920$390120c0_at_GHP.creyfs.be>...
> > >The data-files of the tablespace being loaded via import, are on one
> > >SSA-disk. This disk also contains the log of the Journaled File System.
>
> > >The DBWR(s) show(s) activity of around 70-80.
> > >hdisk3_b and hdisk4_b each contain a copy of the JFS-log, so less than
> 80
> > >Kb were written to the data-files in 5 seconds.[...]
> > >I started with 4 DBWR-processes with use_async_io = false, then switched
> to
> > >async_io which resulted in the DBWR doing all the work (sic) and the 4
> > >child-processes idling. Same lack of performance.
> >
> > Slaves consume small amount of CPU because they perform only I/O.
> > All cpu-intensive work (like buffer lists scans) is done by master
> (dbwr).
> > I suspect some problems related to latches preventing dbwr from
> > effective maintaining dirty buffer and free buffer list.
> > BTW, is it multiprocessor machine ?
> > Sure, some problems may be related to particular RDBMS release.
> > Let's know about your RDBMS version (not dying processes, but you should
> > try using shutdown immediate instead of shutdown normal; ^C; shutdown
> > abort) and buffer cache size.
> > Some problems may be easily found via utlbstat/utlestat report.
> >
> > --
> > Piotr Kolodziej pkol_at_otago.gda.pl
> > Just my private opinion.
> >
> >
Received on Wed May 05 1999 - 19:13:25 CDT