Xref: alice comp.databases.oracle.server:48373
Path: alice!news-feed.fnsi.net!hammer.uoregon.edu!logbridge.uoregon.edu!newsgate.cuhk.edu.hk!newsgate.netvigator.com!news.netvigator.com!usenet
From: Andrew Babb <andrewb@mail.com>
Newsgroups: comp.databases.oracle.server
Subject: Re: very (I repeat) slow DBWR
Date: Thu, 06 May 1999 08:13:25 +0800
Organization: Netvigator
Lines: 163
Message-ID: <3730DEA4.D5A181AB@mail.com>
References: <01be96cf$41038920$390120c0@GHP.creyfs.be> <7gp1v4$5h3$1@korweta.task.gda.pl> <01be96de$20d9bfc0$390120c0@GHP.creyfs.be>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 4.5 [en] (Win95; I)
X-Accept-Language: en
To: "Gerard H. Pille" <ghp@skynet.be>

Gerard,

I am sure you already have the checkpoint process (CKPT) configured, but just in
case you do not: set CHECKPOINT_PROCESS=TRUE, so that some of the checkpoint
processing is taken away from the log writer.
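In case it helps, the init.ora setting would look something like this (a sketch
for 7.3; check the parameter against your release's reference manual):

```
# init.ora fragment (Oracle 7.3) - start the dedicated checkpoint
# process so that CKPT, not LGWR, updates file headers at checkpoints
checkpoint_process = true
```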

Also, have you checked how many cloned blocks are in the buffer cache? Try
running catparr.sql (even if you have no cluster); this creates the v$bh view
(which, I think, is based upon x$bh) and provides insight into the number of
data blocks that are being cloned. There is also an UNDERSCORE parameter in 7.3
that limits the number of cloned copies of any one data block that Oracle keeps
in the buffer cache, so instead of having maybe 40 clones (the HP default), all
relating to the same data block, you can reduce this to 5. I can't remember how
this affects the checkpointing exercise, but we did discover it to be a problem
for us.
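For reference, once catparr.sql has created v$bh, a query along these lines
should show how many copies of each block sit in the cache (a sketch using the
standard file#/block# columns; treat it as an assumption that they match your
7.3 build):

```sql
-- Count buffer-cache copies (current block plus clones) per data block;
-- rows with a high copy count are the heavily cloned blocks.
SELECT file#, block#, COUNT(*) AS copies
  FROM v$bh
 GROUP BY file#, block#
HAVING COUNT(*) > 1
 ORDER BY copies DESC;
```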

The other comment: 50MB redo logs switching every couple of minutes are a
little small. On the last mission-critical system I worked on, we ended up with
160MB redo logs and, more importantly, a 20 - 30 minute switch time during peak
hours. To improve recoverability, we forced checkpoints every 10MB via
LOG_CHECKPOINT_INTERVAL and every 10 minutes via LOG_CHECKPOINT_TIMEOUT. This
kept recovery to less than about 2 minutes (required, since we had SLAs of 10
minutes total downtime), but it also meant that the lazy database writers were
always keeping the buffer cache up to date and clean.
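As a sketch, those checkpoint settings would look roughly like this in init.ora
(note LOG_CHECKPOINT_INTERVAL is counted in operating-system blocks, so the
value for 10MB depends on your OS block size; 20480 assumes 512-byte blocks,
which is an assumption you should verify for your platform):

```
# init.ora fragment - force a checkpoint every ~10MB of redo
# or every 10 minutes, whichever comes first
log_checkpoint_interval = 20480   # 10MB in 512-byte OS blocks (assumption)
log_checkpoint_timeout  = 600     # seconds
```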

Just a couple of comments.

"Gerard H. Pille" wrote:

> My excuses for the format of this reply, I don't succeed in setting a fixed
> font, you should read it with Courier.
>
> I can't find the reason for the bad performance in the utlbstat/utlestat
> myself.
>
> It's an 8 processor machine RS/6000 J50
>
> version:
>
> Oracle7 Server Release 7.3.4.3.0 - Production
> PL/SQL Release 2.3.4.3.0 - Production
> CORE Version 3.5.4.0.0 - Production
> TNS for IBM/AIX RISC System/6000: Version 2.3.4.0.0 - Production
> NLSRTL Version 3.2.4.0.0 - Production
>
> Latches:
>
> LATCH_NAME         GETS        MISSES      HIT_RATIO   SLEEPS      SLEEPS/MISS
> ------------------ ----------- ----------- ----------- ----------- -----------
> cache buffers chai    11019219           0           1           0           0
> cache buffers lru       113078           7           1           0           0
> dml lock allocatio        4147           0           1           0           0
> enqueue hash chain       35256           0           1           0           0
> enqueues                 64709           0           1           0           0
> ktm global data             12           0           1           0           0
> latch wait list             54           0           1           0           0
> library cache            64590          44        .999          18        .409
> library cache load         260           0           1           0           0
> list of block allo        7101           0           1           0           0
> loader state objec          83           0           1           0           0
> messages                157629         108        .999           2        .019
> modify parameter v         311           0           1           0           0
> process allocation           1           0           1           0           0
> redo allocation         298484         120           1           1        .008
> redo copy                 1172        1157        .013        1530       1.322
> row cache objects        42639           0           1           0           0
> session allocation        2736           1           1           2           2
> session idle bit          3992           0           1           0           0
> session switching          522           0           1           0           0
> shared pool              12042          15        .999           9          .6
> sort extent pool           414           0           1           0           0
> system commit numb      229645           0           1           0           0
> transaction alloca        3897           0           1           0           0
> undo global data         70373           0           1           0           0
>
> LATCH_NAME         NOWAIT_GETS      NOWAIT_MISSES    NOWAIT_HIT_RATIO
> ------------------ ---------------- ---------------- ----------------
> cache buffers chai           269124                0                1
> cache buffers lru            386570                8                9
> process allocation                1                0                1
> redo copy                    140787             2534             2535
>
> file statistics:
>
> TABLE_SPACE  FILE_NAME               READS  BLKS_READ  READ_TIME  WRITES  BLKS_WRT  WRITE_TIME  MEGABYTES
> ------------ ----------------------- ------ ---------- ---------- ------- --------- ----------- ----------
> ORADATA      /misb/oradata02MIS.dbf       0          0          0   11368     11368   122441112       1074
> ORADATA      /misb/oradata03MIS.dbf      15         15         21    4003      4003    38897899        944
>
> --
> Kind Regards,
>
> Gerard
>
> Piotr Kolodziej <pkol@otago.gda.pl> wrote in article
> <7gp1v4$5h3$1@korweta.task.gda.pl>...
> > Gerard H. Pille wrote in message
> <01be96cf$41038920$390120c0@GHP.creyfs.be>...
> > >The data-files of the tablespace being loaded via import, are on one
> > >SSA-disk.  This disk also contains the log of the Journaled File System.
>
> > >The DBWR(s) show(s) activity of around 70-80.
> > >hdisk3_b and hdisk4_b each contain a copy of the JFS-log, so less than
> 80
> > >Kb were written to the data-files in 5 seconds.[...]
> > >I started with 4 DBWR-processes with use_async_io = false, then switched
> to
> > >async_io which resulted in the DBWR doing all the work (sic) and the 4
> > >child-processes idling.  Same lack of performance.
> >
> > Slaves consume small amount of CPU because they perform only I/O.
> > All cpu-intensive work (like buffer lists scans) is done by master
> (dbwr).
> > I suspect some problems related to latches preventing dbwr from
> > effective maintaining dirty buffer and free buffer list.
> > BTW, is it multiprocessor machine ?
> > Sure, some problems may be related to particular RDBMS release.
> > Let's know about your RDBMS version (not dying processes, but you should
> > try using shutdown immediate instead of shutdown normal; ^C; shutdown
> > abort) and buffer cache size.
> > Some problems may be easily found via utlbstat/utlestat report.
> >
> > --
> > Piotr Kolodziej pkol@otago.gda.pl
> > Just my private opinion.
> >
> >

