Re: DBWR - How Many is Too Many?
Date: Wed, 27 Feb 2008 21:18:07 -0500
It doesn't make a lot of sense that the SAN setup is the same, but with more disks and more mount points and slower performance. Definitely sounds like something is not configured right.
That said, the last time I had WIO that high it was on a Sun/Veritas system and vmstat showed that spin mutex'es were very high. The problem was the sync IO on VxFS. The solution was to either set mincache=direct or implement Quick IO. Both of these are veritas specific solutions, so I'm not sure what the AIX equivalent would be, but it means the filesystem files are accessed without involving the filesystem buffer cache (like raw devices).
I would definitely try to gather evidence against the SAN being slow to make a case with the storage team, but I would also look into the direct IO thing.
On Wed, Feb 27, 2008 at 1:53 PM, David Barbour <david.barbour1_at_gmail.com> wrote:
> We recently moved our database to a new SAN. Performance has just
> tanked. Here's the environment:
> Oracle 184.108.40.206
> SAN - IBM DS4800
> We've got 8 filesystems for Oracle data files. Redo, Archive, Undo and
> Temp are all on seperate disk/filesystems from the data files.
> All the Oracle datafiles are on RAID5 LUNs with 12 15K RPM 73 (68 usable)
> GB drives. SAN Read and Write Caching are both enabled.
> A statspack (generally for any given interval - this was for a period of
> "light" processing) shows me our biggest hit is:
> Buffer wait Statistics for DB: PR1 Instance: PR1 Snaps: 12609 -12615
> -> ordered by wait time desc, waits desc
> Tot Wait Avg
> Class Waits Time (s) Time (ms)
> ------------------ ----------- ---------- ---------
> data block 278,194 20,811 75
> sar is scary (just a small portion)
> AIX r3prdci1 3 5 00CE0B8A4C00 02/27/08
> System configuration: lcpu=8
> 00:00:00 %usr %sys %wio %idle physc
> 02:15:01 19 19 42 19 4.00
> 02:20:00 21 25 40 14 4.00
> 02:25:00 19 18 43 20 4.00
> 02:30:00 18 18 43 21 4.00
> 02:35:00 20 24 40 16 4.00
> We're running JFS2 filesystems with CIO enabled, 128k element size on the
> SAN and AIO Servers are set at minservers = 220 and maxservers = 440
> We've got 32GB of RAM on the server and 4 CPUs (which are dual core for
> all intents and purposes - they show up as eight). We're running SAP which
> has it's own memory requirements. I've configured my SGA and PGA using
> Automatic Memory Management and the SGA currently looks like:
> SQL> show sga
> Total System Global Area 1.0739E+10 bytes
> Fixed Size 757152 bytes
> Variable Size 8589934592 bytes
> Database Buffers 2147483648 bytes
> Redo Buffers 1323008 bytes
> filesystemio_options = setall
> I'm thinking the data block waits is the result of too many modified
> blocks in the buffer cache. Solution would be to increase the number of
> db_writer_processes, but we've already got 4. Metalink, manuals, training
> guides, Google, etc. seem to suggest two answers.
> 1. One db writer for each database disk - in our case that would be 8
> 2. CPUs/8 adjusted for multiples of CPU groups - in our case that would
> be 4
> Any thoughts?