RE: DBWR - How Many is Too Many?

From: Pratap Singh (c) <psingh_at_vmware.com>
Date: Wed, 27 Feb 2008 14:59:05 -0800
Message-ID: <C388F0160F8F6A4DA25614D019619C8F170841@PA-EXCH21.vmware.com>


High IOWait may also be caused by kernel parameters that are not set optimally.

Just quoting from an old reference:
"The one I find helpful is "ps -elf", which shows the status of the processes. There were a number of suggestions to perform a truss to see what's going on with that process - as seen in the "ps -elf" output. A number of people suggested using the lsof command. In my particular situation this was not as helpful, since there were a number of databases running in the busy partition - lsof showed me all of the big datafiles opened by the five database instances.

I noted that the developers were creating databases on their own. The /etc/system parameters had been sized for two databases, so I increased the number of semaphores based on the Oracle guidelines. This appears to have solved the problem and alleviated what looked like locking."
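
For reference, on a Solaris box of that era the semaphore limits live in /etc/system. A rough sketch of checking and raising them - the values below are illustrative only and should be sized from the Oracle install guide and the sum of PROCESSES across all instances:

# See what is configured and what is actually in use
ipcs -s                  # semaphore sets currently allocated
sysdef | grep -i sem     # kernel semaphore limits in effect (Solaris)

# Illustrative /etc/system entries (a reboot is needed after changing them):
#   set semsys:seminfo_semmni=100     max semaphore identifiers
#   set semsys:seminfo_semmns=2048    max semaphores system-wide
#   set semsys:seminfo_semmsl=256     max semaphores per set (at least PROCESSES)
#   set semsys:seminfo_semopm=100     max operations per semop() call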



PB Singh
DW Architect and Sr Data Modeler
VMware    

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of David Barbour
Sent: Wednesday, February 27, 2008 2:27 PM
To: Greg Rahn
Cc: oracle-l_at_freelists.org
Subject: Re: DBWR - How Many is Too Many?

Good point(s). And 'free buffer waits' is NOT in my top 5.

On 2/27/08, Greg Rahn <greg_at_structureddata.org> wrote:

        My first question to you (and others who increase the number of dbwr processes): are you seeing 'free buffer waits' in your 'Top 5' in the Statspack report? If not, it is unlikely that adding more dbwr processes will yield any benefit.
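
        (For anyone wanting to check that outside of a full Statspack snapshot, a quick sketch against v$system_event - run as a DBA-privileged user; these are instance-lifetime numbers, so read them accordingly:)

sqlplus -s "/ as sysdba" <<'EOF'
-- If 'free buffer waits' is negligible here and absent from the
-- Statspack Top 5, adding DBWR processes is unlikely to help.
select event, total_waits, time_waited
from   v$system_event
where  event in ('free buffer waits', 'write complete waits')
order  by time_waited desc;
EOF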

        If you are experiencing high wait IO (>40% is high), then what benefit is there in potentially doing more IO by adding dbwrs? I would think that would make things worse. You need to get to the root cause of why the WIO is high. My suggestion is to start by investigating the iostat data (IO response times, LUN queue depths, etc.). It sounds quite likely that even a simple file-create test would show poor IO. I would use /usr/sbin/lmktemp and do some lower-level testing.
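
        (A rough sketch of that kind of low-level look on AIX - the hdisk names, mount point and sizes below are invented, and dd is shown as a generic stand-in where lmktemp isn't handy:)

# Watch per-disk service times and queue behaviour over a busy window
iostat 5 12                     # basic tps / KB read / KB written per disk
iostat -D hdisk4 hdisk5 5 12    # AIX extended stats: avg service time, queue full (assumed hdisk names)

# Simple file-create test on the datafile filesystem: if this is slow,
# the problem is below the database, not in the number of DBWRs
time dd if=/dev/zero of=/oradata/io_test bs=1024k count=2048    # ~2 GB sequential write (path is illustrative)
rm /oradata/io_test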

        The other observation I would make is that if you changed the SAN, and not the database, and it worked ok before, then in all likelihood it is not a database problem. No?                           

        On 2/27/08, David Barbour <david.barbour1_at_gmail.com> wrote:

                sar is scary (just a small portion)                 

		00:00:00    %usr    %sys    %wio   %idle   physc
		02:15:01      19      19      42      19    4.00
		02:20:00      21      25      40      14    4.00
		02:25:00      19      18      43      20    4.00
		02:30:00      18      18      43      21    4.00
		02:35:00      20      24      40      16    4.00
		
                We're running JFS2 filesystems with CIO enabled and a 128k element size on the SAN, and the AIO servers are set at minservers = 220 and maxservers = 440.
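
                (For completeness, those AIX settings can be sanity-checked roughly like this - the filesystem names are whatever yours actually are:)

# Confirm concurrent I/O is actually in effect on the datafile filesystems
mount | grep cio                # JFS2 filesystems mounted with the cio option

# Check the AIO server configuration and how many servers are running (AIX 5.x)
lsattr -El aio0                 # shows minservers, maxservers, maxreqs
pstat -a | grep -c aios         # rough count of active aioserver kprocs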

                We've got 32GB of RAM on the server and 4 CPUs (which are dual core for all intents and purposes - they show up as eight).

                I'm thinking the data block waits are the result of too many modified blocks in the buffer cache. The solution would be to increase the number of db_writer_processes, but we've already got 4. Metalink, manuals, training guides, Google, etc. seem to suggest two answers:

  1. One db writer for each database disk - in our case that would be 8
  2. CPUs/8 adjusted for multiples of CPU groups - in our case that would be 4
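
                (Whichever way that lands, a quick sketch of checking and bumping the parameter - the value 8 below just mirrors option 1 above, not a recommendation:)

sqlplus -s "/ as sysdba" <<'EOF'
show parameter db_writer_processes
show parameter dbwr_io_slaves
-- one writer per database disk (option 1 above) would be:
--   alter system set db_writer_processes = 8 scope = spfile;
-- it's a static parameter, so the instance needs a restart to pick it up
EOF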

                Any thoughts?                                                             

	-- 
	Regards,
	
	Greg Rahn
	http://structureddata.org 



--
http://www.freelists.org/webpage/oracle-l
Received on Wed Feb 27 2008 - 16:59:05 CST
