RE: DB RAID Setup

From: Baumgartel, Paul <paul.baumgartel_at_credit-suisse.com>
Date: Wed, 10 Dec 2008 09:36:46 -0500
Message-ID: <21469B88E0EA11498818517F210335310455BA48@EPRI17P32001A.csfb.cs-group.com>


Also, if your log buffer is too large, the automatic log buffer flushes performed when the buffer is 1/3 full may not be happening frequently enough. WIth a high commit rate, most of your log writes will be performed on commit, causing excessive waits. The buffer should be just large enough to minimize log buffer space waits; usually it should be 4 MB or less.  

Paul Baumgartel
CREDIT SUISSE
Information Technology
Prime Services Databases Americas
One Madison Avenue
New York, NY 10010
USA
Phone 212.538.1143
paul.baumgartel_at_credit-suisse.com
www.credit-suisse.com  


From: oracle-l-bounce_at_freelists.org
[mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Tim Gorman Sent: Tuesday, December 09, 2008 5:50 PM To: martinfbrown_at_hotmail.com
Cc: rob.dempsey_at_5one.co.uk; oracle-l_at_freelists.org Subject: Re: DB RAID Setup

Martin,

The event "log file sync" is usually caused by two things:

  • LGWR is operating synchonously upon COMMIT (i.e. LGWR too slow, probably waiting excessively on event "log file parallel write")
  • LGWR is being continuously interrupted by too-frequent COMMITs

The first is the situation where you want faster disk or even SSD. The second is an application issue that you'll have to take up with the folks who designed, built, or use the application supported by the database.

With regards to the potential cause being slow I/O under the online redo log files...

Your "architect" has multiplexed the online redo log files to RAID1 volumes as well as RAID5 volumes. Clearly, the lowest common denominator is at work here -- that is, the performance of writes to the online redo logfiles is determined by the slowest write. So, in multiplexing between a faster RAID1 volume and a slower RAID5 volume, the overall write performance will be constrained by the slower RAID5 volume.

This is somewhat nonsensical -- why bother basing any online redo log files on RAID1 at all? The reason for RAID1 mirroring is redundancy without loss of read/write performance, and the reason for RAID5 is redundancy to maximize usable space but at a cost to write performance. Space is being wasted on the RAID1 volume because the better performance it provides is not being used. You might as well have two RAID5 volumes for multiplexing those online redo log files; it wouldn't hurt performance any further, and you'd recover some usable disk space, if you needed it.

My advice is: first determine if the event "log file parallel write" comprises a significant amount of total "DB time" (i.e. 10% or more)? Is it in the "Top 5 Timed Events" list in either STATSPACK or AWR? Because if I/O performance isn't causing this problem, then there is no sense discussing the I/O subsystem further. Instead, get to know the application code through examination of SQL traces and V$SESSION_EVENT, then approach the application developers/architects/vendors with specific findings about too-small transactions or excessive commit commands.

If "log file parallel write" is significant however, then I'd suggest to cease putting online redo log files to the RAID5 volume, as a test. You can issue ALTER DATABASE statements to drop those log file members on the RAID5 volume, or you can add additional log file members on the RAID1 volumes and then drop the log file members on RAID5. Either way, these actions can be performed while the database is up, fully available, and nobody will know (or need to know) that the change is occurring. Manipulating online redo log files is definitely an ONLINE activity...

That's just my opinion, your mileage may vary...

Tim Gorman
consultant - Evergreen Database Technologies, Inc. P.O. Box 630791, Highlands Ranch CO 80163-0791

website   = http://www.EvDBT.com/
email     = Tim_at_EvDBT.com
mobile    = +1-303-885-4526
fax       = +1-303-484-3608
Yahoo IM  = tim_evdbt


Martin Brown wrote:

        I have a similar question regarding raid 5 and what I perceive to be a "potential" problem. I am experiencing a high number of logfile sync waits that appear most afternoons as the daily load starts to increase. We're running 10.2.0.3 RAC with a 8 node cluster. We use a 3par storage system for the database. My architect has configured both raid 1 and raid 5 diskgroups and he duplexes the online redo logs and controlfiles to raid5.          

        Now, he insists that the logfile sync waits are not related in any way to the raid5 configuration and the 3par storage solution is state-of-the-art. But the only remedy I can find is to move to faster SSD. Is it possible that the high afternoon activity coupled with the raid5 configuration is the root of my problem?          

        I've looked at other stuff too. The log switches are 2 - 3 an hour. The log buffer is never full. Anyone have any other suggestions as to where to look?                           


	Subject: DB RAID Setup
	Date: Tue, 18 Nov 2008 17:06:20 +0000
	From: Rob.Dempsey_at_5one.co.uk
	To: oracle-l_at_freelists.org
	
	
	

	Hi

	 

	Oracle 10g2 (data warehouse)

	 

	I thought I would ask peoples' thoughts on the following.

	 

	I have setup our database whereby the index tablespace and data
tablespace are separate. This is not for performance reason only for ease of maintenance.          

        We are being advised by the SAN provider to use the following RAID layout          

	                Archive redo Logs            - RAID 10

	Redo Logs                           - RAID 10

	Temp tablespace             - RAID 10

	Undo tablespace              - RAID 10

	Index tablespaces           - RAID 10

	System tablespace          - RAID 5

	Data tablespaces              - RAID 5

	 

	Redo logs / Temp tablespace I agree with. 

	 

	To use RAID 5 for data, I understand there is a write
performance hit but this is a data warehouse so should be ok (Ideally I would like that RAID 10 as well). But to have the index tablespace on RAID 10 and data tablespace on RAID 5 I found that strange. When I asked the reason why I was give the response 'that is what Oracle recommends'.          

        Has anyone heard this before?          

        Rob                                                  


        You live life online. So we put Windows on the web. Learn more about Windows Live <http://clk.atdmt.com/MRT/go/127032869/direct/01/>


Please access the attached hyperlink for an important electronic communications disclaimer:

http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html


--
http://www.freelists.org/webpage/oracle-l
Received on Wed Dec 10 2008 - 08:36:46 CST

Original text of this message