Re: High "log file sync" Waits in Statspack, - Online Redo Logs on Mounted File system of SAN Storage

From: Greg Rahn <greg_at_structureddata.org>
Date: Mon, 14 Jan 2008 08:09:15 -0800
Message-ID: <a9c093440801140809n28831e75o9eea837399b70eeb@mail.gmail.com>


Having the sys cpu time be more than the user is a red flag to me. Have you investigated this? Using dtrace would be a good start. Run this command for 30 seconds or so while the sys time is high, then ctrl-c out of it.
dtrace -n 'syscall::: {@[execname,probefunc] = count ();}'

Check out Brendan Gregg's page for more good dtrace stuff: http://www.brendangregg.com/dtrace.html#OneLiners
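
If the one-liner prints a long list, a small script can total the counts per process so it is easier to see who is making the system calls. This is only a sketch: it assumes each aggregation line is whitespace-separated as execname, probefunc, count, and the sample lines are made up for illustration (they are not output from your system).

```python
from collections import Counter

def syscalls_per_process(dtrace_output: str) -> Counter:
    """Sum the count column per execname from dtrace aggregation output."""
    totals = Counter()
    for line in dtrace_output.splitlines():
        fields = line.split()
        # Skip blank lines, banners, and anything not ending in a count.
        if len(fields) < 3 or not fields[-1].isdigit():
            continue
        execname = " ".join(fields[:-2])  # the last two fields are probefunc, count
        totals[execname] += int(fields[-1])
    return totals

# Hypothetical sample lines, for illustration only.
SAMPLE = """
  sshd      read       4
  oracle    pwrite   300
  oracle    kaio    1200
"""

if __name__ == "__main__":
    for name, count in syscalls_per_process(SAMPLE).most_common():
        print(f"{name:<12} {count:>8}")
```

Save the real dtrace output to a file and pass its contents in place of SAMPLE.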

I think you could make some /etc/system changes. You have vol_maxio=2048 (1MB) but maxphys=131072. These don't match up well. I would recommend these settings:
set tune_t_fsflushr=60
set autoup=900
set maxphys = 2097152
set md_maxphys = 2097152
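
To make the mismatch concrete, here is a small sketch of the arithmetic. It assumes vol_maxio is counted in 512-byte sectors (which is how 2048 works out to 1MB) and uses a deliberately simplified model in which a request larger than maxphys is broken into maxphys-sized pieces. The function names are just for illustration.

```python
import math

SECTOR_BYTES = 512  # assumption: vol_maxio is expressed in 512-byte sectors

def vol_maxio_bytes(sectors: int) -> int:
    """Convert a vol_maxio value in sectors to bytes."""
    return sectors * SECTOR_BYTES

def pieces(request_bytes: int, maxphys_bytes: int) -> int:
    """Simplified model: number of maxphys-sized pieces one request needs."""
    return math.ceil(request_bytes / maxphys_bytes)

if __name__ == "__main__":
    vol_io = vol_maxio_bytes(2048)  # current vol_maxio from adb
    for label, maxphys in (("current", 131072), ("proposed", 2097152)):
        print(f"{label}: maxphys={maxphys} bytes, "
              f"a {vol_io}-byte volume I/O needs {pieces(vol_io, maxphys)} piece(s)")
```

Under that model, a full-size volume I/O is split eight ways with the current maxphys and not at all with the proposed value.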

Just some other thoughts...
- You could consider creating a processor set of one CPU and assigning the lgwr to it if you feel it is not getting on the CPU.
- I'm not certain that you've demonstrated there is any lgwr IO issue. You are suggesting solutions, but I've not seen any evidence of what the problem is, other than a few one-liners from statspack. What does iostat show? Does the array software show any high IO times? Do you have a lazy lgwr or is it constantly pumping out IOs? How large are the IOs? What are the response times for those IOs? Does it even matter how many physical drives there are on the LUN for redo when there is 160GB of cache on the array? All these are questions that probably should be researched before making any significant changes.
- The last thing I would do is make changes to things where you don't have any evidence there is a problem. That is probably the least effective way to troubleshoot an issue. A systematic approach will always prevail.
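
One way to start collecting that evidence from the truss log you already have is to total the time per call instead of looking only at the two longest lines. The sketch below assumes the `truss -d -D` line layout seen in your post (pid/lwp, timestamp, delta, then the call) and that the second number is the -D delta. The sample is just the two lines you posted.

```python
import re
from collections import defaultdict

# Assumed layout: pid/lwp: <timestamp> <delta> call(first_arg, ...) = ret
LINE = re.compile(r"^\s*\d+/\d+:\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\w+)\(([^,)]*)")

def summarize(truss_text: str) -> dict:
    """Return {call: (count, total_delta_s, max_delta_s)}."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for line in truss_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # not a timed call line
        delta = float(m.group(2))
        name, first_arg = m.group(3), m.group(4).strip()
        # Keep the kaio sub-operations (AIOWAIT, AIONOTIFY, ...) separate.
        key = f"{name}({first_arg})" if name == "kaio" else name
        entry = stats[key]
        entry[0] += 1
        entry[1] += delta
        entry[2] = max(entry[2], delta)
    return {k: tuple(v) for k, v in stats.items()}

SAMPLE = """\
12549/13: 3.2242 3.1328 kaio(AIONOTIFY, 0) = 0
12549/1: 1.6162 1.4777 kaio(AIOWAIT, 0xFFFFFFFF7FFFD860) = 1
"""

if __name__ == "__main__":
    ranked = sorted(summarize(SAMPLE).items(), key=lambda kv: -kv[1][1])
    for call, (count, total, worst) in ranked:
        print(f"{call:<20} n={count:<6} total={total:.4f}s max={worst:.4f}s")
```

Keep in mind that a long delta on kaio(AIOWAIT) may simply be the lgwr sitting idle or waiting for a completion; on its own it does not show that the IO is slow, which is why iostat and the array statistics are still needed.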

Hope this helps.

I'd be interested in an entire statspack report, because I don't feel enough information is given to make a good diagnosis.
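
In the meantime, the one line that was posted can at least be cross-checked. The sketch below just recomputes the average wait from the Waits and Time (s) columns; the function name is only for illustration.

```python
def avg_wait_ms(waits: int, time_s: float) -> float:
    """Average wait in milliseconds from a Statspack event line."""
    return time_s * 1000.0 / waits

if __name__ == "__main__":
    # Figures from the posted "log file sync" line.
    avg = avg_wait_ms(waits=2_208_357, time_s=164_021)
    print(f"log file sync average wait: {avg:.1f} ms")
```

The result agrees with the 74 ms in the report. If the full report shows a "log file parallel write" average well below that, it would suggest the time is going somewhere other than the redo write itself, but that needs the complete report to confirm.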

On 1/12/08, VIVEK_SHARMA <VIVEK_SHARMA_at_infosys.com> wrote:
> During a Benchmark Run of OLTP Transactions very High "log file sync" wait is occurring (Statspack info below)
>
>
>
> Top 5 Timed Events                                              Avg %Total
> ~~~~~~~~~~~~~~~~~~                                             wait   Call
> Event                                     Waits    Time (s)    (ms)   Time
> ---------------------------------- ------------ ----------- ------ ------
> log file sync                         2,208,357     164,021     74   59.4
>
>
>
> CPU usage on DB Server is only 50 % - %sys = 30 %, % usr = 20 %
>
> Thus CPUs are not Choking & hopefully not causing "log file sync wait"
>
>
>
> To identify if the cause of "log file sync" is an IO or CPU bottleneck, truss command was issued on LGWR to find the O.S. function call taking the Longest Time
>
>
>
> $ truss -fdD -rall -wall -o truss_lgwr1.log -p <PID of ora_lgwr_$SID>
> Shows the Longest wait on the following 2 Function Calls:-
> 12549/13: 3.2242 3.1328 kaio(AIONOTIFY, 0) = 0
> 12549/1: 1.6162 1.4777 kaio(AIOWAIT, 0xFFFFFFFF7FFFD860) = 1
>
> Qs Does this output mean that it is a KAIO issue? What corrective action is advisable? NOTE - In init$SID.ora, disk_async_io=TRUE
> # adb -k
> physmem 2584cbf
> maxphys/D
> maxphys:
> maxphys: 131072
> vol_maxio/D
> vol_maxio:
> vol_maxio: 2048
> $q
> Qs On the Storage Box, should a few of its HBA Controllers & respective Ports be assigned dedicatedly only to the online Redo LUN / Volume?
> Qs Should Online Redo Logfiles be moved to RAW Devices on the SAN Storage?
> Qs Any advisable init.ora parameters to set e.g. _log_parallelism=4, _log_simultaneous_copies=256?
> Qs Should LGWR process be set to a Higher CPU priority(using renice) since 60% of the CPU power is FREE/Unused?
>
> Will share Statspack, truss Outputs as needed
>
> Cheers & Thanks
>
> P.S.
>
>
>
> NOTE - cpu_count=80
>
> Storage Box - Sun StoreEdge 9990V (Hitachi SAN)
>
> Online Redo logfiles exist separately on a Mounted Filesystem with an underlying LUN / Volume of 4+4 Hard Disks , (RAID 1+0 Type)
> Storage Cache - 160 GB
> Online Redo Logfile size 750 MB
> log_buffer= 5M
>
> Oracle 10.2.0.3
> Solaris 10
>

-- 
Regards,

Greg Rahn
http://structureddata.org
--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jan 14 2008 - 10:09:15 CST
