Oracle-L: RE: log file sync Wait

From: VIVEK_SHARMA <VIVEK_SHARMA_at_infosys.com>
Date: Thu, 02 Jan 2003 07:44:43 -0800
Message-ID: <F001.005257B5.20030102074443@fatcity.com>

REPLIES TO YOUR QUESTIONS ARE IN CAPITALS BELOW :- THANKS -----Original Message-----
Sent: Thursday, January 02, 2003 3:59 PM To: Multiple recipients of list ORACLE-L

Usual caveat:

looking a v$system_event can be very misleading, you need to examine v$session_event to determine if anyone is actually noticing a problem.

Usual caveat 2:

A statspack report without a time interval is almost meaningless. However, in this case, log file sync at the top is sufficiently unusual to warrant a little hypothesis.

SOME STATSPACK OUTPUTS :-

                Snap Id     Snap Time      Sessions
                ------- ------------------ --------
Begin Snap:        124 31-Dec-02 12:53:00    1,237
  End Snap:        133 31-Dec-02 19:12:19    1,237
    Elapsed:                 379.32 (mins)

Cache Sizes

           db_block_buffers:     200000          log_buffer:    2097152
              db_block_size:       8192    shared_pool_size:  157286400

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:            118,042.27              2,007.50

Instance Efficiency Percentages (Target 100%)

Buffer Nowait %: 99.99 Redo NoWait %: 100.00

Wait Events for DB: NCB Instance: ncb Snaps: 124 -133
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> ordered by wait time desc, waits desc (idle events last)

Event                               Waits   Timeouts  Time (cs)    (ms)   /txn
---------------------------- ------------ ---------- ----------- ------ ------
log file sync                     970,563        537   2,597,831     27    0.7
log file parallel write           831,141         23     484,948      6    0.6
db file sequential read         8,310,890          0     416,355      1    6.2
log file switch completion             98          0       1,712    175    0.0
log buffer space                      124          0       1,551    125    0.0
LGWR wait for redo copy             4,785         22         299      1    0.0
log file single write                 123          0          49      4    0.0

Background Wait Events for DB: NCB Instance: ncb Snaps: 124 -133
-> ordered by wait time desc, waits desc (idle events last)

                                                                    Avg
                                                     Total Wait    wait  Waits
Event                               Waits   Timeouts  Time (cs)    (ms)   /txn
---------------------------- ------------ ---------- ----------- ------ ------
log file parallel write           831,123         23     484,945      6    0.6
log file sequential read           89,714          0       3,418      0    0.1

Instance Activity Stats for DB: NCB Instance: ncb Snaps: 124 -133

Statistic                                    Total   per Second    per Trans
--------------------------------- ---------------- ------------ ------------
redo blocks written                      5,875,674        258.2          4.4
redo buffer allocation retries                 218          0.0          0.0
redo entries                             6,531,597        287.0          4.9
redo log space requests                         98          0.0          0.0
redo log space wait time                     1,712          0.1          0.0
redo ordering marks                              0          0.0          0.0
redo size                            2,686,523,912    118,042.3      2,007.5
redo synch time                          2,602,866        114.4          1.9
redo synch writes                          957,313         42.1          0.7
redo wastage                           227,405,300      9,991.9        169.9
redo write time                          1,185,661         52.1          0.9
redo writer latching time                      304          0.0          0.0
redo writes                                831,047         36.5          0.6


Latch Activity for DB: NCB  Instance: ncb  Snaps: 124 -133
                                                Pct    Avg                 Pct
                                   Get          Get   Slps       NoWait NoWait
Latch Name                       Requests      Miss  /Miss     Requests   Miss
----------------------------- -------------- ------ ------ ------------ ------
redo allocation                    8,204,689    0.1    0.1            0
redo writing                       4,932,177    0.8    0.0            0

Latch Miss Sources for DB: NCB Instance: ncb Snaps: 124 -133
-> only latches with sleeps are shown
-> ordered by name, sleeps desc

                                                     NoWait             Waiter
Latch Name               Where                       Misses     Sleeps  Sleeps
------------------------ -------------------------- ------- ---------- -------
redo allocation          kcrfwr: redo allocation          0        469     510
redo allocation          kcrfwi: before write             0         78      31
redo allocation          kcrfwi: more space               0         20      26
redo writing             kcrfsr                           0        970      72
redo writing             kcrfss                           0        209   1,242
redo writing             kcrfwi: after write              0        170      19
redo writing             kcrfwcr                          0         16      32

redo writing Waiter Sleeps Value = 1,242 IS LARGER THAN THAT OF OTHER EVENTS

Question: Was log file write really number two, or have you knocked out one or two lines between the two log-related waits ?

"log file parallel write" IS INDEED THE SECOND WAIT AFTER "Log file sync" IN STATSPACK

Log file syncs are from the sessions,
log file writes are from LGWR

A log file sync is a call from a session to lgwr to write some log buffer to disc. As such, you could get multiple sessions calling at about the same time - and only the first one in gets lgwr to write, the rest have to wait until lgwr returns and notices that there is now a queue and does a piggyback write.

Consequently, it is possible on a highly concurrent system for log file sync to have far more WAITS then log file write, and therefore look a much bigger problem than it really is.

However, in your case, the number of log file sync WAITS is about the same as the number of log file write WAITS - so the fact that the TIME is five times as long suggests that concurrency of waits is not the issue, and you may have a proper problem.

I suspect that the problem is the number of processes running on your system. Session A issues a log file sync, and goes off the run queue; some time later, lgwr gets the message and writes and posts session A to allow it to go back on the run queue. Session A sits on the run queue for ages, and finally becomes runnable. Solution - look at MTS, or get more CPUs on the box.

IS THERE ANY WAY TO INCREASE THE NUMBER OF LGWR PROCESSES IN A SINGLE INSTANCE DATABASE ? But having said that - do check if any sessions are actually noticing a significant loss of time due to log file sync before worry about it.

Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

-----Original Message-----
To: Multiple recipients of list ORACLE-L <ORACLE-L_at_fatcity.com> Date: 02 January 2003 07:48

>
>What ALL may be Done to Address the Following ?
>Any /etc/system , init.ora parameter Changes too ?
>Moving the Online Redo Logfiles onto RAID 1 NOT possible as that may

warrant Additional Hardware . Moreover T3+ does NOT Support RAID 1 (Only RAID 1+ )

>
>
>Concurrent Oracle processes = 1500 Approx.
>Statspack Taken during Mostly OLTP Operations :-
>
>Top 5 Wait Events
>~~~~~~~~~~~~~~~~~                                             Wait
% Total
>Event                                               Waits  Time (cs)

Wt Time

>-------------------------------------------- ------------ -----------
- -------
>log file sync                                     970,563
2,597,831   57.46
>log file parallel write                           831,141

484,948 10.73

>
>log_buffer = 2MB
>Online Redo Logfiles Exist on RAID 1+
>Storage Box is T3+
>File System = UFS
>
>Application = Banking (Hybrid )
>Oracle 8.1.7.4
>Solaris 8
>Machine Box = SF6800
>

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: VIVEK_SHARMA
  INET: VIVEK_SHARMA_at_infosys.com

Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
San Diego, California        -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

Received on Thu Jan 02 2003 - 09:44:43 CST