Re: Deadlock ITL Waits

From: kyle Hailey <kylelf_at_gmail.com>
Date: Thu, 21 Jul 2011 08:23:01 -0700
Message-ID: <CADsdiQgkfHLVbBeqFSPFH29qqN9Ar6ExH4gxfpG7EHVpsgY_-Q_at_mail.gmail.com>



At the simplest you can just run a statspack report and search for "log file parallel write" . It will be in the report as well.

Well the metric tables are not part of perf pack, so this should be fine:

Latencies in the past minute

col name for a25

select m.intsize_csec,
       n.name ,
       m.time_waited,
       m.wait_count,
       10*m.time_waited/nullif(m.wait_count,0) avgms
from v$eventmetric m,
     v$event_name n

where m.event_id=n.event_id
  and n.name in (
                  'log file sync',
                  'log file parallel write'
);

but alas that's the last minute. On the other hand if you have statspack you can query the statspack tables in a similar way to the AWR querying based on

       DBA_HIST_SYSTEM_EVENT e,
       DBA_HIST_SNAPSHOT s

Could you give this a try on statspack and post the query?

On Wed, Jul 20, 2011 at 11:50 PM, Stalin <stalinsk_at_gmail.com> wrote:

> Thanks Kyle.
>
> Apparently we don't have diagnostic pack licenses to use DBA_HIST tables.
> Is this something i can get from statspack tables ?
>
>
> On Wed, Jul 20, 2011 at 10:20 PM, kyle Hailey <kylelf_at_gmail.com> wrote:
>
>>
>> Yeah, the massive log waits could exacerbate the ITL waits as things will
>> be slow commiting their changes.
>> The db file sequential reads are huge as well. Looks like you have a
>> problem on your storage.
>>
>> Check the log write times ( I blogged some of this at
>> http://dboptimizer.com/2011/07/20/wait-event-and-wait-class-metrics-vs-vsystem_event/)
>>
>>
>> Latencies in the past minute
>>
>> col name for a25
>> select m.intsize_csec,
>> n.name ,
>> m.time_waited,
>> m.wait_count,
>> 10*m.time_waited/nullif(m.wait_count,0) avgms
>> from v$eventmetric m,
>> v$event_name n
>> where m.event_id=n.event_id
>> and n.name in (
>> 'log file sync',
>> 'log file parallel write'
>> );
>>
>>
>> Latencies averaged over each hour
>>
>> select
>> btime,
>> (time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0) avg_ms
>> from (
>> select
>> to_char(s.BEGIN_INTERVAL_TIME,'DD-MON-YY HH24:MI') btime,
>> total_waits count_end,
>> time_waited_micro/1000 time_ms_end,
>> Lag (e.time_waited_micro/1000)
>> OVER( PARTITION BY e.event_name ORDER BY s.snap_id) time_ms_beg,
>> Lag (e.total_waits)
>> OVER( PARTITION BY e.event_name ORDER BY s.snap_id) count_beg
>> from
>> DBA_HIST_SYSTEM_EVENT e,
>> DBA_HIST_SNAPSHOT s
>> where
>> s.snap_id=e.snap_id
>> and e.event_name in (
>> 'log file sync',
>> 'log file parallel write'
>> )
>> order by begin_interval_time
>> )
>> order by btime
>> /
>>
>> If the log file parallel writes are bad then probably something wrong on
>> the storage system.
>>
>> - Kyle Hailey
>> http://dboptimizer.com
>>
>>
>>
>> On Wed, Jul 20, 2011 at 3:37 PM, Stalin <stalinsk_at_gmail.com> wrote:
>>
>>> We have been seeing lots of deadlock errors lately in load testing
>>> environments and they all have been due to enq: TX - allocate ITL entry. In
>>> reviewing the statspack report for the periods of deadlock, i see that, log
>>> file sync wait being the top consumer with a terrible wait time. That makes
>>> to me think the deadlock, is just a symptom of high log file sync wait
>>> times. Below is the snippet from statspack and looking at these numbers,
>>> especially CPU not being heavily loaded, wondering if this could be a case
>>> of storage issue. Sys Admins are checking the storage layer but thought
>>> would check here get any opinions/feedback.
>>>
>>> Top 5 Timed Events Avg
>>> %Total
>>> ~~~~~~~~~~~~~~~~~~ wait
>>> Call
>>> Event Waits Time (s) (ms)
>>> Time
>>> ----------------------------------------- ------------ ----------- ------
>>> ------
>>> log file sync 1,400,773 4,357,902
>>> 3111 91.4
>>> db file sequential read 457,568 334,834 732
>>> 7.0
>>> db file parallel write 565,843 27,573 49
>>> .6
>>> read by other session 16,168 7,395 457
>>> .2
>>> enq: TX - allocate ITL entry 575 6,854 11919
>>> .1
>>> -------------------------------------------------------------
>>> Host CPU (CPUs: 64 Cores: 8 Sockets: 1)
>>> ~~~~~~~~ Load Average
>>> Begin End User System Idle WIO
>>> WCPU
>>> ------- ------- ------- ------- ------- -------
>>> --------
>>> 3.13 7.04 2.26 3.30 94.44 0.00
>>> 7.81
>>>
>>> Statistic Total per Second
>>> per Trans
>>> --------------------------------- ------------------ --------------
>>> ------------
>>> redo synch time 435,852,302 120,969.3
>>> 309.7
>>> redo synch writes 1,400,807 388.8
>>> 1.0
>>> redo wastage 5,128,804 1,423.5
>>> 3.6
>>> redo write time 357,414 99.2
>>> 0.3
>>> redo writes 9,935 2.8
>>> 0.0
>>> user commits 1,400,619 388.7
>>> 1.0
>>>
>>>
>>> Environment : 11gr2 EE (11.2.0.1), Sol 10 Sparc
>>>
>>> Thanks,
>>> Stalin
>>>
>>>
>>
>
>
> --
> Thanks,
>
> Stalin
>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jul 21 2011 - 10:23:01 CDT

Original text of this message