Re: Deadlock ITL Waits

From: Stalin <stalinsk_at_gmail.com>
Date: Tue, 2 Aug 2011 14:08:16 -0700
Message-ID: <CA+FfP7hVJo=7vs92ggdE-9LF3S+7H+227x=fv9OQC+C8=LyuBA_at_mail.gmail.com>



Cc'ing list now.

On Tue, Aug 2, 2011 at 2:07 PM, Stalin <stalinsk_at_gmail.com> wrote:

> Apparently we had an issue with the controller/array. Oracle finally acknowledged
> the problem and provided a replacement.
>
>
> On Mon, Jul 25, 2011 at 2:10 PM, Stalin <stalinsk_at_gmail.com> wrote:
>
>> It is 15K RPM, 300G drives.
>>
>> Thanks Harel for the pointers. I will report back when I hear from the storage
>> vendor.
>>
>>
>> On Mon, Jul 25, 2011 at 12:20 PM, Harel Safra <harel.safra_at_gmail.com>wrote:
>>
>>> Stalin,
>>> You haven't specified whether the drives are 15k or 10k RPM, or the size and
>>> configuration of the SAN cache, so let's assume 15k and write-through cache, and
>>> do some back-of-the-napkin calculations:
>>> As a rule of thumb, a 15k RPM SAS drive can do about 180 IOPS. Since you
>>> have 22 drives in your array, the whole array can do 180*22 = 3960 IOPS; let's
>>> call that 4000 IOPS.
>>> Your array is RAID 1+0, so every database write IO means twice the write
>>> IO on the drives; your 1769 writes/s mean ~3500 IOPS to the array. Add
>>> the ~250 reads/s and you're indeed getting real close to the limit of the
>>> array.
>>> Even if the SAN is writing to cache only, at a sustained ~1750 w/s the
>>> cache quite possibly can't be flushed fast enough.
>>>
>>> Grill your storage vendor; they should have the metrics to tell whether the
>>> array is reaching its limits.
>>>
>>> Harel Safra
>>>
>>>
>>> On Mon, Jul 25, 2011 at 8:34 PM, Stalin <stalinsk_at_gmail.com> wrote:
>>>
>>>> Well, this is a T5220 CoolThreads server, apparently good for OLTP-type
>>>> applications but not for batch or warehouse-type applications, unless
>>>> you use parallel query options.
>>>>
>>>> I captured the iostat numbers during the slow period, and they seem a
>>>> little puzzling to me.
>>>>
>>>> extended device statistics
>>>> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
>>>> 253.7 1769.0 2048.6 15844.8 222.5 253.2 110.0 125.2 94 100 /data
>>>>
>>>> With 16MB/s of writes, we are seeing a service time of 125ms. Also,
>>>> looking at the wait time in the queue, it seems we are pushing the array
>>>> to its limits, which I can't believe. Is this normal for an array with 22
>>>> disks in RAID 1+0 (300GB SAS drives, FC-attached StorageTek 2540 SAN)? We
>>>> have a ticket open with Sun/Oracle, but no progress has been made thus far.
>>>>
>>>> We had a bad drive, but the hot spare kicked in and the drive is scheduled
>>>> for replacement. No errors are seen in the path to the array. Any clues as
>>>> to what might be happening?
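For what it's worth, a few numbers derived from that iostat line make the picture clearer (a quick sketch; all inputs are taken directly from the output above):

```python
# Derive per-IO averages from the iostat line captured during the slowdown.
r_s, w_s = 253.7, 1769.0        # reads/s, writes/s
kr_s, kw_s = 2048.6, 15844.8    # KB read/s, KB written/s
wait_q, actv_q = 222.5, 253.2   # avg IOs waiting in queue vs. being serviced

avg_write_kb = kw_s / w_s       # ~9 KB per write: small IOs, typical of redo
total_iops = r_s + w_s          # ~2023 host IOPS offered to the device
outstanding = wait_q + actv_q   # ~476 IOs outstanding at any instant

print(f"avg write size ~{avg_write_kb:.1f} KB, {total_iops:.0f} IOPS, "
      f"~{outstanding:.0f} IOs outstanding")
```

So the throughput is modest (16MB/s), but the IO rate and queue depth are what is saturating the spindles, not the bandwidth.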
>>>>
>>>> On Thu, Jul 21, 2011 at 8:47 PM, Chitale, Hemant Krishnarao <
>>>> Hemant.Chitale_at_sc.com> wrote:
>>>>
>>>>>
>>>>> This seems to be similar to this thread :
>>>>> http://forums.oracle.com/forums/thread.jspa?threadID=2256521&tstart=0
>>>>>
>>>>>
>>>>> 1.4 million commits and 1.4 million 'log file sync' waits of 3 seconds
>>>>> each?!
>>>>>
>>>>>
>>>>> Given that you have reported (from another email)
>>>>>
>>>>> Event                     Waits  <1ms  <2ms  <4ms  <8ms  <16ms  <32ms  <=1s   >1s
>>>>> ------------------------- -----  ----  ----  ----  ----  -----  -----  ----  ----
>>>>> log file parallel write     38K  72.5  15.4   5.4   2.0     .8     .4   1.3   2.2
>>>>> log file sync              838K   2.9   1.0    .5   1.7    1.7     .8   7.6  83.8
>>>>>
>>>>> I would guess that there are certain very large spikes in I/O response
>>>>> times (or that there's a bug in the timed statistics).
>>>>>
>>>>> (A 64-CPU install without the Diagnostics Pack licence?)
>>>>>
>>>>>
>>>>> Hemant K Chitale
>>>>>
>>>>> ________________________________________
>>>>> From: oracle-l-bounce_at_freelists.org [mailto:
>>>>> oracle-l-bounce_at_freelists.org] On Behalf Of Stalin
>>>>> Sent: Thursday, July 21, 2011 6:37 AM
>>>>> To: oracle-l
>>>>> Subject: Deadlock ITL Waits
>>>>>
>>>>> We have been seeing lots of deadlock errors lately in our load testing
>>>>> environments, and they have all been due to "enq: TX - allocate ITL
>>>>> entry". Reviewing the statspack report for the deadlock periods, I see
>>>>> that log file sync is the top wait event, with a terrible wait time. That
>>>>> makes me think the deadlocks are just a symptom of the high log file sync
>>>>> times. Below is a snippet from statspack; looking at these numbers,
>>>>> especially with the CPU not being heavily loaded, I wonder if this could
>>>>> be a storage issue. The sysadmins are checking the storage layer, but I
>>>>> thought I would check here for opinions/feedback.
>>>>>
>>>>> Top 5 Timed Events                                            Avg   %Total
>>>>> ~~~~~~~~~~~~~~~~~~                                           wait     Call
>>>>> Event                                    Waits    Time (s)   (ms)     Time
>>>>> ----------------------------------- ---------- ----------- ------ --------
>>>>> log file sync                        1,400,773   4,357,902   3111     91.4
>>>>> db file sequential read                457,568     334,834    732      7.0
>>>>> db file parallel write                 565,843      27,573     49       .6
>>>>> read by other session                   16,168       7,395    457       .2
>>>>> enq: TX - allocate ITL entry               575       6,854  11919       .1
>>>>> -------------------------------------------------------------
>>>>> Host CPU (CPUs: 64 Cores: 8 Sockets: 1)
>>>>> ~~~~~~~~              Load Average
>>>>>                Begin     End    User  System    Idle     WIO    WCPU
>>>>>              ------- ------- ------- ------- ------- ------- -------
>>>>>                 3.13    7.04    2.26    3.30   94.44    0.00    7.81
>>>>>
>>>>> Statistic                                   Total   per Second  per Trans
>>>>> --------------------------------- --------------- ------------ ----------
>>>>> redo synch time                       435,852,302    120,969.3      309.7
>>>>> redo synch writes                       1,400,807        388.8        1.0
>>>>> redo wastage                            5,128,804      1,423.5        3.6
>>>>> redo write time                           357,414         99.2        0.3
>>>>> redo writes                                 9,935          2.8        0.0
>>>>> user commits                            1,400,619        388.7        1.0
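The statspack figures are internally consistent with the 3-second average. Assuming the usual centisecond units for Oracle's time statistics, a quick cross-check using the numbers above:

```python
# Cross-check the statspack redo statistics (time stats in centiseconds).
redo_synch_time_cs = 435_852_302
redo_synch_writes  = 1_400_807
redo_write_time_cs = 357_414
redo_writes        = 9_935

avg_sync_ms       = redo_synch_time_cs * 10 / redo_synch_writes  # per commit
avg_lgwr_write_ms = redo_write_time_cs * 10 / redo_writes        # per LGWR write
commits_per_write = redo_synch_writes / redo_writes

print(f"avg log file sync ~{avg_sync_ms:.0f} ms")      # matches the 3111 ms above
print(f"avg LGWR write    ~{avg_lgwr_write_ms:.0f} ms")
print(f"commits per write ~{commits_per_write:.0f}")
```

Note the ~360 ms average per LGWR write and the ~141 commits piggybacking on each write: with LGWR that slow, sessions pile up waiting on log file sync, which fits the storage-bottleneck theory.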
>>>>>
>>>>>
>>>>> Environment : 11gr2 EE (11.2.0.1), Sol 10 Sparc
>>>>>
>>>>> Thanks,
>>>>> Stalin
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>>
>>>> Stalin
>>>>
>>>
>>>
>>
>>
>> --
>> Thanks,
>>
>> Stalin
>>
>
>
>
> --
> Thanks,
>
> Stalin
>

-- 
Thanks,

Stalin

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Aug 02 2011 - 16:08:16 CDT
