Re: Deadlock ITL Waits

From: Stalin <stalinsk_at_gmail.com>
Date: Mon, 25 Jul 2011 14:10:11 -0700
Message-ID: <CA+FfP7hLw+Q0UJYoc=Rd2nTgMsheppm0JTgWECL7dr6-H9oAvg_at_mail.gmail.com>



They are 15K RPM, 300 GB drives.

Thanks, Harel, for the pointers. I will report back when I hear from the storage vendor.

On Mon, Jul 25, 2011 at 12:20 PM, Harel Safra <harel.safra_at_gmail.com> wrote:

> Stalin,
> You haven't specified whether the drives are 15k or 10k RPM, or the size and
> configuration of the SAN cache, so let's assume 15k drives and a write-through
> cache and do some back-of-the-napkin calculations.
> As a rule of thumb, a 15k RPM SAS drive can do about 180 IOPS. Since you
> have 22 drives in your array, the whole array can do 180*22=3960 IOPS; let's
> call that 4000 IOPS.
> Your array is RAID 1+0, so every database write I/O means two write I/Os
> on the drives, and your 1769 writes/s mean ~3500 IOPS to the array. Add the
> ~250 reads/s and you're indeed getting really close to the limit of the array.
> Even if the SAN is writing to cache only, if you're sustaining ~1750 w/s
> the cache quite possibly can't be flushed fast enough.
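
The estimate above can be reproduced in a few lines of Python. This is only a sketch: the function names are made up for illustration, the 180 IOPS per drive figure is the rule of thumb from the message rather than a measured value, and the read and write rates are taken from the iostat line quoted further down.

```python
# Back-of-the-napkin array load estimate using the figures quoted in the thread.
IOPS_PER_DRIVE = 180        # rule of thumb for a 15k RPM SAS drive (assumption)
DRIVES = 22                 # drives in the RAID 1+0 array
RAID10_WRITE_PENALTY = 2    # each host write lands on both halves of a mirror


def array_capacity_iops(drives=DRIVES, per_drive=IOPS_PER_DRIVE):
    """Rough upper bound on back-end IOPS for the whole array."""
    return drives * per_drive


def backend_iops(reads_per_s, writes_per_s, write_penalty=RAID10_WRITE_PENALTY):
    """Host I/O translated into I/O actually issued to the drives."""
    return reads_per_s + writes_per_s * write_penalty


capacity = array_capacity_iops()        # 180 * 22
load = backend_iops(253.7, 1769.0)      # r/s and w/s from iostat
utilisation = load / capacity

print(f"capacity ~{capacity} IOPS, back-end load ~{load:.0f} IOPS, "
      f"utilisation ~{utilisation:.0%}")
```

With these inputs the back-end load comes out at roughly 96% of the estimated capacity, which is the "real close to the limit" conclusion. A write-back cache or a different per-drive figure would change the result, so treat it as an order-of-magnitude check only.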
>
> Grill your storage vendor; they should have the metrics to test whether the
> array is reaching its limits.
>
> Harel Safra
>
>
> On Mon, Jul 25, 2011 at 8:34 PM, Stalin <stalinsk_at_gmail.com> wrote:
>
>> Well, this is a T5220 CoolThreads server, apparently good for OLTP-type
>> applications but not for batch or warehouse-type applications, unless
>> you use parallel query options.
>>
>> I got the iostat numbers during the slow period, and they seem a
>> little puzzling to me.
>>
>>                     extended device statistics
>>    r/s    w/s   kr/s    kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>>  253.7 1769.0 2048.6 15844.8 222.5 253.2  110.0  125.2  94 100 /data
>>
>> With 16 MB/s of writes, we are seeing a service time of 125 ms. Looking
>> at the wait time in the queue as well, it seems we are pushing the array to
>> its limits, which I can't believe. Is this normal for an array with 22 disks
>> in RAID 1+0 (300 GB SAS drives, FC attached, StorageTek 2540 SAN)? We have a
>> ticket open with Sun/Oracle, but no progress has been made thus far.
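
As a sanity check on the iostat line, Little's law (average number in a queue = arrival rate x time spent there) ties the columns together. The sketch below uses only the figures from that line; the function and variable names are illustrative, and it assumes wsvc_t and asvc_t are in milliseconds, as the 125 ms reading above implies.

```python
# Figures copied from the iostat line for /data.
r_s, w_s = 253.7, 1769.0            # reads/s and writes/s
kr_s, kw_s = 2048.6, 15844.8        # KB read/s and KB written/s
wsvc_t_ms, asvc_t_ms = 110.0, 125.2 # avg time waiting in queue / being serviced


def littles_law_queue(throughput_per_s, latency_ms):
    """Average number of requests in a stage = arrival rate x time spent there."""
    return throughput_per_s * latency_ms / 1000.0


iops = r_s + w_s
waiting = littles_law_queue(iops, wsvc_t_ms)  # compare with the 'wait' column
active = littles_law_queue(iops, asvc_t_ms)   # compare with the 'actv' column
avg_write_kb = kw_s / w_s                     # average write size in KB
avg_read_kb = kr_s / r_s                      # average read size in KB

print(f"total {iops:.1f} IOPS, waiting ~{waiting:.1f}, active ~{active:.1f}")
print(f"avg write ~{avg_write_kb:.1f} KB, avg read ~{avg_read_kb:.1f} KB")
```

The computed queue lengths land on the reported wait (222.5) and actv (253.2) values, so the line is internally consistent: about 2,000 small I/Os per second (roughly 8 to 9 KB each) with hundreds of requests outstanding. The 16 MB/s figure looks modest only because the I/Os are small.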
>>
>> We had a bad drive, but the spare kicked in and the drive is scheduled for
>> replacement. No errors are seen in the path to the array. Any clues as to
>> what might be happening?
>>
>> On Thu, Jul 21, 2011 at 8:47 PM, Chitale, Hemant Krishnarao <
>> Hemant.Chitale_at_sc.com> wrote:
>>
>>>
>>> This seems to be similar to this thread :
>>> http://forums.oracle.com/forums/thread.jspa?threadID=2256521&tstart=0
>>>
>>>
>>> 1.4 million commits and 1.4 million 'log file sync' waits of 3 seconds
>>> each?!
>>>
>>>
>>> Given that you have reported (from another email)
>>>
>>> Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
>>> -------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
>>> log file parallel write      38K  72.5  15.4   5.4   2.0    .8    .4   1.3   2.2
>>> log file sync               838K   2.9   1.0    .5   1.7   1.7    .8   7.6  83.8
>>>
>>> I would guess that there are certain very, very large spikes in I/O
>>> response times (or that there's a bug in timed_statistics).
>>>
>>> (A 64 CPU install without the Diagnostic Pack licence ?)
>>>
>>>
>>> Hemant K Chitale
>>>
>>> ________________________________________
>>> From: oracle-l-bounce_at_freelists.org [mailto:
>>> oracle-l-bounce_at_freelists.org] On Behalf Of Stalin
>>> Sent: Thursday, July 21, 2011 6:37 AM
>>> To: oracle-l
>>> Subject: Deadlock ITL Waits
>>>
>>> We have been seeing lots of deadlock errors lately in load testing
>>> environments, and they have all been due to enq: TX - allocate ITL entry. In
>>> reviewing the statspack report for the periods of deadlock, I see that log
>>> file sync is the top consumer, with a terrible wait time. That makes me
>>> think the deadlock is just a symptom of high log file sync wait
>>> times. Below is a snippet from statspack. Looking at these numbers,
>>> especially with the CPU not heavily loaded, I wonder whether this could be
>>> a storage issue. The sysadmins are checking the storage layer, but I thought
>>> I would check here to get any opinions/feedback.
>>>
>>> Top 5 Timed Events                                                    Avg %Total
>>> ~~~~~~~~~~~~~~~~~~                                                   wait   Call
>>> Event                                            Waits    Time (s)   (ms)   Time
>>> ----------------------------------------- ------------ ----------- ------ ------
>>> log file sync                                1,400,773   4,357,902   3111   91.4
>>> db file sequential read                        457,568     334,834    732    7.0
>>> db file parallel write                         565,843      27,573     49     .6
>>> read by other session                           16,168       7,395    457     .2
>>> enq: TX - allocate ITL entry                       575       6,854  11919     .1
>>> -------------------------------------------------------------
>>> Host CPU (CPUs: 64 Cores: 8 Sockets: 1)
>>> ~~~~~~~~     Load Average
>>>              Begin     End    User  System    Idle     WIO     WCPU
>>>            ------- ------- ------- ------- ------- ------- --------
>>>               3.13    7.04    2.26    3.30   94.44    0.00     7.81
>>>
>>> Statistic                                      Total     per Second    per Trans
>>> --------------------------------- ------------------ -------------- ------------
>>> redo synch time                          435,852,302      120,969.3        309.7
>>> redo synch writes                          1,400,807          388.8          1.0
>>> redo wastage                               5,128,804        1,423.5          3.6
>>> redo write time                              357,414           99.2          0.3
>>> redo writes                                    9,935            2.8          0.0
>>> user commits                               1,400,619          388.7          1.0
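
The statspack figures above can be cross-checked against each other with a little arithmetic. The sketch below is illustrative only. It assumes the 'redo synch time' and 'redo write time' statistics are reported in centiseconds, which appears consistent with the Top 5 section since 435,852,302 cs is about 4.36 million seconds. The variable names are made up for this example.

```python
# Values copied from the statspack snippet above.
log_file_sync_waits = 1_400_773
log_file_sync_time_s = 4_357_902
redo_synch_time_cs = 435_852_302   # assumed to be centiseconds
redo_synch_writes = 1_400_807
redo_write_time_cs = 357_414       # assumed to be centiseconds
redo_writes = 9_935
user_commits = 1_400_619

MS_PER_CS = 10  # one centisecond is ten milliseconds

# Average wait per 'log file sync' event, from the Top 5 section.
avg_sync_ms = log_file_sync_time_s / log_file_sync_waits * 1000

# The same quantity derived from the instance statistics.
avg_redo_synch_ms = redo_synch_time_cs / redo_synch_writes * MS_PER_CS

# Average elapsed time of one redo write, and how many commits share it.
avg_redo_write_ms = redo_write_time_cs / redo_writes * MS_PER_CS
commits_per_redo_write = user_commits / redo_writes

print(f"avg log file sync      ~{avg_sync_ms:.0f} ms")
print(f"avg redo synch         ~{avg_redo_synch_ms:.0f} ms")
print(f"avg redo write         ~{avg_redo_write_ms:.0f} ms")
print(f"commits per redo write ~{commits_per_redo_write:.0f}")
```

Under that unit assumption, the two ways of computing the per-commit sync time agree at about 3.1 seconds, each redo write takes roughly 360 ms on average, and about 141 commits are grouped into each redo write.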
>>>
>>>
>>> Environment: 11gR2 EE (11.2.0.1), Solaris 10 SPARC
>>>
>>> Thanks,
>>> Stalin
>>>
>>
>>
>>
>> --
>> Thanks,
>>
>> Stalin
>>
>
>

-- 
Thanks,

Stalin

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jul 25 2011 - 16:10:11 CDT
