Re: Deadlock ITL Waits

From: Harel Safra <harel.safra_at_gmail.com>
Date: Mon, 25 Jul 2011 22:20:18 +0300
Message-ID: <CA+UC=5EdsfbsUc5Xjmz_MaVzRfudN9Dv-qeONs2MggrSC-iyqA_at_mail.gmail.com>



Stalin,
You haven't specified whether the drives are 15k or 10k RPM, or the size and configuration of the SAN cache, so let's assume 15k drives and a write-through cache and do some back-of-the-napkin calculations. As a rule of thumb, a 15k RPM SAS drive can do about 180 IOPS. Since you have 22 drives in your array, the whole array can do 180*22=3960 IOPS; let's call that 4000 IOPS.
Your array is RAID 1+0, so every database write I/O means two write I/Os on the drives, and your 1769 writes/s mean ~3500 IOPS to the array. Add the ~250 reads/s and you're indeed getting really close to the limit of the array. Even if the SAN is writing to cache only, at a sustained ~1750 w/s the cache quite possibly can't be flushed fast enough.
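
To make that arithmetic easy to rerun with different assumptions, here is a minimal Python sketch. The 180 IOPS per drive and the write penalty of 2 are the rule-of-thumb assumptions from above, not measured values, and the function name backend_iops is only there for illustration. The read and write rates are taken from the iostat line quoted below.

```python
# Back-of-the-napkin RAID 1+0 load estimate using the figures from this thread.
# Assumptions (not confirmed by the poster): 15k RPM drives at ~180 IOPS each,
# write-through cache, and a write penalty of 2 for mirroring.

IOPS_PER_DRIVE = 180      # rule-of-thumb figure for a 15k RPM SAS drive
DRIVES = 22
RAID10_WRITE_PENALTY = 2  # each host write lands on both sides of the mirror


def backend_iops(reads_per_s, writes_per_s, write_penalty=RAID10_WRITE_PENALTY):
    """Physical I/Os per second the drives must service for a given host load."""
    return reads_per_s + writes_per_s * write_penalty


array_capacity = IOPS_PER_DRIVE * DRIVES   # 180 * 22 = 3960
load = backend_iops(253.7, 1769.0)         # r/s and w/s from the iostat line
utilization = load / array_capacity

print(f"capacity ~{array_capacity} IOPS, load ~{load:.0f} IOPS, "
      f"utilization ~{utilization:.0%}")
```

If the drives turn out to be 10k RPM, a lower per-drive figure (again only a rule of thumb) would push the same load past what the array can sustain.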

Grill your storage vendor; they should have the metrics to tell whether the array is reaching its limits.

Harel Safra

On Mon, Jul 25, 2011 at 8:34 PM, Stalin <stalinsk_at_gmail.com> wrote:

> Well, this is a T5220 CoolThreads server, apparently good for OLTP-type
> applications but not good for batch or warehouse-type applications, unless
> you use parallel query options.
>
> I got the iostat numbers during the slow period, and they seem a
> little puzzling to me.
>
> extended device statistics
>   r/s    w/s   kr/s    kw/s  wait  actv wsvc_t asvc_t  %w  %b device
> 253.7 1769.0 2048.6 15844.8 222.5 253.2  110.0  125.2  94 100 /data
>
> With 16MB/s of writes, we are seeing a service time of 125ms. Looking also
> at the wait time in the queue, it seems we are pushing the array to its
> limits, which I can't believe. Is this normal for an array with 22 disks in
> RAID 1+0 (300G SAS drives, FC attached, SAN StorageTek 2540)? We have a
> ticket open with Sun/Oracle, but no progress has been made thus far.
>
> We had a bad drive; however, the spare kicked in and the drive is scheduled
> for replacement. No errors are seen in the path to the array. Any clues as
> to what might be happening?
>
> On Thu, Jul 21, 2011 at 8:47 PM, Chitale, Hemant Krishnarao <
> Hemant.Chitale_at_sc.com> wrote:
>
>>
>> This seems to be similar to this thread :
>> http://forums.oracle.com/forums/thread.jspa?threadID=2256521&tstart=0
>>
>>
>> 1.4 million commits and 1.4 million 'log file sync' waits of 3 seconds each ?!!!
>>
>>
>> Given that you have reported (from another email)
>>
>> Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
>> -------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
>> log file parallel write      38K  72.5  15.4   5.4   2.0    .8    .4   1.3   2.2
>> log file sync               838K   2.9   1.0    .5   1.7   1.7    .8   7.6  83.8
>>
>> I would guess that there are certain very, very large spikes in I/O response
>> times (or that there's a bug in the timed_statistics)
>>
>> (A 64 CPU install without the Diagnostic Pack licence ?)
>>
>>
>> Hemant K Chitale
>>
>> ________________________________________
>> From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org]
>> On Behalf Of Stalin
>> Sent: Thursday, July 21, 2011 6:37 AM
>> To: oracle-l
>> Subject: Deadlock ITL Waits
>>
>> We have been seeing lots of deadlock errors lately in load-testing
>> environments, and they have all been due to enq: TX - allocate ITL entry.
>> Reviewing the statspack report for the periods of deadlock, I see that log
>> file sync is the top consumer, with a terrible wait time. That makes me
>> think the deadlock is just a symptom of high log file sync wait times.
>> Below is a snippet from statspack. Looking at these numbers, especially
>> with the CPU not being heavily loaded, I wonder if this could be a storage
>> issue. The sysadmins are checking the storage layer, but I thought I would
>> check here for any opinions/feedback.
>>
>> Top 5 Timed Events                                                    Avg %Total
>> ~~~~~~~~~~~~~~~~~~                                                   wait   Call
>> Event                                            Waits    Time (s)   (ms)   Time
>> ----------------------------------------- ------------ ----------- ------ ------
>> log file sync                                1,400,773   4,357,902   3111   91.4
>> db file sequential read                        457,568     334,834    732    7.0
>> db file parallel write                         565,843      27,573     49     .6
>> read by other session                           16,168       7,395    457     .2
>> enq: TX - allocate ITL entry                       575       6,854  11919     .1
>> -------------------------------------------------------------
>> Host CPU (CPUs: 64 Cores: 8 Sockets: 1)
>> ~~~~~~~~    Load Average
>>             Begin     End    User  System    Idle     WIO     WCPU
>>           ------- ------- ------- ------- ------- ------- --------
>>              3.13    7.04    2.26    3.30   94.44    0.00     7.81
>>
>> Statistic                                      Total     per Second    per Trans
>> --------------------------------- ------------------ -------------- ------------
>> redo synch time                          435,852,302      120,969.3        309.7
>> redo synch writes                          1,400,807          388.8          1.0
>> redo wastage                               5,128,804        1,423.5          3.6
>> redo write time                              357,414           99.2          0.3
>> redo writes                                    9,935            2.8          0.0
>> user commits                               1,400,619          388.7          1.0
>>
>>
>> Environment: 11gR2 EE (11.2.0.1), Solaris 10 SPARC
>>
>> Thanks,
>> Stalin
>>
>
>
>
> --
> Thanks,
>
> Stalin
>

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jul 25 2011 - 14:20:18 CDT
