Re: cursor: mutex S + library cache lock + library cache: mutex X = disaster

From: nico torres <frasebook4_at_gmail.com>
Date: Mon, 3 Nov 2014 16:24:31 -0300
Message-ID: <CAJWEwFz6tK_BaJCXYpdvQDG37tLAJ7Rcb+mL7NKTayLYXB+LPw_at_mail.gmail.com>



John:

 The machine has 16 cores, 2 sockets, it's a virtual machine on a server cluster, what should I look into?

Thanks!

Nico

2014-11-03 15:47 GMT-03:00 John Clarke <jcclarke00_at_gmail.com>:

> Nico,
>
> Sounds like it could be CPU oversubscription - while there are bugs
> related to cursor%mutex% waits, I would this first.
>
> How many CPU cores are there on your machine?
>
> On Mon, Nov 3, 2014 at 9:58 AM, nico torres <frasebook4_at_gmail.com> wrote:
>
>> Hi, list.
>> I've had this situation last saturday. I got a call, telling me that no
>> application could get into the database; checked alert log, ORA-00018:
>> maximum number of sessions exceeded everywhere.
>> Db host felt kind of clunky and slow, so I checked htop, 100% utilization
>> on all CPU's, around *462 load*, wow.
>> Tried to log / as sysdba, naturally couldn't, tried with -prelim flag,
>> neither.
>> I finally could get into the database, and immediately run snapper, and
>> what I saw was all sessions waiting for "*cursor: mutex S*", with no
>> running queries, no sql_id available. So I came into the conclussion that
>> all sesions hung with the same wait, and in the same time, applications
>> logged in more and more sessions, so it reached max sessions and nobody
>> else could login.
>> All of those sessions had different blockers, blocking each other, and
>> different object#, so I coulnd't find a main culprit either.
>> Tried killing those sessions, couldn't; Tried closing the instance with
>> a shutdown immediate and it got hung, so I ended up issuing a shutdown
>> abort. Instance closed, and started up pretty fast and things got back to
>> normal.
>> Looking at AWR snaps from that timestamp, I got:
>>
>>
>> Top 5 Timed Foreground Events
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> Avg
>>
>> wait % DB
>> Event Waits Time(s) (ms)
>> time
>> ------------------------------ ------------ ----------- ------ ------
>> cursor: mutex S 20,166,853 1,057,135 52 86.5
>> library cache lock 90,243 91,516 1014 7.5
>> DB CPU 43,678
>> 3.6
>> library cache: mutex X 74,002 10,816 146 .9
>> db file sequential read 276,847 1,417 5
>> .1
>>
>> As expected, lots of "cursor: mutex S", but also Library cache lock and
>> library cache:mutex X.
>>
>> Load Profile Per Second Per Transaction
>> ~~~~~~~~~~~~ --------------- ---------------
>> DB Time(s): 249.6 1.4
>> DB CPU(s): 8.9 0.1
>> Redo size: 77,451.6 438.3
>> Logical reads: 29,037.2 164.3
>> Block changes: 320.9 1.8
>> Physical reads: 70.3 0.4
>> Physical writes: 23.5 0.1
>> User calls: 755.3 4.3
>> Parses: 197.7 1.1
>> Hard parses: 0.2 0.0
>> W/A MB processed: 0.1 0.0
>> Logons: 1.9 0.0
>> Executes: 203.8 1.2
>> Rollbacks: 173.7 1.0
>> Transactions: 176.7
>>
>> Hard parses not a concern here.
>>
>> Statistic Name Time
>> (s) % of DB Time
>> ------------------------------------------ ------------------ ------------
>> parse time elapsed
>> 1,217,693.8 99.6
>> connection management call elapsed time 868,991.0
>> 71.1
>> DB CPU
>> 43,677.7 3.6
>>
>> All sessions clearly hung at parse time.
>>
>> Foreground Wait Events
>> cursor: mutex S 20,166,853 100 1,057,135 52 23.3
>> 86.5
>> library cache lock 90,243 0 91,516 1014
>> 0.1 7.5
>> library cache: mutex X 74,002 0 10,816 146
>> 0.1 .9
>> db file sequential read 276,847 0 1,417 5
>> 0.3 .1
>> cursor: pin S wait on X 94 0 164 1749
>> 0.0 .0
>>
>> I searched around MOS and some bugs appeared, but couldn't find one that
>> matches entirely.
>>
>> Some information about the database:
>> -------------------------
>>
>> Oracle *11.2.0.1**, *it needs some patching urgently and a bug is my
>> main suspect.
>>
>> It uses ASSM
>> begin end
>> SGA use (MB): 6,656.0 6,656.0
>> PGA use (MB): 561.1 1,213.3
>>
>> Cache sizes between the snaps:
>> Buffer Cache: 3,840M
>> Shared Pool Size: 2,496M
>>
>>
>> Parameters:
>>
>> memory_target: 13421772800
>> open_cursors 300
>> processes 600
>> sessions 928
>> session_cached_cursors 50
>>
>> Could anyone please give me some advice on how to further investigate
>> this issue? Any other information needed?
>>
>> Thanks
>>
>>
>>
>>
>>
>>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Nov 03 2014 - 20:24:31 CET

Original text of this message