Re: cursor: mutex S + library cache lock + library cache: mutex X = disaster

From: natalka roshak <natalka.roshak_at_oracle.com>
Date: Mon, 03 Nov 2014 16:30:24 -0500
Message-ID: <5457F3F0.1090804_at_oracle.com>



Hi Nico, did you take a hanganalyze dump? "oradebug hanganalyze 3"

The hanganalyze dump shows wait chains from the hang analysis cache, also accessible via v$wait_chains. I haven't used snapper, but I googled it and it doesn't seem to query v$wait_chains.
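In case it helps next time, here's roughly the sequence I mean. This is a sketch from memory, so double-check it on your version. The -prelim connection skips the normal session allocation, which is what you need when sessions are maxed out (though as you found, even that can fail when the CPUs are saturated):

   sqlplus -prelim "/ as sysdba"
   SQL> oradebug setmypid
   SQL> oradebug hanganalyze 3

And from a normal (non-prelim) session, the same hang-analysis info is exposed through v$wait_chains:

   select chain_id, sid, blocker_sid, wait_event_text
   from v$wait_chains
   order by chain_id, sid;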

If you didn't happen to take a hanganalyze dump: I haven't tried this AT ALL, but I just ran across another script by snapper's author Tanel Poder, which seems to pull wait chains from the active session history: http://blog.tanelpoder.com/files/scripts/ash/ash_wait_chains.sql or http://blog.tanelpoder.com/files/scripts/ash/dash_wait_chains.sql
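From the examples on Tanel's blog (again, untested by me), the arguments appear to be a grouping expression, a filter predicate, and a time range, something like:

   SQL> @ash_wait_chains event2 1=1 sysdate-1/24 sysdate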

-Natalka

--
Natalka Roshak | RAC/Storage BDE
Oracle <http://www.oracle.com/index.html> Global Product Support
45 O'Connor St, Suite 400 | Ottawa, Canada
On 03/11/2014 2:24 PM, nico torres wrote:

> John:
>
> The machine has 16 cores, 2 sockets, it's a virtual machine on a
> server cluster, what should I look into?
>
> Thanks!
>
> Nico
>
> 2014-11-03 15:47 GMT-03:00 John Clarke <jcclarke00_at_gmail.com
> <mailto:jcclarke00_at_gmail.com>>:
>
> Nico,
>
> Sounds like it could be CPU oversubscription - while there are
> bugs related to cursor%mutex% waits, I would check this first.
>
> How many CPU cores are there on your machine?
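>
> (If it's easier to check from inside the database, v$osstat reports
> it, though NUM_CPU_CORES isn't populated on every platform:
>
> select stat_name, value from v$osstat
> where stat_name in ('NUM_CPUS', 'NUM_CPU_CORES', 'NUM_CPU_SOCKETS');
> )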
>
> On Mon, Nov 3, 2014 at 9:58 AM, nico torres <frasebook4_at_gmail.com
> <mailto:frasebook4_at_gmail.com>> wrote:
>
> Hi, list.
> I had this situation last Saturday. I got a call telling
> me that no application could get into the database; I checked the
> alert log: ORA-00018 "maximum number of sessions exceeded",
> everywhere.
> The DB host felt kind of clunky and slow, so I checked htop: 100%
> utilization on all CPUs, around *462 load*, wow.
> I tried to log in with / as sysdba; naturally I couldn't. Tried
> with the -prelim flag, no luck either.
> I finally got into the database and immediately ran
> snapper, and what I saw was all sessions waiting on "*cursor:
> mutex S*", with no running queries and no sql_id available. So I
> came to the conclusion that all sessions were hung on the same
> wait, and in the meantime the applications kept logging in more
> and more sessions, until max sessions was reached and nobody
> else could log in.
> All of those sessions had different blockers, blocking each
> other, and different object#, so I couldn't find a main culprit
> either (roughly the check I ran is sketched below).
> I tried killing those sessions; couldn't. Tried closing the
> instance with shutdown immediate and it hung, so I ended up
> issuing a shutdown abort. The instance closed, then started up
> pretty fast, and things got back to normal.
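>
> (For reference, a simplified version of the blocker check I mean,
> straight from v$session; I'm paraphrasing what snapper and my own
> queries showed, not pasting the exact SQL:
>
> select sid, serial#, event, blocking_session, seconds_in_wait
> from v$session
> where blocking_session is not null
> order by blocking_session;
> )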
> Looking at AWR snaps from that timestamp, I got:
>
>
> Top 5 Timed Foreground Events
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>                                                            Avg
>                                                           wait   % DB
> Event                                Waits     Time(s)    (ms)   time
> ------------------------------ ------------ ----------- ------ ------
> cursor: mutex S                   20,166,853   1,057,135     52   86.5
> library cache lock                    90,243      91,516   1014    7.5
> DB CPU                                            43,678           3.6
> library cache: mutex X                74,002      10,816    146     .9
> db file sequential read              276,847       1,417      5     .1
>
> As expected, lots of "cursor: mutex S", but also library cache
> lock and library cache: mutex X.
>
> Load Profile                Per Second   Per Transaction
> ~~~~~~~~~~~~             --------------- ---------------
>     DB Time(s):                   249.6             1.4
>     DB CPU(s):                      8.9             0.1
>     Redo size:                 77,451.6           438.3
>     Logical reads:             29,037.2           164.3
>     Block changes:                320.9             1.8
>     Physical reads:                70.3             0.4
>     Physical writes:               23.5             0.1
>     User calls:                   755.3             4.3
>     Parses:                       197.7             1.1
>     Hard parses:                    0.2             0.0
>     W/A MB processed:               0.1             0.0
>     Logons:                         1.9             0.0
>     Executes:                     203.8             1.2
>     Rollbacks:                    173.7             1.0
>     Transactions:                 176.7
>
> Hard parses not a concern here.
>
> Statistic Name                                    Time (s)  % of DB Time
> ------------------------------------------ -------------- -------------
> parse time elapsed                             1,217,693.8          99.6
> connection management call elapsed time         868,991.0          71.1
> DB CPU                                           43,677.7           3.6
>
> All sessions clearly hung at parse time.
>
> Foreground Wait Events
>                                                             Avg
>                                       %Time   Total Wait   wait  Waits   % DB
> Event                          Waits  -outs     Time (s)   (ms)   /txn   time
> ------------------------- ---------- ------ ------------ ------ ------ ------
> cursor: mutex S           20,166,853    100    1,057,135     52   23.3   86.5
> library cache lock            90,243      0       91,516   1014    0.1    7.5
> library cache: mutex X        74,002      0       10,816    146    0.1     .9
> db file sequential read      276,847      0        1,417      5    0.3     .1
> cursor: pin S wait on X           94      0          164   1749    0.0     .0
>
> I searched around MOS and several bugs came up, but I couldn't
> find one that matches entirely.
>
> Some information about the database:
> -------------------------
>
> Oracle *11.2.0.1*; it needs some patching urgently, and a bug
> is my main suspect.
>
> It uses ASSM.
>
>                    begin      end
> SGA use (MB):    6,656.0  6,656.0
> PGA use (MB):      561.1  1,213.3
>
> Cache sizes between the snaps:
> Buffer Cache: 3,840M
> Shared Pool Size: 2,496M
>
>
> Parameters:
>
> memory_target           13421772800
> open_cursors            300
> processes               600
> sessions                928
> session_cached_cursors  50
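>
> (In case it's useful, the session/process high-water marks are in
> v$resource_limit; something like:
>
> select resource_name, current_utilization, max_utilization, limit_value
> from v$resource_limit
> where resource_name in ('sessions', 'processes');
> )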
>
> Could anyone please give me some advice on how to further
> investigate this issue? Any other information needed?
>
> Thanks
>
-- http://www.freelists.org/webpage/oracle-l
Received on Mon Nov 03 2014 - 22:30:24 CET
