Re: Session With oper EXCL is also waiting - where to now? (systemstate dump)

From: Tanel Poder <tanel_at_tanelpoder.com>
Date: Thu, 20 Sep 2012 00:02:16 +0300
Message-ID: <CAMHX9JJ0Vf9WNRPZ2swkPDVkf7ouveaLS6YhNyDbNw8o8WCQpA_at_mail.gmail.com>



This process was holding 3 different latches at the time of dumping (shared pool simulator, shared pool, row cache objects):

      holding (efd=11) a5b747220 Child *shared pool simulator* level=8 child#=15 <---****

        Location from where latch is held: kglsim_scan_lru: scan:****

        Context saved from call: 0****

        state=free, wlstate=free****

      holding (efd=11) 600fab30 Child *shared pool* level=7 child#=1****

        Location from where latch is held: *kghfrunp: alloc: wait:*****

        Context saved from call: 0****

        state=busy, wlstate=free****

      holding (efd=11) *a5b6bd2a0* Child *row cache objects* level=4 child#=16****

        Location from where latch is held: *kghfrunp: clatch: wait:*****

And the reason why this process was holding the latch is "kghfrunp" function - Kernel Generic Heap-management FRee UNPinned chunks. It shows that the process was flushing out stuff from shared pool. This can happen for a few reasons - the normal reason is when just allocating space in shared pool and the suitable freelists are empty. But in this case it may be just MMAN (still) shrinking shared pool. Check the session state object just under that process (search for "(session)") and see which session/user it is.

Note that when searching for "kghfrunp row cache objects" in MOS, this came up:

*LCK temporarily stuck waiting for latch 'Child row cache objects' [ID 843638.1]*

This may be somewhat related too, worth checking out:

*Bug 6368621 I/O slave creation time-outs, under extreme memory pressure*

Tanel.

On Wed, Sep 19, 2012 at 11:34 PM, <Christopher.Taylor2_at_parallon.net> wrote:

> Ah hah (see what it's waiting on...also any additional guidance from here?)
> ****
>
> ** **
>
> Here's the HOLDER****
>
> ** **
>
> PROCESS 152:****
>
> ----------------------------------------****
>
> SO: 0xa7aba8578, type: 2, owner: (nil), flag: INIT/-/-/0x00****
>
> (process) Oracle pid=152, calls cur/top: 0x6f9156810/0x6f8710648, flag:
> (0) -****
>
> int error: 0, call error: 0, sess error: 0, txn error 0****
>
> (post info) last post received: 0 0 163****
>
> last post received-location: kqrbtm****
>
> last process to post me: a7ab873c0 1 6****
>
> last post sent: 0 0 24****
>
> last post sent-location: ksasnd****
>
> last process posted by me: a7ab873c0 1 6****
>
> (latch info) wait_event=0 bits=90****
>
> --> holding (efd=11) a5b747220 Child shared pool simulator level=8
> child#=15 <---****
>
> Location from where latch is held: kglsim_scan_lru: scan: ****
>
> Context saved from call: 0****
>
> state=free, wlstate=free****
>
> holding (efd=11) 600fab30 Child shared pool level=7 child#=1 ****
>
> Location from where latch is held: kghfrunp: alloc: wait: ****
>
> Context saved from call: 0****
>
> state=busy, wlstate=free****
>
> holding (efd=11) a5b6bd2a0 Child row cache objects level=4
> child#=16 ****
>
> Location from where latch is held: kghfrunp: clatch: wait: ****
>
> ** **
>
> ** **
>
> Here's the WAITS for the HOLDER****
>
> Dumping Session Wait History****
>
> for 'latch: shared pool' count=1 wait_time=57****
>
> address=600fab30, number=d6, tries=0****
>
> for 'latch: shared pool' count=1 wait_time=26****
>
> address=600fab30, number=d6, tries=0****
>
> for 'latch: row cache objects' count=1 wait_time=197****
>
> address=a5b6bd2a0, number=c8, tries=1****
>
> for 'latch: row cache objects' count=1 wait_time=595844****
>
> address=a5b6bd2a0, number=c8, tries=0****
>
> for 'latch: shared pool' count=1 wait_time=86****
>
> address=600fab30, number=d6, tries=0****
>
> for 'latch: shared pool' count=1 wait_time=65****
>
> address=600fab30, number=d6, tries=0****
>
> for 'latch: row cache objects' count=1 wait_time=316****
>
> address=a5b6bd2a0, number=c8, tries=1****
>
> for 'latch: row cache objects' count=1 wait_time=457111****
>
> address=a5b6bd2a0, number=c8, tries=0****
>
> for 'latch: row cache objects' count=1 wait_time=71****
>
> address=a5b6bd2a0, number=c8, tries=1****
>
> for 'latch: row cache objects' count=1 wait_time=527243****
>
> address=a5b6bd2a0, number=c8, tries=0****
>
> temporary object counter: 0****
>
> ** **
>
> *From:* tanel_at_poderc.com [mailto:tanel_at_poderc.com] *On Behalf Of *Tanel
> Poder
> *Sent:* Wednesday, September 19, 2012 3:26 PM
> *To:* Taylor Christopher - Nashville
> *Cc:* oracle-l_at_freelists.org
> *Subject:* Re: Session With oper EXCL is also waiting - where to now?
> (systemstate dump)****
>
> ** **
>
> Search for the address a5b6bd2a0 (case insensitive) in the systemstate
> dump file. If some process still happened to be holding that latch while
> the dump read its state object, it should dump out which one it was (and
> which session state object was under it).
>
>
> But allowing shared pool to automatically shrink is a bad idea in
> super-critical latency-sensitive (OLTP) systems ... better "waste" some
> extra memory for shared pool instead of having to live with CPU usage and
> share pool latch/library cache mutex contention spikes... ****
>
> ** **
>
> Of course in your case the symptom was even more extreme (a hang), which
> may be due to a bug ... ****
>
> -- ****
>
> Tanel Poder****
>
> Blog - http://blog.tanelpoder.com****
>
> App - http://voic.ee****
>
> ** **
>
> ** **
>
> On Wed, Sep 19, 2012 at 11:01 PM, <Christopher.Taylor2_at_parallon.net>
> wrote:****
>
> (apologies for bad formatting - I'm trying to keep it simple tho)
> (pastebin is blocked, but used gist https://gist.github.com/3751878)
> 3-node RAC 10.2.0.4
>
> We had a "hang" last night from 2:50 to ~3:48 and Oracle was kind enough
> to provide me with an automated system state dump.
> (the bottom of this email is where I'm confused - where do I go from here?)
>
> Looking at the ASH and AWR reports for the 2 hour window:
>
> I see:
> 1.) An SGA Resize occurred:
> Buffer Cache 30,608M increases to 30,656M
> Shared Pool Size 5,136M *decreases* to 5,088M
>
> ****
>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Sep 19 2012 - 16:02:16 CDT

Original text of this message