Re: cache buffer chains/where in code

From: Tanel Poder <tanel_at_poderc.com>
Date: Fri, 20 Nov 2009 12:52:41 +0800
Message-ID: <4602f23c0911192052m407bee15ybb85b45101c05c82_at_mail.gmail.com>



Fast path gets can happen for any buffer gets, its just a new, simpler way of pinning buffers using compare and swap in 11g (this is controlled by _fastpin_enabled parameter). Cache buffers chains latches are still used though (maybe in 12g they replace everything with KGX mutexes just like they did in library cache in 11gR1).

kcbgtcr is the "consistent read" function kcbgcur is the "current mode get/db block get" kcbrls is the "release/unpin" function

All the above are normal, regular logical IO functions for data access.

So, basically your sessions are doing too many logical IOs (against buffers protected by same latches) for your current hardware capabilities. If this is a new Sun T5xxx server then its also a NUMA box which can further exaggerate latch contention issues especially when the Solaris/Oracle CPU preemption control gets turned off (threads which happen to run on the same NUMA node where the latch memory line physically lives will get the latch more likely and starve others).

So you need to see how to do less logical IOs, especially against the hot blocks. Think hash joins instead of nested loops when joining large datasets etc. Of course if you can avoid running the queries, don't run 'em at all :)

You can use my latchprofX sql to identify exact SQL statements causing the most of contention and also exact data block address, the Statspack/AWR/touch count based approaches are prone to error.

Here comes some really wide output so you may want to paste it into some editor if it wraps on your screen:

SQL> _at_latchprofx sid,name,hmode,func,*sqlid,sqlchild,object* 112

       SID NAME                                HMODE
FUNC                                     SQLID           SQLCHILD
OBJECT       Held       Gets  Held %     Held ms Avg hold ms
---------- ----------------------------------- ------------
---------------------------------------- ------------- ----------
----------------- ---------- ---------- ------- ----------- -----------
       112 cache buffers chains                maybe-shared kcbgtcr: fast
path (cr pin)              *f9k1tuk55yr6r*          0            *41A5B1*
2507       2494    2.51      17.800        .007
       112 cache buffers chains                maybe-shared kcbrls: fast
release                     *f9k1tuk55yr6r*          0
*41A5B1*
1031        980    1.03       7.320        .007

SQL>

SQL>
SQL> _at_dba *41A5B1*

    RFILE# BLOCK#
---------- ----------

         *1 107953*

Press enter to find the segment using V$BH (this may take CPU time), CTRL+C to cancel:

STATE      BLOCK_CLASS         DATA_OBJ# OBJECT_TYPE
object                                          TCH  MODE_HELD D T P S D
FLG_LRUFLG                  DQ
---------- ------------------ ---------- -------------------
---------------------------------------- ---------- ---------- - - - - -
------------------- ----------
xcur       data block              72032 INDEX
SYS.SYS_IOT_TOP_72031                            24          0 N N N N N
0:0                          0


In above case theres only one "hot" block, which belongs to an index. DBA_SEGMENTS shows that this block# is right after the index segment header block, thus its an index root block.

Of course, your latch contention may not be due a single hot block/child latch, but you shouldn't find out from ASH/v$session_wait or sql_trace that which CBC latches are having the most contention.

Note that latch contention can also just be a symptom of CPU starvation, so if you're running at max physical CPU capacity you should fix that first.

Tanel Poder
http://blog.tanelpoder.com

On Fri, Nov 20, 2009 at 6:56 AM, Christo Kutrovsky < kutrovsky.oracle_at_gmail.com> wrote:

> Hello All,
>
> I am troubleshooting cache buffer chain latch issues in 11.1.0.7
> running on SPARC CMT cpu. I have AWR report, and I've determined which
> query is responsible for the bulk of the waits.
>
> For that 10 min AWR period, the bulk of the waits are happening in the
> following cache buffer chain code path:
>
> "kcbgtcr: fast path"
>
> As far as I remember, this code path only happens for indexes? If this
> is correct, then I know what index is causing this in my query. If
> not, I have some latch profiling to do.
>
> If anyone hash a link or a reference for the other code paths for
> cache buffer chain latch that would be some invaluable information to
> have. For example for these:
>
> kcbrls_2
> kcbgtcr_9
> kcbgcur: fast path (shr)
> kcbrls: fast release
> kcbgtcr: fast path (cr pin)
> kcbgcur_2
>
> Thanks in advance.
> --
> Christo Kutrovsky
> Senior Consultant
> Pythian.com
> I blog at http://www.pythian.com/blogs/
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>


--

Tanel Poder
http://blog.tanelpoder.com

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Nov 19 2009 - 22:52:41 CST

Original text of this message