RE: Locking issue

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Wed, 19 May 2021 17:50:39 -0400
Message-ID: <487301d74cf9$02a626d0$07f27470$_at_rsiz.com>



If dynamic stats do anything at all to the dictionary outside the current session, that would be a performance bug of the first order.  

Even stats gathering shouldn’t pile up on the dictionary until and unless it needs to move something from a pending copy to the “live” stats.  

It is entirely possible that someone misunderstood the shell game up to the point of swapping into live permanent, but I would be surprised.  

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Nenad Noveljic Sent: Wednesday, May 19, 2021 5:16 PM
To: Lok P
Cc: Oracle L
Subject: Re: Locking issue  

That‘s my speculation.  

You can trace the row cache locks for the session to verify.

Best regards,

Nenad

Von meinem iPhone gesendet

Am 19.05.2021 um 23:04 schrieb Lok P <loknath.73_at_gmail.com>:



Thank you.

Do you mean to say even the application query is SELECT query but because it's using dynamic statistics so its updating some internal dictionary cache objects and thus needs a lock. And we know that stats gather does updates data dictionary objects , so eventually they are entering in this conflicting situation.

 And in this case if we would have ensured accurate stats on the table and then it wouldn't have gone for dynamic sampling. Correct me if wrong.  

It's version 19.9.0.0.0 of oracle.  

On Thu, 20 May 2021, 2:23 am Nenad Noveljic, <nenad.noveljic_at_gmail.com> wrote:

There is a buggy (in my opinion) code path in statistics gathering which acquires a row cache lock in the shared mode and then waits on a library cache lock held by another session. The other session holds a library cache lock and wants to acquire the row cache lock being held by the statistics job.  

I reproduced a such case with the debugger: https://nenadnoveljic.com/blog/deadlock-row-cache-lock-library-cache-lock/  

A boundary condition leading to the problem in your case - I’m guessing- might be dynamic sampling, which might require row cache lock to update sampled statistics.  

Best regards,

Nenad

Von meinem iPhone gesendet

Am 19.05.2021 um 22:21 schrieb Lok P <loknath.73_at_gmail.com>:



Also initially these application queries which were experiencing "row cache lock" and were not in parsing stage but later on some other session spawn and were stuck in "cursor: pin S wait on X" and were in parse stage only. So I'm wondering why it's behaving in such a way ?  

Also as the cache id for the blocked session were referring to dc_realtime_tabst and dc_realtime_colst , is it correct to assume that the stats gathering session must be having some kind of read lock on these dictionary cache objects? But ideally it should not block read queries.

Regards

Lok  

On Thu, May 20, 2021 at 1:39 AM Lok P <loknath.73_at_gmail.com> wrote:

We are seeing some strange locking situations because of statistics gathering. While the stats gathering was running on an object and at same time a SELECT on that object getting 'row cache lock', We want to understand why it's happening so? Has it anything to do with the 'dynamic statistics usage' which we are seeing in the note section of the plan?

Below is the occurrence of the issue..

Stats gather started non partition table -TAB1 as part of auto gather job (i.e. gather_database_stats_job_proc) and it started around 10:08PM. And multiple internal sqls were running as part of that stats gather and around 11:06PM we last saw the below statement executed in that session.

select /*+ opt_param('_optimizer_use_auto_indexes' 'on') no_parallel_index(t, "XXX") dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl opt_param('optimizer_inmemory_aware' 'false') no_substrb_pad no_expand index_ffs(t,"XXXX") */ count(*) as nrw,count(distinct sys_op_lbid(228647,'L',t.rowid)) as nlb,count(distinct hextoraw(sys_op_descend("XXX")||sys_op_descend("XXX")||sys_op_descend("XXX"))) as ndk,sys_op_countchg(substrb(t.rowid,1,15),1) as clf from "USER1"."TAB1" sample block ( .0413656293,1) t where "XXX" is not null or "XXX" is not null or "XXXX" is not null

Then at around 11:07:00.687 PM, we have below application query started running from multiple sessions and started experiencing "row cache lock" with P1(cache id=62) as "dc_realtime_colst" , P2(mode) as 3 i.e. 'share mode' and P3(request) as 3 i.e. 'share mode'. Nothing found in the blocking session. Subsequently few other session of same query went into "row cache lock" with P1(cache id=63) as "dc_realtime_tabst" , P2(mode) as 0 i.e. 'null mode - not locked' and P3(request) as 3 i.e. 'share mode'.

 Then we noticed at around 11:07:00.695 PM , the same stats gather session itself started experiencing "row cache lock" with P1(cache id=62) as "dc_realtime_colst" , P2(mode) as 0 i.e. 'null mode - not locked' and P3(request) as 5 i.e. 'exclusive mode'.

SELECT /*+ FULL(P) +*/ * FROM "TAB1" P


| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |


| 0 | SELECT STATEMENT | | | | 3777K(100)| | | 1 | TABLE ACCESS STORAGE FULL FIRST ROWS| TAB1 | 733M| 169G| 3777K (1)| 00:02:28 |


Outline Data


 /*+

   BEGIN_OUTLINE_DATA    IGNORE_OPTIM_EMBEDDED_HINTS    OPTIMIZER_FEATURES_ENABLE('19.1.0')    DB_VERSION('19.1.0')    OPT_PARAM('_fix_control' '27268249:0')

   FIRST_ROWS    OUTLINE_LEAF(_at_"SEL$1")

   FULL(_at_"SEL$1" "P"_at_"SEL$1")

   END_OUTLINE_DATA  */  

Note


  • dynamic statistics used: statistics for conventional DML
--
http://www.freelists.org/webpage/oracle-l
Received on Wed May 19 2021 - 23:50:39 CEST

Original text of this message