RE: kkslce [KKSCHLPIN2] mutex contention on hard parse of PQ with binds

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Tue, 15 Sep 2015 10:25:28 +0000
Message-ID: <CE70217733273F49A8A162EE074F64D9282A3EA0_at_EXMBX01.thus.corp>


What's the environment from which you are calling the SQL ? Is it a basic SQL*Plus session, or some type of front-end that might be causing the double parse with binds anyway ? I forget which mid-tier, or driver it was but there used to be common questions about seeing statement with bind variables being parsed twice, once with no values for the bind variable, then a second time because the values were only sent on a second call.

Can you find any consistency in the queries which show this behaviour (apart from the partitioning) ? If you check the v$pq_tqstats view after running the queries do you see a "ranger" operation as the first operation ? (This might be visible as a S->P distribution into TQ10000 if all you can get at is execution plans).

Have you tried adding hints to any of the queries that persistently show this problem simply to check if the restriction on the optimization phase bypasses the problem ?

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
_at_jloracle



From: Stefan Knecht [knecht.stefan_at_gmail.com] Sent: 15 September 2015 10:50
To: mohamed.houri_at_gmail.com
Cc: Jonathan Lewis; oracle-l-freelists
Subject: Re: kkslce [KKSCHLPIN2] mutex contention on hard parse of PQ with binds

So the key difference is that it's creating one more child, if we're using bind variables. Whereas the query with literals only has child 0.

And that induces an overhead of anywhere between 1 and 3 seconds. Wow.

On Tue, Sep 15, 2015 at 4:48 PM, Stefan Knecht <knecht.stefan_at_gmail.com<mailto:knecht.stefan_at_gmail.com>> wrote: Here's another test, without binds:

SQL> select sql_id, child_number, executions, invalidations, parse_calls from v$sql where sql_id='avgm0shx7hnmv';

SQL_ID CHILD_NUMBER EXECUTIONS INVALIDATIONS PARSE_CALLS ------------- ------------ ---------- ------------- ----------- avgm0shx7hnmv 0 1 0 9

We still see several parse calls. But zero "cursor: pin S wait on X" waits in any of the tracefiles.

On Tue, Sep 15, 2015 at 4:38 PM, Stefan Knecht <knecht.stefan_at_gmail.com<mailto:knecht.stefan_at_gmail.com>> wrote: Mohamed

This is precisely happening during a hard parse only. There's nothing to share - except between the PX slaves. The query is deliberately being hard parsed, and a new cursor is created hence v$sql_shared_cursor, etc won't be helpful in this case.

Stefan

On Tue, Sep 15, 2015 at 4:33 PM, Mohamed Houri <mohamed.houri_at_gmail.com<mailto:mohamed.houri_at_gmail.com>> wrote: "cursor: pin S wait on X" is a symptom of a session wanting a cursor on Shared mode but another session is holding the same cursor in eXlusive mode.

Typically the cursor holding the eXclusive lock is hard parsing. It might also be due to Extended Cursor Sharing kicking in and optimizing an execution plan for each execution.This is why, as far as you are using bind variables, I would have started by checking the reason for which your cursor is not shared (using nonshared.sql script of Tanel Poder for example)

Best regards
Mohamed

2015-09-15 10:37 GMT+02:00 Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk<mailto:jonathan_at_jlcomp.demon.co.uk>>:

Are the 4 out of 8 isolated to one of the two slave sets, or are they scattered randomly across the two slave sets. (I would suspect the former).

It's probably expected behaviour (at least in some circumstances) but an extreme example of slow parse times. I have seen a couple of cases in 11g where PX slaves actually managed to get different execution plans for the same query at the same time because the slaves re-optimized the query - if this is possible, then the slaves could be in a position where they are attempting to re-optimize the query, but are still waiting for the QC to finish its parse call.

Gathering related information:

From v$sql - with one call to execute a new version of the statement, how many parse_calls does the parallel statement report. If you enable 10053 for a serial and a parallel version, does the parallel trace show much more work than the serial ? Does the 12c serial show much more work than the 11g serial? If you enable 10046 trace can you see the 4 PX slaves doing work that looks like re-optimization after they get the child mutex

Speculative question
Do the queries involve partitioned tables ?

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
_at_jloracle



From: oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org> [oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org>] on behalf of Stefan Knecht [knecht.stefan_at_gmail.com<mailto:knecht.stefan_at_gmail.com>] Sent: 15 September 2015 09:12
To: oracle-l-freelists
Subject: kkslce [KKSCHLPIN2] mutex contention on hard parse of PQ with binds

Hi all

Fairly odd event. After an upgrade to 12.1.0.2 we've been seeing very high "cursor: pin S wait on X" waits. But here's the catch - they only happen under very special (and seemingly odd) circumstances:

  • Query runs in parallel
  • Query uses bind variables
  • Query is hard parsed

If we run the same query in serial, or with literal values, or we re-use a cursor, no waits are seen.

The waits are huge (up to 0.3 seconds) and are seen on the PX slaves only. Say we're using 8 slaves, 4 of them will show these waits. 4 won't. And only if we're using binds. No binds, no waits.

Does anyone have an explanation for what could be happening?

I've already tried disabling some of the new fancy adaptive features, and most bind related features (peeking, px bind peeking, bind value captures, etc). Heck, it even happens with opt features set to 11.2.

On a very similar database, that's still on 11.2.0.3, we're also seeing the same behavior. The key difference being that the waits are very short - less than 0.01 seconds. But they're still there, and I don't see why they would be in the first place. What causes Oracle to pin that mutex, if a bind variable is used in the query?

A bug has been filed. So we'll see what comes out of that. It would seem odd that we'd be the first ones to hit this, it's not exactly exotic to run a parallel query with binds.

Stefan

--

Houri Mohamed

Oracle DBA-Developer-Performance & Tuning

Member of Oraworld-team<http://www.oraworld-team.com/>

Visit My         - Blog<http://www.hourim.wordpress.com/>

Let's Connect - <http://fr.linkedin.com/pub/mohamed-houri/11/329/857/> Linkedin Profile<http://fr.linkedin.com/pub/mohamed-houri/11/329/857/>

My Twitter<https://twitter.com/MohamedHouri>      - MohamedHouri<https://twitter.com/MohamedHouri>




--
http://www.freelists.org/webpage/oracle-l
Received on Tue Sep 15 2015 - 12:25:28 CEST

Original text of this message