Re: kkslce [KKSCHLPIN2] mutex contention on hard parse of PQ with binds

From: Stefan Knecht <knecht.stefan_at_gmail.com>
Date: Tue, 15 Sep 2015 16:37:05 +0700
Message-ID: <CAP50yQ-+MtNiR3f=1EuvCF19EDOjXLe4WS02kYqau0LbFEhEnQ_at_mail.gmail.com>



Hey Jonathan

On Tue, Sep 15, 2015 at 3:37 PM, Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk
> wrote:

>
>
> Are the 4 out of 8 isolated to one of the two slave sets, or are they
> scattered randomly across the two slave sets. (I would suspect the former).
>
>

It varies, a lot. Sometimes it's all of the slaves as well. Like right now I couldn't reproduce it and see only a subset of the slaves in the waits (based on 10046).

> It's probably expected behaviour (at least in some circumstances) but an
> extreme example of slow parse times. I have seen a couple of cases in 11g
> where PX slaves actually managed to get different execution plans for the
> same query at the same time because the slaves re-optimized the query - if
> this is possible, then the slaves could be in a position where they are
> attempting to re-optimize the query, but are still waiting for the QC to
> finish its parse call.
>
> Gathering related information:
>
> From v$sql - with one call to execute a new version of the statement, how
> many parse_calls does the parallel statement report.
>

SQL> select sql_id, child_number, executions, invalidations, parse_calls from v$sql where sql_id in ('71kptzs95ynkz','8qg4p92muqjq2');

SQL_ID CHILD_NUMBER EXECUTIONS INVALIDATIONS PARSE_CALLS ------------- ------------ ---------- ------------- -----------

8qg4p92muqjq2 0    1  0      9
8qg4p92muqjq2 1    0  0      0
71kptzs95ynkz 0    1  0      5
71kptzs95ynkz 1    0  0      0

That's from the last two executions I just did.

> If you enable 10053 for a serial and a parallel version, does the parallel
> trace show much more work than the serial ? Does the 12c serial show much
> more work than the 11g serial?
>

I've looked at many 10053 traces and I didn't spot anything vastly different. Also, plans are identical for both the execution with binds and without binds. I've not tested much without parallelism because it's not really an option in this environment not to use it. I can run some tests, though.

> If you enable 10046 trace can you see the 4 PX slaves doing work that
> looks like re-optimization after they get the child mutex
>
>

This is a bit tricky to answer. We sometimes see adaptive statistics kick in. But I've not seen a full re-optimization. We see the wait just immediately after the parse. It's not attributed to the parse call, like this:

PARSE
#47025966097608:c=1000,e=121,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3532680364,tim=1441216853534130 WAIT #47025966097608: nam='cursor: pin S wait on X' ela= 202239 idn=3122084395 value=3697966841856 where=21474836480 obj#=-1 tim=1441216853637090

But anything related to the adaptive optimizer in 12.1 can be disabled, and we're still seeing the waits.

> Speculative question
> Do the queries involve partitioned tables ?
>
>

Yes.

>
>
>
>
> Regards
> Jonathan Lewis
> http://jonathanlewis.wordpress.com
> _at_jloracle
> ------------------------------
> *From:* oracle-l-bounce_at_freelists.org [oracle-l-bounce_at_freelists.org] on
> behalf of Stefan Knecht [knecht.stefan_at_gmail.com]
> *Sent:* 15 September 2015 09:12
> *To:* oracle-l-freelists
> *Subject:* kkslce [KKSCHLPIN2] mutex contention on hard parse of PQ with
> binds
>
> Hi all
>
> Fairly odd event. After an upgrade to 12.1.0.2 we've been seeing very high
> "cursor: pin S wait on X" waits. But here's the catch - they only happen
> under very special (and seemingly odd) circumstances:
>
> - Query runs in parallel
> - Query uses bind variables
> - Query is hard parsed
>
> If we run the same query in serial, or with literal values, or we re-use a
> cursor, no waits are seen.
>
> The waits are huge (up to 0.3 seconds) and are seen on the PX slaves only.
> Say we're using 8 slaves, 4 of them will show these waits. 4 won't. And
> only if we're using binds. No binds, no waits.
>
> Does anyone have an explanation for what could be happening?
>
> I've already tried disabling some of the new fancy adaptive features, and
> most bind related features (peeking, px bind peeking, bind value captures,
> etc). Heck, it even happens with opt features set to 11.2.
>
> On a very similar database, that's still on 11.2.0.3, we're also seeing
> the same behavior. The key difference being that the waits are very short -
> less than 0.01 seconds. But they're still there, and I don't see why they
> would be in the first place. What causes Oracle to pin that mutex, if a
> bind variable is used in the query?
>
> A bug has been filed. So we'll see what comes out of that. It would seem
> odd that we'd be the first ones to hit this, it's not exactly exotic to run
> a parallel query with binds.
>
> Stefan
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Sep 15 2015 - 11:37:05 CEST

Original text of this message