Re: 12c Parallelism in windows potentially behind library cache lock/ PX Deq: Parse Reply/ PX Deq: Execution Msg

From: k.hadd <kouss.hd_at_gmail.com>
Date: Tue, 11 Jun 2019 15:04:26 -0400
Message-ID: <CAF+xsmR2VngqUNJc3g0oUAZY8fX3+5FTFdEb13Q47L2=M9CKVg_at_mail.gmail.com>



 Thanks Mark for the feedback,
I wasn't lucky finding the doc you were talking about, and the upgrade to 12.2 is a long stretch considering the application is barely certified for 12.1.0.1.

I did read a few posts here on Oracle-L (Jonathan Lewis' comments) about PX slaves ending up with different execution plans for the same query at the same time because the slaves re-optimized the query, but nothing that really speaks to my current situation.

Here are some actions that have been taken in the meantime.

  1. Deactivated parallelism: alter system set parallel_max_servers=1; ---- new execution plans tend to be serialized
  2. Disabled the optimizer adaptive features: alter system set optimizer_adaptive_features=false; --- no effect
  3. Downgraded the optimizer features version to 11.2.0.1: alter system set optimizer_features_enable='11.2.0.1'; --- might go lower
  4. The client DB has been moved to a new VM (part of planned maintenance: the existing OS had a few minor issues)
    • And as if that wasn't enough: a BI report is now hanging for 45 minutes before crashing (although a few occurrences still complete in 13 seconds to 9 minutes, roughly the same performance as before parallelism was deactivated)

I first thought the degradation was due to the fact that every row source operation (index/full scan, hash joins, etc.) was now executed serially for all queries in the database.

But to my big surprise, while live-monitoring its session before the crash, I discovered that this very report was running half of its operations in parallel.

I mean, what the heck! parallel_max_servers was explicitly set to 1. Is this optimizer really that nasty?

Has anybody seen such a phenomenon on 12.1?
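For reference, here is a minimal sanity-check sketch (standard v$ views only, nothing site-specific assumed) for confirming which parameter values are actually in effect and for spotting sessions that still have PX slaves attached. Note that parallel_max_servers only caps the number of slave processes, not the plan choice, so a statement can still get a PX plan and then be downgraded at run time.

    -- parallel/optimizer parameters currently in effect
    SELECT name, value, isdefault
      FROM v$parameter
     WHERE name IN ('parallel_max_servers',
                    'parallel_min_servers',
                    'parallel_degree_policy',
                    'optimizer_adaptive_features',
                    'optimizer_features_enable');

    -- sessions that currently have PX slaves attached (QC plus slaves)
    SELECT s.sid, s.username, s.sql_id,
           px.qcsid, px.server_set, px.req_degree, px.degree
      FROM v$px_session px
      JOIN v$session   s ON s.sid = px.sid AND s.serial# = px.serial#
     ORDER BY px.qcsid, px.server_set, s.sid;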

Thanks for the help.

On Fri, Jun 7, 2019 at 9:41 PM Mark W. Farnham <mwf_at_rsiz.com> wrote:

> In 12.1 there is (was) a bug, with a patch available, where child parses can
> get a different plan (multiple times), it doesn't match, the parse repeats, and
> hilarity ensues.
>
> Sorry I don’t have any physical notes on that due to site security AND I
> just don’t remember long strings like bug numbers and MOS notes any more.
>
>
>
> I think that matches your symptoms, so worth a look. If memory serves the
> bug look-up has Concurrency cursor: pin S wait on X in it.
>
>
>
> Good luck!
>
>
>
> mwf
>
>
>
> PS: 12.2 seems a lot more reliable if you can get there (at least from my
> experience, but I don’t have a large sample size of different sites.)
>
>
>
> *From:* oracle-l-bounce_at_freelists.org [mailto:
> oracle-l-bounce_at_freelists.org] *On Behalf Of *k.hadd
> *Sent:* Friday, June 07, 2019 7:39 PM
> *To:* Oracle-L_at_freelists.org
> *Subject:* Fwd: 12c Parallelism in windows potentially behind library
> cache lock/ PX Deq: Parse Reply/ PX Deq: Execution Msg
>
>
>
>
>
> Hi Oracle mates,
>
>
>
> I have been tracing sessions this week where a query with multiple
> unions caused user sessions to freeze one after another. There seemed to be
> contention in the library cache (locks) but, oddly, when I checked the
> trace file report using SQLT, a huge amount of time for these queries was
> unaccounted for or idle time (*PX Deq: Parse Reply*).
>
>
>
> Environment: OS: (VM) Windows Server 2012, 16 vCPU [CPU and RAM aren't reserved]
>
> DB: 12c Enterprise Edition Release 12.1.0.1.0 - 64bit
>
>
>
> Also worth mentioning is that sometimes the optimizer has a knack for
> wrongly choosing parallel over serial execution when the latter is 50%
> faster.
>
>
>
> Any clue or insight on best practices for tackling parallelism issues on a VM
> server (other than completely disabling parallelism)?
>
>
>
> Note: setting OPTIMIZER_ADAPTIVE_FEATURES to FALSE didn't help, but disabling
> parallelism kept the execution serial. I don't know why the optimizer
> keeps choosing a PX plan.
>
> Back when the application ran on 10g we used to see PX Deq Credit wait
> events a lot, but it has been other parallel events since the upgrade to 12c.
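> One thing still worth ruling out (a minimal sketch below; APP1 is just a
> placeholder for the application schema) is a dictionary DEGREE left on the
> tables or indexes, which would keep inviting parallel plans regardless of the
> adaptive features, plus a session-level switch for comparing serial response times:
>
>     -- objects whose dictionary DEGREE invites parallel plans
>     SELECT owner, table_name, degree
>       FROM dba_tables
>      WHERE owner = 'APP1'
>        AND TRIM(degree) NOT IN ('0', '1');
>
>     SELECT owner, index_name, degree
>       FROM dba_indexes
>      WHERE owner = 'APP1'
>        AND TRIM(degree) NOT IN ('0', '1');
>
>     -- force one test session serial to compare response times
>     ALTER SESSION DISABLE PARALLEL QUERY;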
>
>
>
>
>
> Really appreciate any input .
>
> Thank you
>
>
>
>
>
> • Context and findings:
>
> Below is the information collected so far from the ORACLE S-ASH views
> and the SQLT (TRCANLZR) report on the trace file (30 minutes).
>
>
>
> 1- S-ASH top events report for the related period (times are in secs)
>
> BEGIN_TIME            END_TIME              WAIT_CLASS   EVENT                    SQL_ID         COUNT   TIME_WAITED  AVG_TIME_WAITED
> --------------------  --------------------  -----------  -----------------------  -------------  ------  -----------  ---------------
> 31-may-2019 10:00:00  31-may-2019 10:14:59  Concurrency  library cache lock       11wythn1bf8c5  190968  556700       2,91
> 31-may-2019 10:00:03  31-may-2019 10:14:29  Concurrency  library cache: mutex X   11wythn1bf8c5  1690    15           0
> 31-may-2019 10:01:04  31-may-2019 10:01:07  Concurrency  cursor: pin S wait on X  11wythn1bf8c5  4       10           2,5
>
>
>
> 2- The trace file for one of the users
> Relevant executions
>
> Performance stats for one of the executions. Both idle time (PX Deq: Parse
> Reply, which isn't really idle time) and unaccounted-for time make up almost
> 100% of the response time.
>
> I suspect the freeze comes from a parallel CPU enqueue during library
> cache access, but I don't know how I could avoid this.
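> A minimal sketch (ASH view, Diagnostics Pack licensing permitting; the time
> window matches the S-ASH report above) for tracing the library cache waits
> back to their blocking sessions:
>
>     -- who was blocked on library cache waits, and by whom, in the window
>     SELECT sample_time, session_id, sql_id, event,
>            blocking_session, blocking_session_serial#
>       FROM v$active_session_history
>      WHERE event IN ('library cache lock',
>                      'cursor: pin S wait on X',
>                      'library cache: mutex X')
>        AND sample_time BETWEEN TIMESTAMP '2019-05-31 10:00:00'
>                            AND TIMESTAMP '2019-05-31 10:15:00'
>      ORDER BY sample_time;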
>
> Rank 1:
>   Trace ET Pct           : 79.1%
>   Self Elapsed Time      : 841.060
>   CPU Time               : 22.094
>   Non-Idle Wait Time     : 0.814
>   Recursive Elapsed Time : 0.850
>   Exec Count             : 3
>   User                   : 115
>   Depth                  : 0
>   SQL Text               : SELECT 1*1 TypeEnregistrement ,1 EMPLH.ID_R ID_
>   Hash Value             : 45556101
>   SQL ID                 : 11wythn1bf8c5
>   Plan Hash Value        : 3867173512
>
>
>
> Call      Response Time  Elapsed  CPU    Non-Idle   Elapsed Time     Idle
>           Accounted-for  Time     Time   Wait Time  Unaccounted-for  Wait Time
> --------  -------------  -------  -----  ---------  ---------------  ---------
> Parse:    0.042          0.032    0.016  0.001      0.016            0.010
> Execute:  7.153          7.153    6.906  0.123      0.124            0.000
> Fetch:    826.337        413.562  0.703  0.308      412.551          412.775
> Total:    833.532        420.747  7.625  0.432      412.690          412.785
>
>
>
>
> Fetch   Response Time  Elapsed  CPU    Non-Idle   Elapsed Time     Idle       Call
> Call    Accounted-for  Time     Time   Wait Time  Unaccounted-for  Wait Time  Timestamp
> ------  -------------  -------  -----  ---------  ---------------  ---------  -------------------
> First:  826.080        413.810  0.984  0.306      412.529          412.541    MAY-31 10:44:33.191
>
> Wait events for Rank 1: 11wythn1bf8c5 (plan hash value 3867173512)
>
> Event Name                     Wait Class   Non-Idle   Times Waited  Idle       Times Waited  Average    Max        Blocks  Average
>                                             Wait Time  Non-Idle      Wait Time  Idle          Wait Time  Wait Time          Blocks
> -----------------------------  -----------  ---------  ------------  ---------  ------------  ---------  ---------  ------  -------
> PX Deq: Parse Reply            Idle                                  412.533    64            6.445834   85.928264
> enq: PS - contention           Other        0.301      64                                     0.004699   0.008888
> SQL*Net message from client    Idle                                  0.246      56            0.004394   0.011639
> db file sequential read        User I/O     0.128      127                                    0.001011   0.014514   127     1
> PX Deq: Join ACK               Idle                                  0.004      127           0.000029   0.000645
> PX Deq Credit: send blkd       Idle                                  0.002      6             0.000392   0.000937
> SQL*Net more data to client    Network      0.002      65                                     0.000027   0.000272
> log file sync                  Commit       0.001      1                                      0.000595   0.000595
> SQL*Net message to client      Network      0.000      57                                     0.000001   0.000003
> latch: row cache objects       Concurrency  0.000      4                                      0.000010   0.000020
> SQL*Net more data from client  Network      0.000      1                                      0.000013   0.000013
> latch: shared pool             Concurrency  0.000      1                                      0.000005   0.000005
> Total:                                      0.432      320           412.785    253
>
>
>
>
> Fetch    Recursive   OS Buffer    BG Consistent  BG Current   Rows Processed  Library Cache  Times Waited  Times Waited  Call
> Call     Call Count  Gets (disk)  Read (query)   Mode (curr)  or Returned     Misses         Non-Idle      Idle          Timestamp
> -------  ----------  -----------  -------------  -----------  --------------  -------------  ------------  ------------  -------------------
> First:   3000        10           198855         0            100             0              78            198           MAY-31 10:44:33.191
> Second:  0           0            0              0            100             0              2             1             MAY-31 10:44:33.197
> Third:   0           0            0              0            100             0              2             1             MAY-31 10:44:33.201
> Last:    0           0            0              0            66              0              2             1             MAY-31 10:44:33.457
>
> Overview of the data exchange between the producers/consumers during the
> execution:
>
> Username  QC/Slave      SlaveSet  SID  Slave INS  STATE  WAIT_EVENT               QC SID  QC INS  Req. DOP  Actual DOP
> --------  ------------  --------  ---  ---------  -----  -----------------------  ------  ------  --------  ----------
> APP1      QC                      596  1          WAIT   PX Deq: Parse Reply      596
>  -        p00k (Slave)  1         16   1          WAIT   PX Deq: Execution Msg    596     1       32        32
>  -        p006 (Slave)  1         62   1          WAIT   PX Deq: Execution Msg    596     1       32        32
>  -        p00l (Slave)  1         211  1          WAIT   PX Deq: Execution Msg    596     1       32        32
>  -        p00a (Slave)  1         224  1          WAIT   PX Deq: Execution Msg    596     1       32        32
>  -        p00m (Slave)  1         384  1          WAIT   PX Deq: Execution Msg    596     1       32        32
>  -        p00p (Slave)  1         405  1          WAIT   library cache lock       596     1       32        32
>  -        p008 (Slave)  1         421  1          WAIT   cursor: pin S wait on X  596     1       32        32
> Explain Plan: it did not match the plan used for execution, which
> had parallel processes (adaptive plan). Again, I always wanted to ask
> the client to reserve ESX CPU and RAM for the VM but lacked concrete
> evidence of its impact on Oracle parallelism to back it up.
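> On that last point, a minimal sketch (assuming the 12.1 column
> V$SQL.IS_RESOLVED_ADAPTIVE_PLAN and the +ADAPTIVE DBMS_XPLAN format are
> available) for comparing what actually ran against what EXPLAIN PLAN showed:
>
>     -- plan hash values actually used by the cursor
>     SELECT sql_id, child_number, plan_hash_value, is_resolved_adaptive_plan
>       FROM v$sql
>      WHERE sql_id = '11wythn1bf8c5';
>
>     -- runtime plan for a given child, including the adaptive part
>     SELECT *
>       FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('11wythn1bf8c5', 0, 'TYPICAL +ADAPTIVE'));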
>
>
>
> Thank you very much for reading
>
> Kouss
>
>
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Jun 11 2019 - 21:04:26 CEST
