RE: PDML and PQ tunning

From: Herring Dave - dherri <>
Date: Mon, 21 Nov 2011 16:41:51 +0000
Message-ID: <>

In an ideal world you'd have some sort of test system where you could run the same query over and over, capturing statistics and checking the impact of various parameter changes to see exactly what is happening and why. I've had clients before where there weren't any test systems, so I understand if you're stuck.

When you captured statistics below with Tanel's "snapper" script, did you limit the SIDs in any way? It's important that with parallel operations you include all the PX slaves, not just the QC.

Can you run any of the queries on your own? If so, enable event 10046 level 12 beforehand at the session level to capture all waits for both your session (QC) and all the PX slaves. That way you can find out exactly what waits there are for the PX slaves as well. Also query V$PQ_TQSTAT immediately after the query finishes, in the same session. Or if the system is idle outside of these queries you could also perform manual statspack snapshots before and after. Also, the xplans would be helpful in being able to see details in the parallelism columns - TQ|IN-OUT|PQ Distrib. I believe under we ran into a CPU patch that changed the PQ distribution for one particular query, causing a huge performance problem.

It'd be nice to know why there are waits on the PX qref latch, meaning there's potentially an imbalance in the speed of procedures vs. consumers, but if you can't get any details or doing testing then you might need to rely on increasing PEMS. Just note that I believe under 9i if you don't set a value for the large pool then PEMS buffers will come out of the shared pool. PARALLEL_AUTOMATIC_TUNING may need to be set as well, I can't remember.

Acxiom Corporation

TEL    630.944.4762
MBL   630.430.5988 

1501 Opus Pl, Downers Grove, IL 60515, USA WWW.ACXIOM.COM The information contained in this communication is confidential, is intended only for the use of the recipient named above, and may be legally privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please resend this communication to the sender and delete the original message or any copy of it from your computer system. Thank you.

-----Original Message-----
From: Radoulov, Dimitre [] Sent: Sunday, November 20, 2011 5:09 AM
To: Herring Dave - dherri
Subject: Re: PDML and PQ tunning

Hi Dave,
thanks for replying!

Details in-line.

On 19/11/2011 17:36, Herring Dave - dherri wrote:
> As for your specific situation, you haven't shared a lot of detail.

I know, I did that on purpose because I suspect there is bug involved so I wanted to share only what seemed sufficient in order to briefly describe the issue.
For a full blown analysis I'll need to share everything.

> For PQs, are the underlying tables large enough to warrent parallelization? Smaller tables can actually run slower with parallelization, due to all the overhead of starting x number of PX slaves, dividing up the workload, communication between PX slaves and the QC, then producing the results and cleanup. "Smaller" is obviously relative but from what I've seen in working on DWs, any table smaller than 150 MB to 200 MB usually should be left at degree 1.

Yes, this is a kind of *small* dwh environment (the whole database is about 200g) running on old machines.
The statements I'm currently analyzing join tables which are about 3 - 4g each, the joins are rather simple (mostly 2 tables at a time).

> Have you compared xplans between serial and parallel executions? Were the serial executions using full scans? Was more than 1 table joined and if so were both large and had a parallel degree set?

Yes, I've compared the plans and in the case I'm talking about both plans use the same access type (FTS).
Both the serial and the parallel plans indicate hash join with FTS, I've tried serial, parallel 2 and 4.
In all cases the serial execution was the fastest.

> As for PDML, were you working with CTAS operations? Or INSERT...SELECT? For the latter were you attempting to have the INSERT run in parallel or just the SELECT?

Insert as select, both insert and select parallelized. In most cases even the single select was slower with parallel degree 2 compared to the serial one.

> For both PQ and PDML, were the PX qref latch waits the highest of all waits? If not, what were the top 5 waits before and after the change for both types of operations?

Good questions. I don't have access to the trace files so for now extended SQL trace is not an option for now. I've used Tanel Poder's snapper script to take a *quick look* at the wait events and I've got the following result (just a small part, there is a single select statement running):


    200% | 114550436 | PX qref latch

  • End of ASH snap 2, end=2011-11-18 11:24:45, seconds=20, samples_taken=19


    200% | 114550436 | PX qref latch
      6% | 3463809334 | ON CPU

  • End of ASH snap 3, end=2011-11-18 11:25:05, seconds=20, samples_taken=18


    200% | 114550436 | PX qref latch

  • End of ASH snap 4, end=2011-11-18 11:25:26, seconds=21, samples_taken=19


    200% | 114550436 | PX qref latch

  • End of ASH snap 5, end=2011-11-18 11:25:46, seconds=20, samples_taken=19

I'll admit that I though this was a bug so I hoped to find it documented on MOS quickly and to not spend more time at this time. Now I believe that that's not the case (i.e. I didn't find anything useful for now), so next week I'll try to set up a complete test case and get back here with the details.

Thanks again

Received on Mon Nov 21 2011 - 10:41:51 CST

Original text of this message