RE: PDML and PQ tuning
Date: Wed, 23 Nov 2011 14:45:51 +0000
Message-ID: <BD475CE0B3EE894DA0CAB36CE2F7DEB4454C9E95_at_LITIGMBCRP02.Corp.Acxiom.net>
Dimitre,
Increasing the right memory parameters may be your best bet, as you've already planned to do. With parallel operations you typically need more memory because each PX slave needs its own workarea. Parallel operations work best when you can throw resources at them, pretty much a brute-force method. It'd be interesting to see what the PGA advisor says in a statspack report, i.e. how much of an increase would help, if any.
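If you can query the instance directly, the advisor data is also available straight from v$pga_target_advice. Something along these lines (just a sketch, columns are from the standard 9.2 view):

   select round(pga_target_for_estimate/1024/1024) as target_mb,
          estd_pga_cache_hit_percentage            as estd_hit_pct,
          estd_overalloc_count                     as estd_overalloc
     from v$pga_target_advice
    order by pga_target_for_estimate;

A target where estd_overalloc_count drops to 0 and the hit percentage levels off is roughly the point past which more PGA stops paying for itself.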
SQL*Plus ARRAYSIZE usually doesn't help much past a value of 300, at least in my testing, and it only affects returning results to your session. In the production setting, are they really going to run this query expecting to pull 400 MB or so of data (500,000+ rows) back to their PC? If not, then in your tests try to rewrite the query so it returns almost no rows yet keeps the same plan. For example, make the main query an inline view with the outer query performing some aggregation so only a few rows come back. Obviously you'll need to run a number of EXPLAIN PLAN FOR ... attempts to make sure the plan hasn't changed, as Oracle may optimize the query differently because of the aggregation.
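Something like this, for example (a sketch only, the column list and the NO_MERGE hint are illustrative; as noted, verify with EXPLAIN PLAN that the underlying plan is unchanged):

   select /*+ NO_MERGE(v) */ count(*)
     from (select _long_list_of_columns_from_both_tables_
             from t1, t2
            where t1.col1 = t2.col1 (+)
              and t1.col2 = t2.col2 (+)
              and t2.col1 is null) v;

That way the join still produces all 500,000+ rows internally, but only one row goes back over SQL*Net.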
If the goal really is to return 500,000+ rows to a client's PC, then parallelism won't help as you're still limited by sending data to your PC, so in a sense you're just making more processes wait.
DAVID HERRING
DBA
Acxiom Corporation
EML dave.herring_at_acxiom.com TEL 630.944.4762 MBL 630.430.5988
1501 Opus Pl, Downers Grove, IL 60515, USA WWW.ACXIOM.COM
-----Original Message-----
From: Radoulov, Dimitre [mailto:cichomitiko_at_gmail.com]
Sent: Tuesday, November 22, 2011 10:32 AM
To: Herring Dave - dherri
Cc: oracle-l_at_freelists.org
Subject: Re: PDML and PQ tuning
Hi Dave,
comments and replies in-line.
On 21/11/2011 17:41, Herring Dave - dherri wrote:
> In an ideal world you'd have some sort of test system where you could run the same query over and over, capturing statistics and checking the impact of various parameter changes to see exactly what is happening and why. I've had clients before where there weren't any test systems, so I understand if you're stuck.
Actually there is a test environment: the HW is quite similar, but there are 27 (yes, 27) active Oracle instances on the test machine. With 200m of buffer cache and 25m of PGA it's rather difficult to analyze the execution time.
With manual workarea_size_policy and 60m for sort/hash_areas I got the
following error:
ORA-04030: out of process memory when trying to allocate 1033216 bytes (hash-join subh,kllcqas:kllsltba)
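(For completeness, the session-level setup for that test was along these lines, values reconstructed from the 60m figure above:)

   alter session set workarea_size_policy = manual;
   alter session set sort_area_size = 62914560;   -- ~60m
   alter session set hash_area_size = 62914560;   -- ~60m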
> When you captured statistics below with Tanel's "snapper" script, did you limit the SIDs in any way? It's important that with parallel operations you include all the PX slaves, not just the QC.
Yes, my session was the only one when I tested it, but I'll repeat the test to get more details.
>
> Can you run any of the queries on your own? If so, enable event 10046 level 12 beforehand at the session level to capture all waits for both your session (QC) and all the PX slaves. That way you can find out exactly what waits there are for the PX slaves as well.
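(For reference, enabling that trace at the session level looks roughly like this. Set it in the QC session before running the statement and, as Dave describes, the PX slaves are traced as well:)

   alter session set timed_statistics = true;
   alter session set max_dump_file_size = unlimited;
   alter session set events '10046 trace name context forever, level 12';
   -- run the query, then:
   alter session set events '10046 trace name context off';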
Unfortunately, as I said, I have rather limited access to the machine, so for now I cannot read the trace files. Sorry for the incomplete information on the wait events (actually, the most important detail ...), but right now I can run only limited SQL on the production database ...
Some details:
t1
num rows: 3.008.738
size mb: 3.776
t2
num rows: 2.494.090
size mb: 3.008
Instance configuration:
shared pool 208m
db_cache_size 800m (I've requested to bump it to at least 1.5g; there are 4 other instances on the machine and, given the memory currently available, 1.5g seems reasonable at this point ...)
large pool 32m
db_file_multiblock_read_count 16
parallel_automatic_tuning false
N.B. Parallel execution allocates buffers out of the large pool only when PARALLEL_AUTOMATIC_TUNING is set to true.
workarea_size_policy auto
pga_aggregate_target 100m
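(Related to the N.B. above, a quick way to see which pool the PX message buffers are actually coming from, assuming access to v$sgastat:)

   select pool, name, round(bytes/1024/1024) as mb
     from v$sgastat
    where name in ('PX msg pool', 'free memory')
    order by pool, name;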
So I ran the statement in serial, parallel 2 and parallel 4, snapping v$session_wait for the session I'm currently investigating (QC and PX slaves). With parallel 2 the waits were mostly direct path read (I've already requested to raise the PGA from 100m to 1024m) and PX Deq: Execution Msg / PX Deq: Table Q Normal. With parallel 4 there were PX qref latch, PX Deq: Execute Reply and PX Deq: Table Q Normal:
select
    _long_list_of_columns_from_both_tables_
from
    t1 a,
    t2 b
where
    t1.col1 = t2.col1 (+)
and t1.col2 = t2.col2 (+)
and t2.col1 is null;
-- first run, no parallel --
514654 rows selected.
Elapsed: 00:17:16.84
Execution Plan
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=153978 Card=3008556 Bytes=3688489656)
   1    0   FILTER
   2    1     HASH JOIN (OUTER)
   3    2       TABLE ACCESS (FULL) OF 't1' (Cost=23081 Card=3008556 Bytes=3393651168)
   4    2       TABLE ACCESS (FULL) OF 't2' (Cost=18506 Card=2493915 Bytes=244403670)
Statistics
          0  recursive calls
          0  db block gets
     432452  consistent gets
     735840  physical reads
          0  redo size
  403453605  bytes sent via SQL*Net to client
     382569  bytes received via SQL*Net from client
      34312  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     514654  rows processed
-- second run, parallel 2 --
514654 rows selected.
Elapsed: 00:17:56.50
Execution Plan
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=59815 Card=3008556 Bytes=3688489656)
   1    0   FILTER*                                                            :Q5728002
   2    1     HASH JOIN* (OUTER)                                               :Q5728002
   3    2       TABLE ACCESS* (FULL) OF 't1' (Cost=11541 Card=3008556 Bytes=3393651168)   :Q5728000
   4    2       TABLE ACCESS* (FULL) OF 't2' (Cost=9253 Card=2493915 Bytes=244403670)     :Q5728001

   1 PARALLEL_COMBINED_WITH_CHILD
   2 PARALLEL_TO_SERIAL            SELECT /*+ ORDERED NO_EXPAND USE_HASH(A2) */ A1.C0,A2.C0,A1.C1,A1.C2,A1.C3,A1.C4
   3 PARALLEL_TO_PARALLEL          SELECT /*+ NO_EXPAND ROWID(A1) */ A1."c1" C0,A1."c2" C1,A1."c3
   4 PARALLEL_TO_PARALLEL          SELECT /*+ NO_EXPAND ROWID(A1) */ A1."c1OLA" C0,A1."c2" C1,A1."c3
Statistics
       4184  recursive calls
          4  db block gets
    1665697  consistent gets
     720103  physical reads
        856  redo size
  403563666  bytes sent via SQL*Net to client
     382604  bytes received via SQL*Net from client
      34312  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
     514654  rows processed
I've already tried different values for the SQL*Plus arraysize (150 - 500) and db_file_multiblock_read_count (128); the impact on the execution time was minimal.
At 16:39:16 I ran the query with parallel 4. At 17:21:06 I ran the following statement a few times; the output was something like this:
select sid, event, p1text, p1raw, p2text, p2raw, wait_time, state
  from v$session_wait
 where sid in (20, 24, 15, 18, 22)
 order by wait_time;
SID EVENT                   P1TEXT              P1RAW            P2TEXT     P2RAW            WAIT_TIME STATE
--- ----------------------- ------------------- ---------------- ---------- ---------------- --------- -------
 18 PX qref latch           function            0000000000000001 sleeptime  0000000000000064         0 WAITING
 24 PX qref latch           function            0000000000000001 sleeptime  0000000000000064         0 WAITING
 22 PX Deq: Execute Reply   sleeptime/senderid  00000000000000C8 passes     000000000000001F         0 WAITING
 15 PX Deq: Table Q Normal  sleeptime/senderid  00000000000000C8 passes     0000000000000001         0 WAITING
 20 PX Deq: Table Q Normal  sleeptime/senderid  00000000000000C8 passes     0000000000000002         0 WAITING
Then I hit Ctrl-C and interrupted the parallel 4 statement:
ERROR at line 218:
ORA-01013: user requested cancel of current operation
Elapsed: 00:42:58.41 ^^^^^
I'll try to provide greater details later.
> Also query V$PQ_TQSTAT immediately after the query finishes, in the same session. Or if the system is idle outside of these queries you could also perform manual statspack snapshots before and after.
> Also, the xplans would help in seeing the details in the parallelism columns - TQ|IN-OUT|PQ Distrib. I believe under 9.2.0.6 we ran into a CPU patch that changed the PQ distribution for one particular query, causing a huge performance problem.
I'll try this the next time I have access to the machine.
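(For reference, the check Dave suggests would be something like this, run from the QC session right after the statement completes; the columns are from the standard V$PQ_TQSTAT view:)

   select dfo_number, tq_id, server_type, process, num_rows, bytes, waits, timeouts
     from v$pq_tqstat
    order by dfo_number, tq_id, server_type desc, process;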
> It'd be nice to know why there are waits on the PX qref latch, meaning there's potentially an imbalance in the speed of producers vs. consumers, but if you can't get any details or do any testing then you might need to rely on increasing PEMS (parallel_execution_message_size). Just note that I believe under 9i, if you don't set a value for the large pool, then PEMS buffers will come out of the shared pool. PARALLEL_AUTOMATIC_TUNING may need to be set as well, I can't remember.
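(If it comes to that, the PEMS change would look roughly like this; 65536 is only an illustrative value, and the parameter is static in 9i, so it needs an spfile change and an instance restart:)

   alter system set parallel_execution_message_size = 65536 scope = spfile;
   -- and, per the note above, possibly also:
   alter system set parallel_automatic_tuning = true scope = spfile;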
Thanks!
Best regards
Dimitre
--
http://www.freelists.org/webpage/oracle-l

Received on Wed Nov 23 2011 - 08:45:51 CST