Tracing Parallel Execution

Mark Rittman's picture
articles: 

If you're looking to tune an SQL statement or a batch job, a common way to find out what happened during the execution of the SQL is to run an extended SQL trace and examine the wait events. But what happens if you are using parallel execution, and all your trace file contains is the parallel execution wait events that are generally considered idle events? Your trace file shows how long your query took to run, and the work involved in controlling the PQ slaves, but the real details of what took up all the execution time are actually to be found in the corresponding PQ slave trace files in the BDUMP directory.

To take an example, I create a copy of the SH.SALES table and enable parallel DML with a degree of parallelism of 2:

SQL*Plus: Release 10.1.0.4.0 - Production on Tue Apr 26 19:49:45 2005

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> create table sales_copy as select * from sales;

Table created.

SQL> alter table sales_copy parallel 2;

Table altered.

I then query the parallel_min_servers parameter to make sure some PQ slaves are available:

SQL> show parameter parallel_min_servers

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
parallel_min_servers                 integer     4
SQL> 

Now, I turn on extended SQL tracing for my session:

SQL> alter session set timed_statistics = true;

Session altered.

SQL> alter session set max_dump_file_size = unlimited;

Session altered.

SQL> alter session set tracefile_identifier = 'px_test1';

Session altered.

SQL> alter session set events = '10046 trace name context forever, level 8';

Session altered.

Now, I run my SELECT statement against the SALES_COPY table and the
PRODUCTS table;

SQL> select p.prod_name, sum(s.amount_sold)
  2  from   products p, sales_copy s
  3  where  p.prod_id = s.prod_id
  4  group by p.prod_name;

PROD_NAME                                          SUM(S.AMOUNT_SOLD)
-------------------------------------------------- ------------------
128MB Memory Card                                           571333.75
17' LCD w/built-in HDTV Tuner                              7189171.77
18' Flat Panel Graphics Monitor                            5498727.81
3 1/2' Bulk diskettes, Box of 100                           370204.56
3 1/2' Bulk diskettes, Box of 50                            254128.79
Adventures with Numbers                                     175563.92
CD-R Mini Discs                                             384553.62
CD-R with Jewel Cases, pACK OF 12                           170405.76
CD-RW, High Speed Pack of 5                                 230233.35
CD-RW, High Speed, Pack of 10                               106468.41
DVD-R Discs, 4.7GB, Pack of 5                                904537.2
...
Y Box                                                       2082330.3

71 rows selected.

Then turn off extended SQL trace:

SQL> alter session set events '10046 trace name context off';

Session altered.

Now, if I format the trace file using TKPROF

C:\oracle\admin\markr10g\udump>tkprof markr10g_ora_11184_px_test1.trc markr10g_ora_11184_px_test1.txt

TKPROF: Release 10.1.0.4.0 - Production on Tue Apr 26 20:09:12 2005

Copyright (c) 1982, 2004, Oracle.  All rights reserved.

and then take a look at the part of the trace file that relates to my query, I see the following wait events:

select p.prod_name, sum(s.amount_sold)
from   products p, sales_copy s
where  p.prod_id = s.prod_id
group by p.prod_name

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       1.43        851         87          0           0
Execute      1      0.00       0.07          0          3          0           0
Fetch        6      0.00       3.14          3          4          0          71
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        8      0.01       4.65        854         94          0          71

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61  

Rows     Row Source Operation

-------  ---------------------------------------------------
     71  PX COORDINATOR  (cr=7 pr=3 pw=0 time=3110386 us)
      0  SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry)
      0  PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us)
      0  SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry)
      0  HASH JOIN  (cr=0 pr=0 pw=0 time=0 us)
      0   BUFFER SORT (cr=0 pr=0 pw=0 time=0 us)
      0    PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us)
     72  TABLE ACCESS FULL PRODUCTS (cr=4 pr=3 pw=0 time=15278 us)(Orphan Entry)
      0   PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us)
      0    TABLE ACCESS FULL SALES_COPY (cr=0 pr=0 pw=0 time=0 us)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  PX Deq: Join ACK                                3        0.00          0.00
  PX Deq: Parse Reply                             5        0.02          0.04
  SQL*Net message to client                       6        0.00          0.00
  PX Deq: Execute Reply                          61        0.20          2.79
  PX qref latch                                  31        0.02          0.22
  db file sequential read                         1        0.01          0.01
  db file scattered read                          1        0.00          0.00
  SQL*Net message from client                     6      126.37        126.40
  PX Deq: Signal ACK                              4        0.09          0.10
********************************************************************************

Lots of PX 'idle' events, and SQL*Net message to/from client, but not much on what the query actually waited on. If I take a look in my BDUMP directory though, I've got another four trace files, 2 for the processes that scanned my SALES_COPY table and two that did the sorting because of the join.

C:\oracle\admin\markr10g\bdump>dir
 Volume in drive C has no label.
 Volume Serial Number is D47C-241A

 Directory of C:\oracle\admin\markr10g\bdump

26/04/2005  20:03    <DIR>          .
26/04/2005  20:03    <DIR>          ..
26/04/2005  19:42           349,515 alert_markr10g.log
26/04/2005  20:03             2,353 markr10g_p000_11560.trc
26/04/2005  20:03             2,228 markr10g_p001_6892.trc
26/04/2005  20:03             8,857 markr10g_p002_7568.trc
26/04/2005  20:03             8,312 markr10g_p003_4672.trc
               5 File(s)        371,265 bytes
               2 Dir(s)  28,870,373,376 bytes free

Now if I format the P002 trace file using TKPROF and look at the bit relating to my SELECT statement, I get the following output:

select p.prod_name, sum(s.amount_sold)
from   products p, sales_copy s
where  p.prod_id = s.prod_id
group by p.prod_name

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.99       3.03       2246       2269          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.99       3.03       2246       2269          0           0

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 61     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  PX COORDINATOR  (cr=0 pr=0 pw=0 time=0 us)
      0  SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry)
      0  PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us)
      0  SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry)
      0  HASH JOIN  (cr=0 pr=0 pw=0 time=0 us)
      0   BUFFER SORT (cr=0 pr=0 pw=0 time=0 us)
      0    PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us)
      0  TABLE ACCESS FULL PRODUCTS (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry)
      0   PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us)
      0    TABLE ACCESS FULL SALES_COPY (cr=0 pr=0 pw=0 time=0 us)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  PX Deq: Execution Msg                          28        0.08          0.23
  PX Deq Credit: send blkd                        3        0.00          0.00
  PX Deq: Table Q Normal                          3        0.01          0.01
  PX qref latch                                   2        0.00          0.00
  direct path read                               74        0.13          1.58

Note the 'direct path read' waits. The P003 trace file had the same profile. Also, if I take a look at the P000 and P001 trace files, which presumably are doing the sorting, I get the following profile:

select p.prod_name, sum(s.amount_sold)
from   products p, sales_copy s
where  p.prod_id = s.prod_id
group by p.prod_name

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       3.06          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       3.06          0          0          0           0

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 61     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  PX COORDINATOR  (cr=0 pr=0 pw=0 time=0 us)
      0  SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry)
      0  PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us)
      0  SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry)
      0  HASH JOIN  (cr=0 pr=0 pw=0 time=0 us)
      0   BUFFER SORT (cr=0 pr=0 pw=0 time=0 us)
      0    PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us)
      0  TABLE ACCESS FULL PRODUCTS (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry)
      0   PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us)
      0    TABLE ACCESS FULL SALES_COPY (cr=0 pr=0 pw=0 time=0 us)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  PX Deq: Execution Msg                           2        2.00          2.92
  PX Deq: Table Q Normal                          3        0.10          0.10
  PX Deq Credit: send blkd                        2        0.00          0.00
  PX qref latch                                   2        0.00          0.01

So what do I do if I need to get the full picture of all the wait events that have slowed my SQL statement down?

I'm using Oracle 10g, and I remembered from a previous article I'd written that DBMS_MONITOR allows us to set a 'client_id' that can be used, along with a new utility called TRCSESS, to 'stitch together' trace files that are part of one logical session. Using DBMS_MONITOR, I should be able to produce a single trace file that contains all the trace data for my SQL statement.

The first step then is to log on again as SH and this time use DBMS_MONITOR

SQL> exec dbms_session.set_identifier('px_test2');

PL/SQL procedure successfully completed.

SQL> alter session set timed_statistics = true;

Session altered.

SQL> alter session set max_dump_file_size = unlimited;

Session altered.

SQL> alter session set tracefile_identifier = 'px_test2';

Session altered.

SQL> dbms_monitor.client_id_trace_enable(client_id=>'px_test2');
SP2-0734: unknown command beginning 'dbms_monit...' - rest of line ignored.
SQL> exec dbms_monitor.client_id_trace_enable(client_id=>'px_test2');

PL/SQL procedure successfully completed.

Note the call to DBMS_SESSION that sets the client_identifier for this session. This client identifier will be used to tag my SQL statements in the SQL trace file. Also, ensure that your user has execute permission on DBMS_MONITOR for this to work.

Now, I run my SELECT statement as before;

SQL> select p.prod_subcategory, sum(s.amount_sold)
  2  from products p, sales_copy s
  3  where p.prod_id = s.prod_id
  4  group by p.prod_subcategory;

PROD_SUBCATEGORY                                   SUM(S.AMOUNT_SOLD)
-------------------------------------------------- ------------------
CD-ROM                                                        3995633
Camcorders                                                  8314815.4
Camera Batteries                                           2143810.36
Camera Media                                               1190971.74
Cameras                                                     6312268.4
Documentation                                              3721699.61
Game Consoles                                               2082330.3
Memory                                                     4948442.05
Monitors                                                   12687899.6
Portable PCs                                               15011642.5
Recordable CDs                                             2075521.27

PROD_SUBCATEGORY                                   SUM(S.AMOUNT_SOLD)
-------------------------------------------------- ------------------
Y Box Games                                                1550229.35
Accessories                                                4057301.26
Bulk Pack Diskettes                                         624333.35
Desktop PCs                                                5635963.08
Home Audio                                                 10537384.2
Modems/Fax                                                 2185350.59
Operating Systems                                          3543725.89
Printer Supplies                                           5244547.32
Recordable DVD Discs                                       1914390.17
Y Box Accessories                                           427571.78

21 rows selected.

and then switch off trace:

SQL> exec dbms_monitor.client_id_trace_disable(client_id=>'px_test2');

PL/SQL procedure successfully completed.

If I take a look at the raw trace file for my main session, the one that requested all the PX slaves, I can see the client ID within the trace entries:

Dump file c:\oracle\admin\markr10g\udump\markr10g_ora_11704_px_test2.trc
Tue Apr 26 20:27:11 2005
ORACLE V10.1.0.4.0 - Production vsnsta=0
vsnsql=13 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 - Production
With the Partitioning, OLAP and Data Mining options
Windows XP Version V5.1 Service Pack 2
CPU             : 1 - type 586
Process Affinity: 0x00000000
Memory (A/P)    : PH:561M/2047M, PG:1392M/3945M, VA:1665M/2047M
Instance name: markr10g

Redo thread mounted by this instance: 1

Oracle process number: 17

Windows thread id: 11704, image: ORACLE.EXE (SHAD)


*** 2005-04-26 20:27:11.455
*** ACTION NAME:() 2005-04-26 20:27:11.445
*** MODULE NAME:(SQL*Plus) 2005-04-26 20:27:11.445
*** SERVICE NAME:(markr10g) 2005-04-26 20:27:11.445
*** CLIENT ID:(px_test2) 2005-04-26 20:27:11.445
*** SESSION ID:(140.5258) 2005-04-26 20:27:11.445
WAIT #2: nam='reliable message' ela= 556 p1=1869466216 p2=1869427312 p3=1870014168
=====================
PARSING IN CURSOR #2 len=72 dep=0 uid=61 oct=47 lid=61 tim=376623839081 hv=4221193623 ad='6db34444'
BEGIN dbms_monitor.client_id_trace_enable(client_id=>'px_test2'); END;
END OF STMT
EXEC #2:c=10014,e=24730,p=5,cr=1,cu=7,mis=1,r=1,dep=0,og=1,tim=376623839073
WAIT #2: nam='log file sync' ela= 660 p1=15 p2=0 p3=0
WAIT #2: nam='SQL*Net message to client' ela= 4 p1=1413697536 p2=1 p3=0
*** 2005-04-26 20:31:13.192
WAIT #2: nam='SQL*Net message from client' ela= 241736360 p1=1413697536 p2=1 p3=0
=====================
...

And I can find the same in the PX slave trace files,

STAT #1 id=12 cnt=0 pid=7 pos=2 obj=0 op='PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us)'
STAT #1 id=13 cnt=0 pid=12 pos=1 obj=73335 op='TABLE ACCESS FULL SALES_COPY (cr=0 pr=0 pw=0 time=0 us)'

*** TRACE DUMP CONTINUED FROM FILE c:\oracle\admin\markr10g\bdump\markr10g_p003_4672.trc ***


*** 2005-04-26 20:31:14.544
*** SERVICE NAME:(markr10g) 2005-04-26 20:31:14.544
*** CLIENT ID:(px_test2) 2005-04-26 20:31:14.544
*** SESSION ID:(150.6373) 2005-04-26 20:31:14.544
WAIT #0: nam='PX Deq: Msg Fragment' ela= 687 p1=268566527 p2=1 p3=0
=====================
PARSING IN CURSOR #1 len=131 dep=1 uid=61 oct=3 lid=61 tim=376866932282 hv=3562105006 ad='6db3f074'
select p.prod_subcategory, sum(s.amount_sold)
from products p, sales_copy s
where p.prod_id = s.prod_id
group by p.prod_subcategory
END OF STMT
PARSE #1:c=0,e=171,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=376866932270
WAIT #1: nam='PX qref latch' ela= 2777 p1=1 p2=1 p3=1849357532

Note that the SQL statement before the client ID was recorded was for my previous session when I didn't use a client ID.

Now, I can use the TRCSESS utility to create another trace file that contains all the SQL statements carried out by the 'px_test2' client ID. Where there are common SQL statements, such as our SELECT statement above, TKPROF will amalgamate them into a single statement that has multiple executions. TRCSESS is
found in the /bin directory along with TKPROF.

C:\oracle\admin\markr10g\udump>trcsess output = 'markr10g_px_test2_full.trc' 
clientid = 'px_test2' markr10g_ora_11704_px_test2.trc ../bdump/*.trc

Note here that I've included my main trace file, and all the PX slave trace files in the BDUMP directory, as all of them could contain statements tagged with my client ID.

Now, If I format the 'markr10g_px_test2_full.trc' trace file with TKPROF and take a look at my SELECT statement, I get a view of all of the waits that contributed to my statement execution time.

select p.prod_subcategory, sum(s.amount_sold)
from products p, sales_copy s
where p.prod_id = s.prod_id
group by p.prod_subcategory

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        5      0.00       0.04          0          1          0           0
Execute      5      1.75      22.13       4433       4479          0           0
Fetch        3      0.02       5.58          0          4          0          21
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       13      1.77      27.75       4433       4484          0          21

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  PX COORDINATOR  (cr=0 pr=0 pw=0 time=0 us)
      0  SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry)
      0  PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us)
      0  SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry)
      0  HASH JOIN  (cr=0 pr=0 pw=0 time=0 us)
      0   BUFFER SORT (cr=0 pr=0 pw=0 time=0 us)
      0    PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us)
      0  TABLE ACCESS FULL PRODUCTS (cr=0 pr=0 pw=0 time=0 us)(Orphan Entry)
      0   PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us)
      0    TABLE ACCESS FULL SALES_COPY (cr=0 pr=0 pw=0 time=0 us)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  PX qref latch                                  29        0.03          0.23
  PX Deq: Execution Msg                          54        2.00         11.43
  PX Deq: Table Q Normal                          6        2.00          2.01
  PX Deq Credit: send blkd                       12        1.99          3.97
  direct path read                              145        0.23          4.27
  PX Deq: Parse Reply                             5        0.12          0.14
  SQL*Net message to client                       3        0.00          0.00
  PX Deq: Execute Reply                          55        0.33          3.14
  SQL*Net message from client                     3       89.12         89.13
  PX Deq: Signal ACK                              7        0.09          0.20
********************************************************************************

You will see that the Parse and Execute counts for the statement are now 5, reflecting the fact that the main session then called on 4 PX slaves (2 to scan the SALES_COPY table, 2 to sort the results for the join). There are also a couple of points to note with this technique:

  • Firstly, the elapsed time figure is the sum of all the elapsed times for the main session and the PX slaves, i.e. it won't reflect the actual time that the query took to execute. For this figure, you need to use the elapsed time in the main process trace file.
  • This is an import one. If you keep executing the same query, with the same client ID, and then use TRCSESS to amalgamate the results, the PX slave trace files will still contain entries for this client ID and this statement from previous executions. This is because the PX slave sessions sit there 'permanently' and wait for parallel queries to run, and will write to the same trace file irrespective of the session ID of the calling session. What this means in reality is that an amalgamated trace file created on say the 10th execution of a statement for a particular client ID will contain all the accumulated waits, execution parse and fetch counts, and elapsed time, for all the previous PX executions of this query. The workaround for this is to ensure that your client ID is unique each execution, perhaps by just adding a number to it each time, either manually or via a sequence.

If you're looking for specific details on the waits that a particular PX slave encountered when executing a query, as opposed to the amalgamated (1 x main) + (4 x PX slaves) details, you can also run TRCSESS on a single PX slave trace file, to extract just those statements that relate to a particular client ID. To do this, run TRCSESS against the particular PX slave trace file, like this:

C:\oracle\admin\markr10g\udump>trcsess output = 'markr10g_p002_7568_full.trc' 
clientid = 'px_test2' markr10g_p002_7568.trc

I've used this a couple of times now and it seems to work for me. If I've missed anything or misinterpreted a trace feature, drop me a line and I'll incorporate the feedback.

Comments

Vow .. just what i was looking for!! Wonderful! thanks Mark!!