WorkArea allocation and/or Tuning for a GROUP BY

From: Hemant K Chitale <hkchital_at_singnet.com.sg>
Date: Thu, 27 May 2010 23:36:05 +0800
Message-Id: <201005271536.o4RFa0Lo031250_at_smtp42.singnet.com.sg>


Oracle 10.2.0.4 32bit Linux
P_A_T set to 2G

We have a particular SQL operation where we need to add values from two different tables and insert them into a target table.
(Thus, for every "key combination", 40 different columns hold
measures which need to be added together --- assume that for a particular "key combination" one table has values for April-2010 and the other has values for March-2010 and we need to a) Add values in 25 of the columns
b) Compute the average in the other 15 columns

The "key combination" itself is another 20columns.

This is done with a UNION operation (we don't specify UNION ALL) as :

INSERT /*+ PARALLEL (t 4) APPEND */ INTO Target_Table SELECT /*+ PARALLEL (S 4) */ c1, c2, c3, c4, SUM(c5), SUM(c7), sum(c7), sum(c8)/2, sum(c9)/2 ...
FROM
(select /*+ PARALLEL (a 2) */ c1, c2, c3, c4, c5, ... from temp_source_1 a

     UNION ALL
select /*+ PARALLEL (b 2) */ c1, c2, c3, c4, c5, ... from temp_source_2 b) s GROUP BY c1, c2, c3, c4

(thus, c1 to c4 are the 20 "key combination" columns and c5 to c9
are the 40 columns where I have to add the values from the two source tables or compute averages)

The two source tables (temp_source_1 and temp_source_2) have about 10million rows each and are about 7GB in size.
(yes, the rowsize is very large at about 650bytes per row). These
are NOT GTTs.

  I explicitly set table_stats for numrows, numblks, avgrlen (to avoid having to run a gather_stats on these two tables).

Yet, I see these statistics :

SQL> select * from v$sql_workarea_active where sql_id = 'abcdefghijklmn';

SQL_HASH_VALUE SQL_ID        WORKAREA_ADDRESS 
OPERATION_TYPE                                               OPERATION_ID
-------------- ------------- ---------------- 
------------------------------------------------------------ ------------
POLICY                    SID  QCINST_ID      QCSID ACTIVE_TIME 
WORK_AREA_SIZE EXPECTED_SIZE ACTUAL_MEM_USED MAX_MEM_USED
------------------ ---------- ---------- ---------- ----------- 
-------------- ------------- --------------- ------------
NUMBER_PASSES TEMPSEG_SIZE TABLESPACE                       SEGRFNO#    SEGBLK#
------------- ------------ ------------------------------ ---------- ----------
     2538664469 abcdefghijklmn 0000000430FCDB80 GROUP BY 

(HASH) 6
AUTO 89 1 22 150064850 3086336 3134464 3318784 3318784 0 1283457024 TEMP_TBS 4 9 2538664469 abcdefghijklmn 0000000430FCDB80 GROUP BY
(HASH) 6
AUTO 41 1 22 150065009 3086336 3134464 3318784 3318784 0 1275068416 TEMP_TBS 6 9 2538664469 abcdefghijklmn 0000000430FCDB80 GROUP BY
(HASH) 6
AUTO 60 1 22 150065714 3086336 3134464 3318784 3318784 0 1283457024 TEMP_TBS 2 9 2538664469 abcdefghijklmn 0000000430FCDB80 GROUP BY
(HASH) 6
AUTO 119 1 22 150064863 3086336 3134464 3318784 3318784 0 1283457024 TEMP_TBS

TEMPSEG_SIZE rapidly grows to the whole 27GB temp tablespace. (You can see it being 1.28GB for each of the 4 PQ operators doing the GROUP BY (HASH) in the query above while monitoring that job). However, WORK_AREA_SIZE, EXPECTED_SIZE, ACTUAL_MEM_USED, MAX_MEM_USED are all about 3MB. Is there any way I can get the GROUP BY (HASH) to use more Memory -- to have it start with a larger Expected_Size and then a high ACTUAL_MEM_USED and MAX_MEM_USED ? I have seen that a HASH JOIN operation between the same two tables for the same set of rows *does* work with EXPECTED_SIZE of 560MB and ACTUAL_MEM_USED of about 620MB per PQ operator.

Hemant K Chitale

http://hemantoracledba.blogspot.com

--
http://www.freelists.org/webpage/oracle-l
Received on Thu May 27 2010 - 10:36:05 CDT

Original text of this message