Oracle FAQ Your Portal to the Oracle Knowledge Grid

Re: Bottleneck in 8i parallelism

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Thu, 18 Nov 1999 19:22:02 -0000
Message-ID: <942953012.1821.3.nnrp-12.9e984b29@news.demon.co.uk>


Judging from the stats on the execution plan, you are hashing a pair of 5 million row tables together, with a throughput of 1.5 GB of messages across the two table queues.

Since the default message size for parallel execution is 2K, you are probably suffering from an excess of message passing.

Try:

    Running parallel 4 and
    Increasing the parallel_execution_message_size to 16K

        (the name may be only an approximation).
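As a sketch of what that would look like (parameter and hint syntax from memory, so check the exact names against your release; note that once the FROM clause uses aliases, the hints must name the aliases, not the tables):

```sql
-- init.ora (static parameter, so restart the instance after changing it):
--   parallel_execution_message_size = 16384

-- and request degree 4 explicitly in the hints:
CREATE TABLE CADM_ETT_CALL_TMP_02_TODAY
  UNRECOVERABLE PARALLEL TABLESPACE CALLSTATS_01
AS SELECT /*+ parallel (t1, 4) parallel (t2, 4) */
       ...
  FROM CADM_ETT_CALL_TMP_01 t1
     , CADM_SUBSCRIBERS t2
  ...
```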

How much memory is there in the whole box, by the way? You will be using a lot of it with the combined PQ slave SORTs and HASHES.
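One way to keep an eye on that while the CTAS is running is to watch the temporary (sort) segment usage per session; a minimal sketch, assuming the v$sort_usage columns are as I recall them:

```sql
-- roughly how much temp space the PQ slaves are consuming right now:
SELECT tablespace, contents, extents, blocks
FROM   v$sort_usage;
```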

Also, the next time you run, look at the contents (in the session) of v$pq_tqstat to see how many messages, bytes, and waits you have had for each PQ slave in each queue. (See my web site for a convenient script - there are a couple of aging articles about PQO.)
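A bare-bones version of such a script (run in the same session, immediately after the parallel query finishes; column names as I recall them, so verify against your release) might be:

```sql
-- messages, bytes and waits per slave, per table queue,
-- for the statement just run in this session:
SELECT dfo_number, tq_id, server_type, process,
       num_rows, bytes, waits, timeouts
FROM   v$pq_tqstat
ORDER  BY dfo_number, tq_id, server_type, process;
```

A heavily skewed row or byte count for one slave, or a large wait count on the producer side, points at where the queues are stalling.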

--

Jonathan Lewis
Yet another Oracle-related web site: http://www.jlcomp.demon.co.uk

aozocak_at_my-deja.com wrote in message <810s34$kao$1_at_nnrp1.deja.com>...
>Jonathan,
>
>Thanks for your immediate response, we really need urgent help!!
>
>Here is some information that might help you identify the problem
>better.
>Looking forward to hearing from you....
>Regards,
>Alper
>
>Query is as follows :
>
>CREATE TABLE CADM_ETT_CALL_TMP_02_TODAY
> UNRECOVERABLE
> PARALLEL
> TABLESPACE CALLSTATS_01
> as SELECT /*+ parallel (CADM_ETT_CALL_TMP_01)
> parallel (CADM_SUBSCRIBERS)
> */
> max(t1.DAY_DATE) DAY_DATE
> , max(nvl(t2.SUBSCRIBER_ID,0)) SUBSCRIBER_ID
> , max(t1.BILL_CYCLE_ID) BILL_CYCLE_ID
> , max(t1.ETT_DATE) ETT_DATE
> , max(t1.TIME_BAND_ID) TIME_BAND_ID
> , max(t1.DISTANCE_BAND_ID) DISTANCE_BAND_ID
> , max(t1.TARIFF_ID) TARIFF_ID
> , max(t1.EQUIPMENT_ID) EQUIPMENT_ID
> , max(t1.ROAMING_ID) ROAMING_ID
> , nvl(t1.CALL_TYPE_ID,0) CALL_TYPE_ID
> , max(t1.TERM_TYPE_ID) TERM_TYPE_ID
> , max(t1.SCHEME_ID) SCHEME_ID
> , max(t1.CELL_LOC_ID) CELL_LOC_ID
> , max(t1.TRAFFIC_ID) TRAFFIC_ID
> , max(t1.RATE_BAND_ID) RATE_BAND_ID
> , max(t1.DURATION_BAND_ID) DURATION_BAND_ID
> , max(t1.LATE_CALL_FLAG) LATE_CALL_FLAG
> , t1.CALL_DATE CALL_DATE
> , t2.PHONE_NO A_NO
> , t1.B_NO
> , max(t1.QTY_CALLS) QTY_CALLS
> , max(t1.QTY_SECS ) QTY_SECS
> , max(t1.TL_CALL_COST) TL_CALL_COST
> , round(max(t1.USD_CALL_COST),3) USD_CALL_COST
> , max(t1.IMEI_SERIAL) IMEI_SERIAL
> FROM CADM_ETT_CALL_TMP_01 t1
> , CADM_SUBSCRIBERS t2
> WHERE t1.CUSTOMER_ID = t2.CUSTOMER_ID (+)
> AND t1.ACCOUNT_ID = t2.ACCOUNT_ID (+)
> AND t1.LATE_CALL_FLAG = 'N'
> GROUP BY t1.CALL_TYPE_ID
> , t1.CALL_DATE
> , t2.PHONE_NO
> , t1.B_NO ;
>
>
>Explain plan for the query :
>cadm_ett_call_tmp_02_today = 4739779
>cadm_subscribers = 4738679
>
> i  p  PLAN_PLUS_EXP                                OBJECT_NODE  OTHER_TAG                      (CARD, BYTES, COST)
>--- --  -------------------------------------------- -----------  -----------------------------  --------------------------
> 0      CREATE TABLE STATEMENT                                                                   (4636910, 918108180, 5117)
> 1  0     LOAD AS SELECT                             :Q5339003    PARALLEL_TO_SERIAL             (,,)
> 2  1       SORT GROUP BY                            :Q5339003    PARALLEL_COMBINED_WITH_PARENT  (4636910, 918108180, 5117)
> 3  2         HASH JOIN OUTER                        :Q5339002    PARALLEL_TO_SERIAL             (4636910, 918108180, 529)
> 4  3           TABLE ACCESS FULL CADM_ETT_CALL_TMP_01  :Q5339000  PARALLEL_TO_SERIAL            (4636910, 607435210, 142)
> 5  3           TABLE ACCESS FULL CADM_SUBSCRIBERS      :Q5339001  PARALLEL_TO_SERIAL            (4740156, 317590452, 62)
>
>Hang definition :
>
>At a very early stage, after the slave processes have done some selects
>(up to 800 logical block reads), the process seems to hang, but it is
>not a real freeze - it carries on very slowly. When we take a trace of
>the processes we see waits like 'PX Deq: Table Q Normal', 'need
>buffer', 'send msg' etc. After we wait for 2 to 3 hours the process
>finishes. During this time we can access the database through other
>sessions, but with great performance degradation.
>
>On the UNIX side, there is close to 100% I/O wait on some internal
>disks, but not on the database disks. The whole system is dedicated to
>this process, so we don't have any resource contention. As for vmstat,
>the po rate is quite large, at least 1000 pages/sec, and the scan rate
>is non-zero (as below):
>
>procs      memory            page                             disk           faults            cpu
> r  b w   swap    free    re  mf   pi   po   fr  de  sr   s0  s1 s3 s5   in     sy   cs   us sy id
> 1  5 0   358720  358008  49 181  730 1954 2149   0 103    3  22  0  0  768 128075 2331   18  7 75
> 0 13 0  7990328   61752  65 192  792  912 1606   0 584    0 105  0  0  312    665  969    1  1 98
> 0  6 0  7988448   62472  48 199  848  907 1532   0 335    5 103  0  0  287    862  881    1  1 98
> 0  1 0  7987512   62768  48 221  793  890 1496   0 395    1 102  0  2  335   1728  977    1  1 98
> 0  1 0  7986504   62616  47 199  793  924 1412   0 370    3 103  0  1  358   1538 1110    1  1 97
> 0  1 0  7985560   61976  51 207  785 1156 1639   0 440    2 102  0  0  336   1172  982    1  1 97
> 0  1 0  7984728   62640  53 189  768 1142 1510   0 364    0 102  0  1  310   1060  930    1  1 98
> 0  1 0  7983440   62208  57 328 1085 1425 1862   0 361   19 102  0  2  404   1202 1174    1  2 97
> 0  1 0  7981360   61696  52 187  787 1045 1354   0 334    1 103  0  0  353   1432  968    1  1 98
> 0  1 0  7980736   61968  47 180  777 1258 1520   0 296    1 104  0  2  303    927  896    1  1 98
>
>When we have a look at the top 10 process in the system, we see
>'fsflush' and some of the slave processes.
>
>
>Our configuration is :
>RAID 5 disks attached with fibre channels.
>E10000 with 8 CPU's and 4GB of memory.
>Database allocated vxfs file systems.
>DB. block size = 16K
>
>Database parameters :
>
>Sort area size= 200 MB
>Large Pool size= 300 MB
>Shared Pool size= 80 MB
>Hash area= 100 MB
>DB Block buffers= 20000
>DB Block size= 16 K
>Parallel Threads per CPU= 2
>
>
>
>
>In article <942876324.27312.1.nnrp-03.9e984b29_at_news.demon.co.uk>,
> "Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote:
>>
>> 5 million rows is not very many.
>> Why only parallel 2 ?
>> What is the nature of the CTAS SQL ?
>> Where is the bottleneck ?
>> Can you supply a full execution plan.
>>
>> --
>>
>> Jonathan Lewis
>> Yet another Oracle-related web site: http://www.jlcomp.demon.co.uk
>>
>> aozocak_at_my-deja.com wrote in message <80u8a9$oqk$1_at_nnrp1.deja.com>...
>> >Hi,
>> >
>> >We are implementing a datawarehouse on an 8-CPU domain of a Sun 10000.
>> >We have a performance bottleneck when we tried to load a 5 million
>> >entry table using 'create table as select'. Our large pool size is
>> >300 MB, and we are submitting 2 parallel tasks.
>>
>>
>
>
>Sent via Deja.com http://www.deja.com/
>Before you buy.
Received on Thu Nov 18 1999 - 13:22:02 CST

