Re: Bottleneck in 8i parallelism

From: <aozocak_at_my-deja.com>
Date: Thu, 18 Nov 1999 12:40:04 GMT
Message-ID: <810s34$kao$1@nnrp1.deja.com>

Jonathan,

Thanks for your immediate response, we really need urgent help!!

Here is some information that might help you identify the problem better.
Looking forward to hearing from you.... Regards,
Alper

Query is as follows :

CREATE TABLE CADM_ETT_CALL_TMP_02_TODAY

                      UNRECOVERABLE
                      PARALLEL
                      TABLESPACE CALLSTATS_01
               as SELECT /*+ parallel (CADM_ETT_CALL_TMP_01)
                              parallel (CADM_SUBSCRIBERS)
                           */
                          max(t1.DAY_DATE)             DAY_DATE


                        , max(nvl(t2.SUBSCRIBER_ID,0)) SUBSCRIBER_ID


                        , max(t1.BILL_CYCLE_ID)        BILL_CYCLE_ID


                        , max(t1.ETT_DATE)             ETT_DATE


                        , max(t1.TIME_BAND_ID)         TIME_BAND_ID


                        , max(t1.DISTANCE_BAND_ID)     DISTANCE_BAND_ID


                        , max(t1.TARIFF_ID)            TARIFF_ID


                        , max(t1.EQUIPMENT_ID)         EQUIPMENT_ID


                        , max(t1.ROAMING_ID)           ROAMING_ID


                        , nvl(t1.CALL_TYPE_ID,0)       CALL_TYPE_ID


                        , max(t1.TERM_TYPE_ID)         TERM_TYPE_ID


                        , max(t1.SCHEME_ID)            SCHEME_ID


                        , max(t1.CELL_LOC_ID)          CELL_LOC_ID


                        , max(t1.TRAFFIC_ID)           TRAFFIC_ID


                        , max(t1.RATE_BAND_ID)         RATE_BAND_ID


                        , max(t1.DURATION_BAND_ID)     DURATION_BAND_ID


                        , max(t1.LATE_CALL_FLAG)       LATE_CALL_FLAG


                        , t1.CALL_DATE                 CALL_DATE


                        , t2.PHONE_NO                  A_NO


                        , t1.B_NO


                        , max(t1.QTY_CALLS)           QTY_CALLS


                        , max(t1.QTY_SECS )           QTY_SECS


                        , max(t1.TL_CALL_COST)        TL_CALL_COST


                        , round(max(t1.USD_CALL_COST),3) USD_CALL_COST


                        , max(t1.IMEI_SERIAL)          IMEI_SERIAL


                  FROM CADM_ETT_CALL_TMP_01 t1
                     , CADM_SUBSCRIBERS     t2
                  WHERE t1.CUSTOMER_ID = t2.CUSTOMER_ID (+)
                    AND t1.ACCOUNT_ID  = t2.ACCOUNT_ID  (+)
                    AND t1.LATE_CALL_FLAG = 'N'
                 GROUP BY t1.CALL_TYPE_ID
                         , t1.CALL_DATE
                         , t2.PHONE_NO
                         , t1.B_NO ;

Explain plan for the query :
cadm_ett_call_tmp_02_today = 4739779
cadm_subscribers = 4738679

i p PLAN_PLUS_EXP
OBJECT_N OTHER_CARD_BYTES_COST

---- ---- ------------------------------------------------------------
-------- --------------------------------------------------------
   0      CREATE TABLE STATEMENT
         (4636910,918108180,5117)

1 0 LOAD AS SELECT
:Q533900 PARALLEL_TO_SERIAL (,,)

                                                                       3

   2    1     SORT GROUP BY
:Q533900 PARALLEL_COMBINED_WITH_PARENT (4636910,918108180,5117)
                                                                       3

   3    2       HASH JOIN OUTER
:Q533900^tPARALLEL_TO_SERIAL (4636910,918108180,529)
                                                                       2

   4    3         TABLE ACCESS FULL CADM_ETT_CALL_TMP_01
:Q533900 PARALLEL_TO_SERIAL (4636910,607435210,142)
                                                                       0

   5    3         TABLE ACCESS FULL CADM_SUBSCRIBERS
:Q533900 PARALLEL_TO_SERIAL (4740156,317590452,62)
                                                                       1

Hang definition :

At a very early stage, after some select's are done by the slave processes (up to 800 logical block reads) the process seem to hang, but it is not a real freeze, the process goes on very slowly. When we take a trace of the processes we see some waits like 'PX Deq: Table Q Normal', ' need buffer','send msg' etc. After we wait for 2 to 3 hours the prcocess finishes. During this process, we can access the database thru other sessions, but with great performance degradation.

On UNIX side, there is some 100 % I/O wait on some internal disks, but not on database disks.The whole system is dedicated to this process therefore we don't have any resource contention. As for vmstat, the po rate is quite large, at least 1000 pages/sec, scan rate is non_zero (as below)

procs     memory                   page                     disk
   	   faults             cpu
 r  b  w    swap       free       re  mf   pi    po       fr       de
 	sr       s0    s1     s3 s5   in     sy         cs    us   sy
id
 1 5   0       358720   358008 49 181  730  1954   2149    0   	103
3    22      0     0 768   128075  2331 18    7  75
 0 13 0    7990328    61752   65 192  792  912    1606     0 	584
0    105    0     0 312     665      969  1       1 98
 0 6   0      7988448   62472  48 199  848  907    1532     0    335
5    103     0     0 287     862      881  1       1 98
 0 1   0      7987512   62768  48 221  793  890    1496     0    395
1    102     0     2 335    1728     977  1       1 98
 0 1   0      7986504   62616  47 199  793  924    1412     0    370
3    103     0     1 358    1538   1110 1        1 97
 0 1   0      7985560   61976  51 207  785  1156  1639     0    440    2
   102     0     0 336    1172    982 1         1 97
 0 1   0      7984728   62640  53 189  768  1142  1510     0    364    0
   102     0     1 310    1060    930 1         1 98
 0 1   0      7983440   62208  57 328  1085 1425 1862     0    361    19
  102    0     2 404    1202   1174 1        2 97
 0 1   0      7981360   61696  52 187  787  1045  1354     0    334    1
   103     0     0 353    1432     968 1       1 98
 0 1   0      7980736   61968  47 180  777  1258  1520     0    296    1
  104      0     2 303    927       896  1      1 98

When we have a look at the top 10 process in the system, we see 'fsflush' and some of the slave processes.

Our configuration is :
RAID 5 disks attached with fibre channels. E10000 with 8 CPU's and 4GB of memory.
Database allocated vxfs file systems.
DB. block size = 16K

Database parameters :

Sort area size= 200 MB
Large Pool size= 300 MB
Shared Pool size= 80 MB
Hash area= 100 MB
DB Block buffers= 20000
DB Block size= 16 K
Parallel Threads per CPU= 2

In article <942876324.27312.1.nnrp-03.9e984b29_at_news.demon.co.uk>, "Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote:

>
> 5 million rows is not very many.
> Why only parallel 2 ?
> What is the nature of the CTAS SQL ?
> Where is the bottleneck ?
> Can you supply a full execution plan.
>
> --
>
> Jonathan Lewis
> Yet another Oracle-related web site:  http://www.jlcomp.demon.co.uk

>
> aozocak_at_my-deja.com wrote in message <80u8a9$oqk$1_at_nnrp1.deja.com>...
> >Hi,
> >
> >We are implementing a datawarehouse on a 8-CPU domain of a Sun 10000.

We
> >have a performance bottleneck when we tried to load a 5 million entry > >table using 'create table as select'. Our large pad size is 300 MB, and

> >we are submitting 2 parallel tasks.
>
>

Sent via Deja.com http://www.deja.com/
Before you buy. Received on Thu Nov 18 1999 - 06:40:04 CST