Re: Is Parallelism happening at INSERT level?
Date: Wed, 6 May 2020 17:04:02 +0200
Message-ID: <0aa28b00-1790-3aed-fa36-9f5367c9a5bc_at_bluewin.ch>
Hi,
Am 06.05.2020 um 09:05 schrieb Jonathan Lewis:
> Ram,
>
> As Lothar says, you are doing a parallel insert as well as a parallel
> select.
> You need only 4 slaves to run this one parallel 4 because of the
> optimizer's choice for the plan, which is taking advantage of the 12c
> PQ_REPLICATE() mechanism.
>
> Each slave reads the WHOLE efgh table into an in-memory hash table and
> then scans sections of the sce table to do the probe into its copy of
> the dfgh table, then inserts surviving rows in batches.
>
> If you were on 11g the plan would look more like:
>
> ----------------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes
> | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
> ----------------------------------------------------------------------------------------------------------------------------
> | 0 | INSERT STATEMENT | | | |
> 487 (100)| | | | |
> | 1 | PX COORDINATOR | | | |
> | | | | |
> | 2 | PX SEND QC (RANDOM) | :TQ10001 | 564K| 75M|
> 487 (11)| 00:00:01 | Q1,01 | P->S | QC (RAND) |
> | 3 | LOAD AS SELECT (HYBRID TSM/HWMB)| T2 | | |
> | | Q1,01 | PCWP | |
> | 4 | OPTIMIZER STATISTICS GATHERING | | 564K| 75M|
> 487 (11)| 00:00:01 | Q1,01 | PCWP | |
> |* 5 | HASH JOIN RIGHT SEMI | | 564K| 75M|
> 487 (11)| 00:00:01 | Q1,01 | PCWP | |
> | 6 | PX RECEIVE | | 56764 |
> 277K| 42 (5)| 00:00:01 | Q1,01 | PCWP | |
> | 7 | PX SEND BROADCAST | :TQ10000 | 56764 |
> 277K| 42 (5)| 00:00:01 | Q1,00 | P->P | BROADCAST |
> | 8 | PX BLOCK ITERATOR | | 56764 |
> 277K| 42 (5)| 00:00:01 | Q1,00 | PCWC | |
> |* 9 | TABLE ACCESS FULL | T1 | 56764 |
> 277K| 42 (5)| 00:00:01 | Q1,00 | PCWP | |
> | 10 | PX BLOCK ITERATOR | | 567K| 73M|
> 438 (10)| 00:00:01 | Q1,01 | PCWC | |
> |* 11 | TABLE ACCESS FULL | T0 | 567K| 73M|
> 438 (10)| 00:00:01 | Q1,01 | PCWP | |
> ----------------------------------------------------------------------------------------------------------------------------
>
> And this WOULD run with 8 slaves - 2 sets of 4.
>
> As for timing - your figures report 536 GB for sce, to be scanned in
> 1:51. That's roughly 5 GB per second - with each slave doing 1.25GB
> per second and allowing no time for any of the CPU required to do the
> hash join and write the data. Is that realistic ? It seems a little
> unlikely.
> Are any of the direct path reads and write also "temp" - since you've
> got 4 copies of 10GB (estimated) data, it seems likely that the hash
> join is going to spill and end up doing a lot of temp space writes.
>
>
> Regards
> Jonathan Lewis
>
>
>
>
> On Tue, May 5, 2020 at 11:34 PM Ram Raman <veeeraman_at_gmail.com
> <mailto:veeeraman_at_gmail.com>> wrote:
>
> Hi,
> We are trying to tune a query to use parallelism at multiple
> levels. Here is the query along with the plan gotten by using
> dbms_xplan:
>
> PLAN_TABLE_OUTPUT
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> SQL_ID 1qrcwrva8jgba, child number 0
> -------------------------------------
> insert /*+ ENABLE_PARALLEL_DML APPEND PARALLEL(4) */ into abcd select
> /*+ parallel(4) */ * from user1.sce where esk in
> (select esk from efgh)
>
> Plan hash value: 1996703413
>
> ----------------------------------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Rows |
> Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ
> Distrib |
> ----------------------------------------------------------------------------------------------------------------------------------------------
> | 0 | INSERT STATEMENT | | | |
> 2840K(100)| | | | | | |
> | 1 | PX COORDINATOR | | | |
> | | | | | | |
> | 2 | PX SEND QC (RANDOM) | :TQ10000 | 30M|
> 10G| 2840K (1)| 00:01:51 | | | Q1,00 | P->S | QC
> (RAND) |
> | 3 | LOAD AS SELECT (HYBRID TSM/HWMB)| | | |
> | | | | Q1,00 | PCWP | |
> | 4 | OPTIMIZER STATISTICS GATHERING | | 30M| 10G|
> 2840K (1)| 00:01:51 | | | Q1,00 | PCWP | |
> |* 5 | HASH JOIN RIGHT SEMI | | 30M| 10G|
> 2840K (1)| 00:01:51 | | | Q1,00 | PCWP | |
> | 6 | TABLE ACCESS FULL | efgh | 193K|
> 1131K| 36 (3)| 00:00:01 | | | Q1,00 | PCWP |
> |
> | 7 | PX BLOCK ITERATOR | | 1547M| 536G|
> 2839K (1)| 00:01:51 | 1 | 32 | Q1,00 | PCWC | |
> |* 8 | TABLE ACCESS FULL | sce | 1547M| 536G|
> 2839K (1)| 00:01:51 | 1 | 32 | Q1,00 | PCWP | |
> ----------------------------------------------------------------------------------------------------------------------------------------------
> ......
> Note
> -----
> - dynamic statistics used: dynamic sampling (level=AUTO)
> PLAN_TABLE_OUTPUT
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> - Degree of Parallelism is 4 because of hint
>
>
> Given that P->S happens in line id 2, is this statement using
> parallelism for the INSERT part of it? I can see a parallelism of
> 4 reported at the v$px_session table for this SQL. Should I see 8
> rows if there is parallelism of 4 for SELECT and INSERT each?
>
> Dont know why it is saying the statement will complete in 1:51. It
> has been running for an hour. The waits are all on 'direct path
> read/writes'. we have some bad IO, but still finding ways to tune
> the statement. ver is 12c.
> Thanks
> Ram
>
> --
>
-- http://www.freelists.org/webpage/oracle-lReceived on Wed May 06 2020 - 17:04:02 CEST