Re: Is Parallelism happening at INSERT level?

From: Lothar Flatz <l.flatz_at_bluewin.ch>
Date: Wed, 6 May 2020 17:04:02 +0200
Message-ID: <0aa28b00-1790-3aed-fa36-9f5367c9a5bc_at_bluewin.ch>



Hi,

I wonder if a Bloom Filter could be used on sce. Would that be faster than probing?  I would think so ...
There seems to be many rows not surviving the join.

Regards

Lothar

Am 06.05.2020 um 09:05 schrieb Jonathan Lewis:
> Ram,
>
> As Lothar says, you are doing a parallel insert as well as a parallel
> select.
> You need only 4 slaves to run this one parallel 4 because of the
> optimizer's choice for the plan, which is taking advantage of the 12c
> PQ_REPLICATE() mechanism.
>
> Each slave reads the WHOLE efgh table into an in-memory hash table and
> then scans sections of the sce table to do the probe into its copy of
> the dfgh table, then inserts surviving rows in batches.
>
> If you were on 11g the plan would look more like:
>
> ----------------------------------------------------------------------------------------------------------------------------
> | Id  | Operation                          | Name     | Rows  | Bytes
> | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
> ----------------------------------------------------------------------------------------------------------------------------
> |   0 | INSERT STATEMENT                   |          | |       |  
> 487 (100)|          |        |      |            |
> |   1 |  PX COORDINATOR                    |          | |       |    
>        |          |        |      |            |
> |   2 |   PX SEND QC (RANDOM)              | :TQ10001 | 564K|    75M|
>   487  (11)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
> |   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)| T2       | |       |    
>        |          |  Q1,01 | PCWP |            |
> |   4 |     OPTIMIZER STATISTICS GATHERING |          | 564K|    75M|
>   487  (11)| 00:00:01 |  Q1,01 | PCWP |    |
> |*  5 |      HASH JOIN RIGHT SEMI          |          | 564K|    75M|
>   487  (11)| 00:00:01 |  Q1,01 | PCWP |    |
> |   6 |       PX RECEIVE                   |          | 56764 |  
> 277K|    42   (5)| 00:00:01 |  Q1,01 | PCWP |            |
> |   7 |        PX SEND BROADCAST           | :TQ10000 | 56764 |  
> 277K|    42   (5)| 00:00:01 |  Q1,00 | P->P | BROADCAST  |
> |   8 |         PX BLOCK ITERATOR          |          | 56764 |  
> 277K|    42   (5)| 00:00:01 |  Q1,00 | PCWC |            |
> |*  9 |          TABLE ACCESS FULL         | T1       | 56764 |  
> 277K|    42   (5)| 00:00:01 |  Q1,00 | PCWP |            |
> |  10 |       PX BLOCK ITERATOR            |          | 567K|    73M|
>   438  (10)| 00:00:01 |  Q1,01 | PCWC |    |
> |* 11 |        TABLE ACCESS FULL           | T0       | 567K|    73M|
>   438  (10)| 00:00:01 |  Q1,01 | PCWP |    |
> ----------------------------------------------------------------------------------------------------------------------------
>
> And this WOULD run with 8 slaves - 2 sets of 4.
>
> As for timing - your figures report 536 GB for sce, to be scanned in
> 1:51.  That's roughly 5 GB per second - with each slave doing 1.25GB
> per second and allowing no time for any of the CPU required to do the
> hash join and write the data.  Is that realistic ?  It seems a little
> unlikely.
> Are any of the direct path reads and write also "temp" - since you've
> got 4 copies of 10GB (estimated) data, it seems likely that the hash
> join is going to spill and end up doing a lot of temp space writes.
>
>
> Regards
> Jonathan Lewis
>
>
>
>
> On Tue, May 5, 2020 at 11:34 PM Ram Raman <veeeraman_at_gmail.com
> <mailto:veeeraman_at_gmail.com>> wrote:
>
> Hi,
> We are trying to tune a query to use parallelism at multiple
> levels. Here is the query along with the plan gotten by using
> dbms_xplan:
>
> PLAN_TABLE_OUTPUT
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> SQL_ID  1qrcwrva8jgba, child number 0
> -------------------------------------
> insert /*+ ENABLE_PARALLEL_DML APPEND PARALLEL(4) */ into abcd select
> /*+ parallel(4) */ * from user1.sce where esk in
> (select esk from efgh)
>
> Plan hash value: 1996703413
>
> ----------------------------------------------------------------------------------------------------------------------------------------------
> | Id  | Operation                          | Name    | Rows  |
> Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ
> Distrib |
> ----------------------------------------------------------------------------------------------------------------------------------------------
> |   0 | INSERT STATEMENT                   |   |       |       | 
> 2840K(100)|          |       |  |        |      |            |
> |   1 |  PX COORDINATOR                    |   |       |       | 
>           |          |       |  |        |      |            |
> |   2 |   PX SEND QC (RANDOM)              | :TQ10000   |    30M| 
>   10G|  2840K  (1)| 00:01:51 |    |       |  Q1,00 | P->S | QC
> (RAND)  |
> |   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)|   |       |       | 
>           |          |       |  |  Q1,00 | PCWP |            |
> |   4 |     OPTIMIZER STATISTICS GATHERING |   |    30M|    10G| 
> 2840K  (1)| 00:01:51 |       |  |  Q1,00 | PCWP |            |
> |*  5 |      HASH JOIN RIGHT SEMI          |   |    30M|    10G| 
> 2840K  (1)| 00:01:51 |       |  |  Q1,00 | PCWP |            |
> |   6 |       TABLE ACCESS FULL            | efgh    |   193K| 
> 1131K|    36   (3)| 00:00:01 |       |    |  Q1,00 | PCWP |       
>     |
> |   7 |       PX BLOCK ITERATOR            |   |  1547M|   536G| 
> 2839K  (1)| 00:01:51 |     1 | 32 |  Q1,00 | PCWC |            |
> |*  8 |        TABLE ACCESS FULL           | sce | 1547M|   536G| 
> 2839K  (1)| 00:01:51 |     1 |    32 | Q1,00 | PCWP |            |
> ----------------------------------------------------------------------------------------------------------------------------------------------
> ......
> Note
> -----
>    - dynamic statistics used: dynamic sampling (level=AUTO)
> PLAN_TABLE_OUTPUT
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>    - Degree of Parallelism is 4 because of hint
>
>
> Given that P->S happens in line id 2, is this statement using
> parallelism for the INSERT part of it? I can see a parallelism of
> 4 reported at the v$px_session table for this SQL. Should I see 8
> rows if there is parallelism of 4 for SELECT and INSERT each?
>
> Dont know why it is saying the statement will complete in 1:51. It
> has been running for an hour.  The waits are all on 'direct path
> read/writes'. we have some bad IO, but still finding ways to tune
> the statement. ver is 12c.
> Thanks
> Ram
>
> --
>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed May 06 2020 - 17:04:02 CEST

Original text of this message