Re: Need design suggestion , whether index organized table is good choice here

From: Lothar Flatz <l.flatz_at_bluewin.ch>
Date: Mon, 31 Jan 2022 10:15:47 +0100
Message-ID: <7c2418d3-57b8-e56a-3f15-6606cbf7c982_at_bluewin.ch>



Guys,

seriously. I copy the Monitor of the that includes the FTS on REF_TAB in again. Maybe you see something that I don't. In line 1 we see 52.94 of the total activity. We have 700 samples Cpu and 11+46 temp space samples.
It would seem to me that we have an CPU issue much more than a temp issue. That is anything but a surprise. Hash join Cpu is often the weak spot of an Exa execution.
So how can we improve that plan? The point is that 115 M rows coming in from the intermediate result. That is not that much. But I have 2G rows from REF_TAB.
When combining big tables Hash Join is generally much more efficient than Nested Loop join. There are corner cases where it looks different, but we already tried Nested Loop and it proved much slower.

It it were possible to get a bloom filter on REF_TAB the storage cells could pre filter REF_TAB potentially retrieving a lot less that 2G rows which would shrink the join in Line 1.
I can not go into a research now about what stops and what provokes bloom filter from happening. Jonathan I believe has much more knowledge about that.

I would make this execution parallel. If it takes 25 minutes now, it can take less than 5 minutes parallel 16. Is that not good enough? Not sure what time you need.
Maybe we get a bloom filter when the execution is parallel too. The other thing I could think of is materialize the Result of step 2 - 11. I would use a WITH .. /*+ Materialize */ for that. Maybe there is a bigger chance to get a bloom filter build when we enforce a scan over the intermediate result.
No guarantee for that, it is a mere guess. It could even be slower since writing 115M rows will take its time.

Thanks

Lothar

Global Information


  Status                                 :  DONE (ALL ROWS)
  Instance ID                            : 4
  SQL Execution ID                       : 67108975
  Execution Started                      :  01/27/2022 11:30:51
  First Refresh Time                     :  01/27/2022 11:30:54
  Last Refresh Time                      :  01/27/2022 11:55:18
  Duration                               : 1467s
  Fetch Calls                            : 22993

Global Stats



| Elapsed |   Cpu   |    IO    | Application | Concurrency | Cluster  | 
Other   | Fetch | Buffer | Read | Read  | Write | Write |  Cell   |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   |  Waits(s)   | Waits(s) |
Waits(s) | Calls |  Gets  | Reqs | Bytes | Reqs  | Bytes | Offload |

|    1097 |     983 |      108 |        0.30 |        0.00 | 0.64 |    
5.69 | 22993 |    41M |   1M | 373GB |  837K |  96GB |  15.97% |

SQL Plan Monitoring Details (Plan Hash Value=2504035735)



| Id |             Operation             | Name           |  Rows   |
Cost  |   Time    | Start  | Execs |   Rows   | Read  | Read  | Write | Write |  Cell   |  Mem  | Temp  | Activity |           Activity Detail            |
|    | |                          | (Estim) |       | Active(s) | Active
|       | (Actual) | Reqs  | Bytes | Reqs  | Bytes | Offload | (Max) |
(Max) |   (%)    |             (# samples)              |

|  0 | SELECT STATEMENT |                          |         |      
|      1357 | +111 |     1 |     115M |       |       |       |
|         |       |       |     8.46 | Cpu
(121)                            |

|  1 |   HASH JOIN OUTER |                          |      1M |    8M
|      1357 | +111 |     1 |     115M |  772K |  88GB |  772K |  88GB
|         |  795M |  101G |    52.94 | Cpu
(700)                            |
|    | |                          |         |       | |        |      
|          |       |       |       | |         |       |      
|          | direct path read temp (11)           |
|    | |                          |         |       | |        |      
|          |       |       |       | |         |       |      
|          | direct path write temp (46)          |
|  2 |    VIEW |                          |      1M |    2M |       129
| +111 |     1 |     115M |       |       |       | |         |      
|       |     1.05 | Cpu (15)                             |
|  3 |     HASH JOIN OUTER |                          |      1M |    2M
|       237 | +3 |     1 |     115M | 65196 |   7GB | 65196 |   7GB | | 
821M |    9G |     8.88 | Cpu (78)                             |

|    | |                          |         |       | |        |      
|          |       |       |       | |         |       |      
|          | direct path read temp (3)            |
|    | |                          |         |       | |        |      
|          |       |       |       | |         |       |      
|          | direct path write temp (46)          |
|  4 |      HASH JOIN |                          |      1M |  765K
|       101 | +3 |     1 |     115M |       |       |       |       |
|    1M |       |     0.35 | Cpu (5)                              |
|  5 |       JOIN FILTER CREATE          | :BF0000                  |   
5662 | 66073 |         1 |     +3 |     1 |     3339 |       |       

|       |       | |       |       | |                                      |
|  6 |        TABLE ACCESS STORAGE FULL  | STAGE_TAB                |   
5662 | 66073 |         1 |     +3 |     1 |     3339 |       |      
|       |       | |       |       | |                                      |
|  7 |       JOIN FILTER USE             | :BF0000                  |   
307M |  698K |       101 |     +3 |     1 |     115M |       |      
|       |       | |       |       | |                                      |
|  8 |        PARTITION RANGE SINGLE |                          |   
307M |  698K |       101 | +3 |     1 |     115M |       |       |      
|       | |       |       | |                                      |
|  9 |         TABLE ACCESS STORAGE FULL | TRAN_TAB                 |   
307M |  698K |       102 |     +2 |     1 |     115M | 41288 |  34GB
|       |       |  81.27% |    7M |       |     0.35 | Cpu
(4)                              |
|    | |                          |         |       | |        |      
|          |       |       |       | |         |       |      
|          | cell smart table scan (1)            |
| 10 |      PARTITION RANGE SINGLE |                          |     18M
|  933K |         9 | +103 |     1 |      19M |       |       |       |
|         |       |       | |                                      |
| 11 |       TABLE ACCESS STORAGE FULL   | TRAN_TAB2               
|     18M |  933K |         9 |   +103 |     1 |      19M | 72012 | 
69GB |       |       |  98.54% |    7M |       |     0.07 | Cpu (1)                              |
| 12 |    TABLE ACCESS STORAGE FULL      | REF_TAB                 
|      1G |    3M |       481 |   +238 |     1 |       2G |  261K |
158GB |       |       |  38.27% |    7M |       |     3.15 | enq: KO - fast object checkpoint (1) |
|    | |                          |         |       | |        |      
|          |       |       |       | |         |       |      
|          | Cpu (40)                             |
|    | |                          |         |       | |        |      
|          |       |       |       | |         |       |      
|          | cell smart table scan (4)            |

Predicate Information (identified by operation id):


    1 - access("TRAN_TAB2"."ANBR"="REF_TAB"."COL1"(+))
    3 - access("TRAN_TAB"."PART_DATE1"="TRAN_TAB2"."PART_DATE1"(+) AND 
"TRAN_TAB"."TX_ID"="TRAN_TAB2"."TX_ID"(+))
    4 - access("TRAN_TAB"."SBID"="SFID")
    6 - storage("ASP_NM"=:B2 AND
"PART_DATE"=TO_DATE(:B1,'DD-MON-YYYY'))   filter("ASP_NM"=:B2 AND "PART_DATE"=TO_DATE(:B1,'DD-MON-YYYY'))
    9 - storage("TRAN_TAB"."PART_DATE1"=TO_DATE(:B1,'DD-MON-YYYY') 
AND        SYS_OP_BLOOM_FILTER(:BF0000,"TRAN_TAB"."SBID"))
        filter("TRAN_TAB"."PART_DATE1"=TO_DATE(:B1,'DD-MON-YYYY') 
AND       SYS_OP_BLOOM_FILTER(:BF0000,"TRAN_TAB"."SBID"))
   11 - storage("TRAN_TAB2"."P_CD"(+)='XX' AND "TRAN_TAB2"."PART_DATE1"(+)=TO_DATE(:B1,'DD-MON-YYYY') ) filter("TRAN_TAB2"."P_CD"(+)='XX' AND
"TRAN_TAB2"."PART_DATE1"(+)=TO_DATE(:B1,'DD-MON-YYYY')) Am 31.01.2022 um 09:25 schrieb Laurentiu Oprea:
> Hello Pap,
>
> Starting from your initial question: "  how the access path to this 
> table can be made faster? ", the questions are: how fast can you make 
> the acces and by how much the query duration will improve.
>
> -> if you look into the second sql monitor report accessing the table 
> takes 46 seconds and represents 3.15% from the total activity. So do 
> you really need to improve the access to this table? My opinion this 
> is not the first thing you need to look into. Speaking of 
> partitioning, a range partition strategy will determine the optimizer 
> to use a partition range join-filter (based on a join filter create on 
> tran_tab2 table) ...but I`m not sure you can have range 
> partitioning for the data of the joined column .. This is something 
> you might need to figure it out.
>
> Taking a step back your biggest problem is the "HASH JOIN OUTER" from 
> the line 1, and mainly due to swapping on temp. In my opinion the 
> problem here is that the view is not merged thus the left side of the 
> HASH join will contain all columns from projection perspective, making 
> the hash table very big.
>
> So question number 1: why the view is not merged? Maybe it is related 
> to miss estimates (1M est vs 115M actual).
> If you manage to obtain a plan where instead of doing the "HASH JOIN 
> OUTER" from line 3 with the table TRAN_TAB2 do a hash join with the 
> result of the HASH join between TRAN_TAB2 and REF_TAB you should no 
> longer have the big temp swapping from line 1.
>
> Thank you,
> Laurentiu.
>
> În dum., 30 ian. 2022 la 20:30, Pap <oracle.developer35_at_gmail.com> a 
> scris:
>
>     Actually it's an ETL job query and I believe it's moving those
>     records to some downstream system based on the validation which
>     it's doing on that ref_tab table (which holds the masked values of
>     the COL1).
>
>     Alo i tried with px_join_filter hints on ref_tab, the table still
>     not showing bloom pruning in its plan. And yes we want to minimize
>     the execution time so as to ensure the data is reaching the
>     downstream system as fast as possible.
>
>     So it seems , if we stick with the requirement to move those 50-60
>     Million rows, the only better option would be to go by the full
>     scan+hash join path on this table and make use of the necessary
>     memory/PGA such that the tempspill can be minimized. Please
>     correct me if my understanding is wrong here. Or should we try the
>     covering index option as suggested by Sayan initially, maybe in
>     addition with hash partitioning etc?
>
>
>
>     On Sat, Jan 29, 2022 at 8:43 PM Andy Sayer <andysayer_at_gmail.com>
>     wrote:
>
>         Hi Pap,
>
>         What are you planning on doing with all these rows? To put it
>         bluntly, what’s the point in getting all these rows back as
>         fast as possible, no one is going to read that many that fast.
>         If it’s just a one off then you just need it to complete, right?
>
>         Thanks,
>         Andy
>
>         On Sat, 29 Jan 2022 at 14:22, Pap
>         <oracle.developer35_at_gmail.com> wrote:
>
>             Thank You Lothar.
>             So if I understand your point correctly, , do you mean to
>             say the HASH join plan with full scan on table REF_TAB
>             which I posted earlier, is the one it should go for?
>
>             But as a majority of time it's spent on hash outer join
>             and that to it spills to temp ~100GB+ , So as you
>             mentioned, i was trying to run it parallel-2 but still it
>             was consuming ~50GB+ temp and i killed it then. So I'm
>             still wondering, what will be the ideal way to cater this
>             key value lookup kind of table design situation,
>             considering the table ref_tab can hold ~5billion rows in
>             future i.e. ~4times the current volume(which is
>             ~1.5billion rows). Or like Lok was saying any hash
>             partitioning strategy will make the design or data fetch
>             better here?
>
>
>
>
>             On Sat, Jan 29, 2022 at 6:56 PM Lothar Flatz
>             <l.flatz_at_bluewin.ch> wrote:
>
>                 Hi Pap,
>
>                 I do discourage IOT.
>                 Andrew, my former boss at RWPG said me once that a
>                 index organized table which is bigger than half of the
>                 buffer cache can lead into serious trouble.
>                 Apart from that the nested loop join is a bad decision
>                 in that plan anyway.
>                 The estimate for line 4 is 166k rows, but the actual
>                 is 14M. That is an error factor > 80 times and that is
>                 serious.
>                 I am pretty sure if that estimate would be correct, we
>                 would not see an nested loop join.
>                 Correct would a hash join and you the query should be
>                 parallel too
>
>                 Thanks
>
>                 Lothar
>
>                 Am 28.01.2022 um 20:21 schrieb Pap:
>>                 Hello Listers, It's a 11.2.0.4 oracle database. We
>>                 have one table (say REF_TAB) which has four columns,
>>                 two of them(i.e. COL1, masked_COL1) are actually
>>                 holding business attributes and the other two are
>>                 created_date and created_by_user columns. The length
>>                 of those two business columns are varchar2(34 bytes)
>>                 and varchar2(36 bytes) respectively. And both of
>>                 these columns hold all unique/distinct values. Col1
>>                 is the one on which the primary key is defined. The
>>                 table currently has ~1.5billion rows in it and its
>>                 size is ~160GB. It is estimated to grow to hold
>>                 ~5-8billion rows.
>>
>>                 The table is always being queried on a filter/join on
>>                 COL1 and that too as an outer join and will fetch the
>>                 value of masked_col1. So in short the requirement is
>>                 to scan the full table data based on outer join on
>>                 column col1 like ( TRAN_TAB2.ANBR=REF_TAB.COL1 (+) ).
>>                 And below is a sample query sql monitor showing even
>>                 that table is getting access using primary key index
>>                 but still is consuming all the time(sql_plan_line_id
>>                 13 and 14 below). As sql monitor shows , Is those
>>                 1hr+ DB time for the 14 million times index
>>                 unique scan justified? And making it to go for full
>>                 using hints is chewing up ~100gb+ temp space too. I
>>                 can't not think of any partitioning strategy which
>>                 can help us here to get the table access faster, as
>>                 we have to look up whole column data here for
>>                 incoming value of COL1.
>>
>>                 So want to understand how the access path to this
>>                 table can be made faster? Will index organized tables
>>                 be suggested in such a scenario and will help us in
>>                 this kind of requirement? Or any other design
>>                 strategy required here?
>>
>>
>>                 Global Information
>>                 ------------------------------
>>                  Status     :  EXECUTING
>>                  Instance ID    :  4
>>                  SQL Execution ID     :  67108973
>>                  Execution Started    :  01/27/2022 07:43:01
>>                  First Refresh Time     :  01/27/2022 07:43:05
>>                  Last Refresh Time    :  01/27/2022 09:04:19
>>                  Duration     :  4879s
>>                  Fetch Calls    :  2782
>>
>>                 Global Stats
>>                 =================================================================================================================
>>                 | Elapsed |   Cpu   |    IO    | Application |
>>                 Concurrency | Cluster  | Fetch | Buffer | Read | Read
>>                  |  Cell   |
>>                 | Time(s) | Time(s) | Waits(s) |  Waits(s)   |
>>                  Waits(s)   | Waits(s) | Calls |  Gets  | Reqs |
>>                 Bytes | Offload |
>>                 =================================================================================================================
>>                 |    5469 |    2065 |     2464 |    0.01 |      
>>                  0.00 |      940 |  2782 |   124M |  24M | 202GB |
>>                 9.09% |
>>                 =================================================================================================================
>>
>>                 ==============================================================================================================================================================================================================================
>>                 | Id    |                Operation              |    
>>                       Name     |  Rows   | Cost  |   Time    | Start
>>                  | Execs |   Rows   | Read  | Read  |  Cell   | Mem |
>>                 Activity |           Activity Detail    |
>>                 |       |               |      | (Estim) |       |
>>                 Active(s) | Active |       | (Actual) | Reqs  | Bytes
>>                 | Offload |     |   (%)    |             (# samples)    |
>>                 ==============================================================================================================================================================================================================================
>>                 |     0 | SELECT STATEMENT              |    |      
>>                   |       |      4875 |   +4 |     1 |      14M |    
>>                   |     |         |     |     1.11 | Cpu (53) |
>>                 |     1 |   NESTED LOOPS OUTER              |    |  
>>                  166K |    1M |      4875 |   +4 |     1 |      14M |
>>                       |     |         |     |     0.04 | Cpu (2)  |
>>                 |     2 |    VIEW               |      |    166K |
>>                  901K |      4875 |     +4 |     1 |      14M |      
>>                 |       |         |     |     0.40 | Cpu (19)     |
>>                 |     3 |     NESTED LOOPS OUTER              |    |
>>                    166K |  901K |      4875 |   +4 |     1 |      14M
>>                 |       |     |         |     |     0.04 | Cpu (2)  |
>>                 |     4 |      HASH JOIN              |    |    166K
>>                 |  402K |      4875 |   +4 |     1 |      14M |      
>>                 |     |         |  1M |     0.29 | Cpu (14) |
>>                 |     5 |       JOIN FILTER CREATE              |
>>                 :BF0000    |    1836 | 66073 |         1 |   +4 |    
>>                 1 |     1890 |       |     |         |     |        
>>                  |  |
>>                 |     6 |        TABLE ACCESS STORAGE FULL        |
>>                 STAGE_TAB            |    1836 | 66073 |   5 |     +0
>>                 |     1 |     1890 |     |       |         |     |
>>                 0.02 | gc cr multi block request (1)            |
>>                 |     7 |       JOIN FILTER USE               |
>>                 :BF0000      |     48M |  336K |      4875 |     +4 |
>>                     1 |      15M |       |       |         |     |  
>>                   0.13 | Cpu (6)      |
>>                 |     8 |        PARTITION RANGE SINGLE           |  
>>                      |     48M |  336K |      4875 |     +4 |     1 |
>>                      15M | |       |         |     |     0.04 | Cpu
>>                 (2)      |
>>                 |     9 |         TABLE ACCESS STORAGE FULL       |
>>                 TRAN_TAB           |     48M |  336K |  4878 |     +1
>>                 |     1 |      15M | 11748 |  11GB |  92.40% |  7M |
>>                 0.25 | Cpu (11)             |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | cell smart table scan (1)
>>                      |
>>                 |    10 |      PARTITION RANGE SINGLE             |  
>>                        |       1 |     4 |  4875 |     +4 |   14M |  
>>                     2M |       |       |         |     | 0.53 | Cpu
>>                 (25)             |
>>                 |    11 |       TABLE ACCESS BY LOCAL INDEX ROWID |
>>                 TRAN_TAB2          |       1 |     4 |  4875 |     +4
>>                 |   14M |       2M |  433K |   3GB |         |     |
>>                 2.52 | gc cr grant 2-way (26)             |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | Cpu (57)     |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | gcs drm freeze in enter
>>                 server mode (2)  |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | cell single block physical
>>                 read (35)     |
>>                 |    12 |        INDEX RANGE SCAN               |
>>                 TRAN_TAB2_IX1      |       1 |     3 |      4875 |  
>>                   +4 |   14M |       2M |  271K |   2GB |         |  
>>                   |     2.40 | gc cr grant 2-way (18)     |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | Cpu (63)     |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | cell single block physical
>>                 read (33)     |
>>                 |    13 |    TABLE ACCESS BY INDEX ROWID          |
>>                 REF_TAB      |       1 |     3 |      4877 |     +2 |
>>                   14M |      13M |   11M |  88GB |         |     |  
>>                  34.01 | gc cr block 2-way (1)      |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | gc cr disk read (6)      |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | Cpu (397)      |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | cell single block physical
>>                 read (1214)   |
>>                 | -> 14 |     INDEX UNIQUE SCAN                 |
>>                 REF_TAB_PK       |       1 |     2 |      4876 |    
>>                 +4 |   14M |      13M |   12M |  89GB |         |    
>>                 |    57.09 | gc cr block 2-way (154)      |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | gc cr block 3-way (14)     |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | gc cr block busy (520)     |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | gc cr disk read (30)     |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | gc cr failure (1)      |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | gc cr grant 2-way (106)      |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | gc current block 2-way (4)
>>                     |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | gc current grant 2-way (3)
>>                     |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | gc remaster (2)      |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | Cpu (754)      |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | gcs drm freeze in enter
>>                 server mode (6)  |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | latch: object queue header
>>                 operation (1) |
>>                 |       |               |      |         |       |  
>>                         |        |       |          |       |       |
>>                         |     |          | cell single block physical
>>                 read (1121)   |
>>                 ==============================================================================================================================================================================================================================
>>                 Predicate Information (identified by operation id):
>>                 ---------------------------------------------------
>>                    4 - access("TRAN_TAB"."SBID"="SFID")
>>                    6 - storage(("PART_DATE"=:B1 AND "ASP_NM"=:B2))
>>                 filter(("PART_DATE"=:B1 AND "ASP_NM"=:B2))
>>                    9 - storage(("TRAN_TAB"."PART_DATE1"=:B1 AND
>>                 SYS_OP_BLOOM_FILTER(:BF0000,"TRAN_TAB"."SBID")))
>>                  filter(("TRAN_TAB"."PART_DATE1"=:B1 AND
>>                 SYS_OP_BLOOM_FILTER(:BF0000,"TRAN_TAB"."SBID")))
>>                   11 - filter("TRAN_TAB2"."PART_DATE1"=:B1)
>>                   12 - access("TRAN_TAB"."TX_ID"="TRAN_TAB2"."TX_ID"
>>                 AND "TRAN_TAB2"."P_CD"='XX')
>>                   14 - access("TRAN_TAB2"."ANBR"="REF_TAB"."COL1")
>>
>>
>>
>


--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jan 31 2022 - 10:15:47 CET

Original text of this message