Re: Need design suggestion , whether index organized table is good choice here

From: Pap <oracle.developer35_at_gmail.com>
Date: Mon, 31 Jan 2022 00:00:02 +0530
Message-ID: <CAEjw_fiP36igS4tpeMq4aqE9P5xP-CyyqZsipVfGYN-Ce65YAw_at_mail.gmail.com>



Actually it's an ETL job query, and I believe it's moving those records to a downstream system based on the validation it does against the ref_tab table (which holds the masked values of COL1).

Also, I tried the px_join_filter hint on ref_tab, but the plan still does not show a bloom filter being used against that table. And yes, we want to minimize the execution time so that the data reaches the downstream system as fast as possible.
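For reference, this is roughly how I placed the hint (a sketch only; the real query is bigger, and the table aliases here are made up since I haven't posted the full SQL text):

    select /*+ parallel(4) use_hash(r) px_join_filter(r) */
           t2.anbr, r.masked_col1
    from   tran_tab2 t2, ref_tab r
    where  t2.anbr = r.col1 (+);

My understanding is that a join (bloom) filter is only created for a hash join, so as long as the optimizer sticks to the nested loop path the hint has nothing to attach to.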

So it seems that if we stick with the requirement to move those 50-60 million rows, the better option would be the full scan + hash join path on this table, giving it enough memory/PGA so that the spill to temp is minimized. Please correct me if my understanding is wrong here. Or should we try the covering index option that Sayan suggested initially, maybe combined with hash partitioning?
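To make those two options concrete, this is the kind of thing I was planning to test (just a sketch; the index name, partition count, and parameter values are placeholders, not from our system):

    -- covering index, so the lookup is satisfied from the index alone
    create index ref_tab_cov_ix on ref_tab (col1, masked_col1);

    -- or a hash-partitioned rebuild of the lookup table on the join key
    create table ref_tab_new
      partition by hash (col1) partitions 64
      as select * from ref_tab;

And for the full scan + hash join path, manual workarea sizing in just this session to cut the temp spill (as far as I know the usable maximum for hash_area_size is platform-dependent, and with manual policy each parallel slave gets its own workarea):

    alter session set workarea_size_policy = manual;
    alter session set hash_area_size = 2000000000;

Would that be a sensible set of tests?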

On Sat, Jan 29, 2022 at 8:43 PM Andy Sayer <andysayer_at_gmail.com> wrote:

> Hi Pap,
>
> What are you planning on doing with all these rows? To put it bluntly,
> what’s the point in getting all these rows back as fast as possible, no one
> is going to read that many that fast. If it’s just a one off then you just
> need it to complete, right?
>
> Thanks,
> Andy
>
> On Sat, 29 Jan 2022 at 14:22, Pap <oracle.developer35_at_gmail.com> wrote:
>
>> Thank You Lothar.
>> So if I understand your point correctly, do you mean the HASH join
>> plan with a full scan on table REF_TAB, which I posted earlier, is the
>> one it should go for?
>>
>> But the majority of the time is spent on the hash outer join, and it
>> spills ~100GB+ to temp. So, as you mentioned, I tried running it at
>> parallel 2, but it was still consuming ~50GB+ of temp and I killed it.
>> So I'm still wondering what the ideal way is to handle this
>> key-value-lookup kind of table design, considering that ref_tab may
>> hold ~5 billion rows in future, i.e. ~4x the current volume (which is
>> ~1.5 billion rows). Or, as Lok was saying, would a hash partitioning
>> strategy make the design or the data fetch better here?
>>
>>
>>
>>
>> On Sat, Jan 29, 2022 at 6:56 PM Lothar Flatz <l.flatz_at_bluewin.ch> wrote:
>>
>>> Hi Pap,
>>>
>>> I would discourage an IOT.
>>> Andrew, my former boss at RWPG, once told me that an index organized
>>> table which is bigger than half of the buffer cache can lead to
>>> serious trouble.
>>> Apart from that, the nested loop join is a bad decision in that plan
>>> anyway.
>>> The estimate for line 4 is 166k rows, but the actual is 14M. That is
>>> an error factor of > 80, and that is serious.
>>> I am pretty sure that if that estimate were correct, we would not see
>>> a nested loop join.
>>> Correct would be a hash join, and the query should be run in parallel
>>> too.
>>>
>>> Thanks
>>>
>>> Lothar
>>>
>>> Am 28.01.2022 um 20:21 schrieb Pap:
>>>
>>> Hello Listers, It's an 11.2.0.4 Oracle database. We have one table
>>> (say REF_TAB) which has four columns: two of them (COL1 and
>>> MASKED_COL1) hold business attributes, and the other two are
>>> CREATED_DATE and CREATED_BY_USER. The two business columns are
>>> varchar2(34 bytes) and varchar2(36 bytes) respectively, and both hold
>>> only unique/distinct values. COL1 is the column on which the primary
>>> key is defined. The table currently has ~1.5 billion rows and is
>>> ~160GB in size. It is estimated to grow to ~5-8 billion rows.
>>>
>>> The table is always queried with a filter/join on COL1, and that as
>>> an outer join, to fetch the value of MASKED_COL1. So in short the
>>> requirement is to scan the full table data through an outer join on
>>> COL1, like ( TRAN_TAB2.ANBR=REF_TAB.COL1 (+) ). Below is a sample
>>> query's SQL Monitor report showing that even though the table is
>>> accessed through the primary key index, it still consumes almost all
>>> the time (sql_plan_line_id 13 and 14 below). Is the 1hr+ of DB time
>>> for 14 million executions of the index unique scan justified? Forcing
>>> a full scan with hints chews up ~100GB+ of temp space too. I cannot
>>> think of any partitioning strategy that would make the table access
>>> faster, as we have to look up the whole column's data for each
>>> incoming value of COL1.
>>>
>>> So I want to understand how the access path to this table can be made
>>> faster. Would an index organized table be suggested in such a
>>> scenario, and would it help with this kind of requirement? Or is some
>>> other design strategy required here?
>>>
>>>
>>> Global Information
>>> ------------------------------
>>>  Status              :  EXECUTING
>>>  Instance ID         :  4
>>>  SQL Execution ID    :  67108973
>>>  Execution Started   :  01/27/2022 07:43:01
>>>  First Refresh Time  :  01/27/2022 07:43:05
>>>  Last Refresh Time   :  01/27/2022 09:04:19
>>>  Duration            :  4879s
>>>  Fetch Calls         :  2782
>>>
>>>
>>> Global Stats
>>> =================================================================================================================
>>> | Elapsed |   Cpu   |    IO    | Application | Concurrency | Cluster  | Fetch | Buffer | Read | Read  |  Cell   |
>>> | Time(s) | Time(s) | Waits(s) |  Waits(s)   |  Waits(s)   | Waits(s) | Calls |  Gets  | Reqs | Bytes | Offload |
>>> =================================================================================================================
>>> |    5469 |    2065 |     2464 |        0.01 |        0.00 |      940 |  2782 |   124M |  24M | 202GB |   9.09% |
>>> =================================================================================================================
>>>
>>>
>>> ========================================================================================================================================================================================================================
>>> |     Id | Operation                                | Name           |    Rows |   Cost |      Time |  Start | Execs |     Rows |  Read |  Read |    Cell | Mem | Activity | Activity Detail                           |
>>> |        |                                          |                | (Estim) |        | Active(s) | Active |       | (Actual) |  Reqs | Bytes | Offload |     |      (%) | (# samples)                               |
>>> ========================================================================================================================================================================================================================
>>> |      0 | SELECT STATEMENT                         |                |         |        |      4875 |     +4 |     1 |      14M |       |       |         |     |     1.11 | Cpu (53)                                  |
>>> |      1 |  NESTED LOOPS OUTER                      |                |    166K |     1M |      4875 |     +4 |     1 |      14M |       |       |         |     |     0.04 | Cpu (2)                                   |
>>> |      2 |   VIEW                                   |                |    166K |   901K |      4875 |     +4 |     1 |      14M |       |       |         |     |     0.40 | Cpu (19)                                  |
>>> |      3 |    NESTED LOOPS OUTER                    |                |    166K |   901K |      4875 |     +4 |     1 |      14M |       |       |         |     |     0.04 | Cpu (2)                                   |
>>> |      4 |     HASH JOIN                            |                |    166K |   402K |      4875 |     +4 |     1 |      14M |       |       |         |  1M |     0.29 | Cpu (14)                                  |
>>> |      5 |      JOIN FILTER CREATE                  | :BF0000        |    1836 |  66073 |         1 |     +4 |     1 |     1890 |       |       |         |     |          |                                           |
>>> |      6 |       TABLE ACCESS STORAGE FULL          | STAGE_TAB      |    1836 |  66073 |         5 |     +0 |     1 |     1890 |       |       |         |     |     0.02 | gc cr multi block request (1)             |
>>> |      7 |      JOIN FILTER USE                     | :BF0000        |     48M |   336K |      4875 |     +4 |     1 |      15M |       |       |         |     |     0.13 | Cpu (6)                                   |
>>> |      8 |       PARTITION RANGE SINGLE             |                |     48M |   336K |      4875 |     +4 |     1 |      15M |       |       |         |     |     0.04 | Cpu (2)                                   |
>>> |      9 |        TABLE ACCESS STORAGE FULL         | TRAN_TAB       |     48M |   336K |      4878 |     +1 |     1 |      15M | 11748 |  11GB |  92.40% |  7M |     0.25 | Cpu (11)                                  |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | cell smart table scan (1)                 |
>>> |     10 |     PARTITION RANGE SINGLE               |                |       1 |      4 |      4875 |     +4 |   14M |       2M |       |       |         |     |     0.53 | Cpu (25)                                  |
>>> |     11 |      TABLE ACCESS BY LOCAL INDEX ROWID   | TRAN_TAB2      |       1 |      4 |      4875 |     +4 |   14M |       2M |  433K |   3GB |         |     |     2.52 | gc cr grant 2-way (26)                    |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | Cpu (57)                                  |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | gcs drm freeze in enter server mode (2)   |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | cell single block physical read (35)      |
>>> |     12 |       INDEX RANGE SCAN                   | TRAN_TAB2_IX1  |       1 |      3 |      4875 |     +4 |   14M |       2M |  271K |   2GB |         |     |     2.40 | gc cr grant 2-way (18)                    |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | Cpu (63)                                  |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | cell single block physical read (33)      |
>>> |     13 |   TABLE ACCESS BY INDEX ROWID            | REF_TAB        |       1 |      3 |      4877 |     +2 |   14M |      13M |   11M |  88GB |         |     |    34.01 | gc cr block 2-way (1)                     |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | gc cr disk read (6)                       |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | Cpu (397)                                 |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | cell single block physical read (1214)    |
>>> |  -> 14 |    INDEX UNIQUE SCAN                     | REF_TAB_PK     |       1 |      2 |      4876 |     +4 |   14M |      13M |   12M |  89GB |         |     |    57.09 | gc cr block 2-way (154)                   |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | gc cr block 3-way (14)                    |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | gc cr block busy (520)                    |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | gc cr disk read (30)                      |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | gc cr failure (1)                         |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | gc cr grant 2-way (106)                   |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | gc current block 2-way (4)                |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | gc current grant 2-way (3)                |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | gc remaster (2)                           |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | Cpu (754)                                 |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | gcs drm freeze in enter server mode (6)   |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | latch: object queue header operation (1)  |
>>> |        |                                          |                |         |        |           |        |       |          |       |       |         |     |          | cell single block physical read (1121)    |
>>>
>>> ========================================================================================================================================================================================================================
>>> Predicate Information (identified by operation id):
>>> ---------------------------------------------------
>>>    4 - access("TRAN_TAB"."SBID"="SFID")
>>>    6 - storage(("PART_DATE"=:B1 AND "ASP_NM"=:B2))
>>>        filter(("PART_DATE"=:B1 AND "ASP_NM"=:B2))
>>>    9 - storage(("TRAN_TAB"."PART_DATE1"=:B1 AND SYS_OP_BLOOM_FILTER(:BF0000,"TRAN_TAB"."SBID")))
>>>        filter(("TRAN_TAB"."PART_DATE1"=:B1 AND SYS_OP_BLOOM_FILTER(:BF0000,"TRAN_TAB"."SBID")))
>>>   11 - filter("TRAN_TAB2"."PART_DATE1"=:B1)
>>>   12 - access("TRAN_TAB"."TX_ID"="TRAN_TAB2"."TX_ID" AND "TRAN_TAB2"."P_CD"='XX')
>>>   14 - access("TRAN_TAB2"."ANBR"="REF_TAB"."COL1")
>>>
>>>
>>>
>>>
>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Jan 30 2022 - 19:30:02 CET
