Re: Execution path having full scan in a nested loop

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Tue, 12 Oct 2021 10:28:19 +0100
Message-ID: <CAGtsp8=QMfBWNEtKtC64cBUriJCjKhE7KkHBUb1NKnBuenNAqA_at_mail.gmail.com>



>
> So I want to understand if we can modify this query to make it go for one
time full table scan, maybe with a hash join kind of operation, so that this can complete in a faster time?
>

I knew that if I answered the question you'd ask you'd only ask a load more. ;)

I suggested the hints because that would demonstrate whether or not the transformation was possible (it almost always is in 19c) and purely cost-based (which is not always the case - sometimes the optimizer will pick a higher cost path if a lower cost path is based in guesswork that may introduce a lot of variability at run-time).

If the hinted code does what you expect and the cost of the query is HIGHER as a result this almost always means that there's a statistics issue involved. (If the cost is LOWER this tends to mean that Oracle has a rule that has rejected the path.) This means you may be able to find a suitable way to fix the statistics to get the plan you want without hinting.

I suggested the NO_MERGE hint because you may find that hinted with only UNNEST a change in the data results in the optimizer deciding to do complex view merging - which means it will join all the tables and aggregate late, and that could be a disaster. (Here's a link to a very old blog note of mine demonstrating the concept:
https://jonathanlewis.wordpress.com/2007/03/08/transformation-and-optimisation/ )

You started with a comment about an index on RID, but that's irrelevant, it could help the join from MASTER to DETAIL, but that doesn't help with the correlation between the subquery and PBRF. This becomes cleared when Oracle rewrites the NOT IN to NOT EXISTS and you can see the necessary correlation in the FILTER In the original plan:

   3 - filter( NOT EXISTS

        (SELECT 0 FROM "USER1"."DETAIL" "DTL","USER1"."MASTER" "REQ"
         WHERE "REQ"."RID"="DTL"."RID" AND "REQ"."RTYP"=:B4
         AND ("REQ"."RSTS"='YY' OR "REQ"."RSTS"='XX')
         AND "DTL"."FL_TYP"='DP'
         AND  LNNVL("DTL"."FL_NM"<>:B1) AND
LNNVL(NVL(:B2,'-')<>NVL("DTL"."SID",'-'))))

and compare it with the original NOT IN predicate

AND     (rf.fattr3, NVL (rf.fattr, '-')) NOT IN(
                SELECT DISTINCT   dtl.FL_NM, NVL (dtl.SID, '-')

The connection between RF and DTL is based on fattr3/fl_nm and fattr/sid - and because the new plan report a "null aware" antijoin, you can see that the optimizer has embedded the correlation predicates inside then lnnvl() function. There might be a way to add some NOT NULL declarations and function-based indexes to make this more efficient, but don't ask me to guess about things that might or might not be valid for your data. It's also possible that with the right indexing and stats the subquery could drive a nested loop from MASTER to DETAIL, but that would at the least require an index on (rtyp, rsts) and maybe some statistics that made it look as if that matched only a few rows.

Another benefit of seeing the results of the unnested subquery is that we can see why the optimizer didn't choose it. The cost of the single SORT JOIN for the subquery is over 8,000; the cost of the tablescan of DETAIL is 607, which means Oracle could do the tablescan about 13 times before it should choose to unnest. However its estimate of rows from the driving table is 2, (although the actual is 9.626). If you can get the optimizer to get a better estimate of cardinality of the driving table it would unnest automatically. It's possible that creating a column group on (F_GRP, rf.F_CATG) would be sufficient if the number of combinations is significantly less than the product of the individual num_distinct.

Regards
Jonathan Lewis

On Tue, 12 Oct 2021 at 07:05, Lok P <loknath.73_at_gmail.com> wrote:

> Thank you Jonathan.
>
> Your suggested hints(even with only UNNEST hints) to the subquery making
> the query finish in ~2seconds as opposed to ~143 seconds before and so it
> seems it's possible as per optimizer transformations is concerned. So I
> wanted to understand why optimizers by default are not doing this
> because of any restriction or we should tweak the query someway Or any
> issue with stats itself?
>
> Initially I was thinking if we are hitting the below restriction as it
> is in the blog due to which the optimizer is not able to go for a hash
> join, so tried putting that subquery in a case statement , but it
> didn't work.
>
>
> https://ctandrewsayer.wordpress.com/2018/06/06/conditional-outer-joins-forcing-nested-loops/
>
>
> Global Information
> ------------------------------
> Status : DONE (ALL ROWS)
> Instance ID : 1
> SQL ID : 0v7fsttsqubh5
> SQL Execution ID : 16777216
> Execution Started : 10/12/2021 01:30:02
> First Refresh Time : 10/12/2021 01:30:02
> Last Refresh Time : 10/12/2021 01:30:04
> Duration : 2s
> Module/Action : SQL*Plus/-
> Program : sqlplus.exe
> Fetch Calls : 2
>
>
> Global Stats
>
> ======================================================================================================
> | Elapsed | Cpu | IO | Application | Concurrency | Fetch |
> Buffer | Read | Read | Cell |
> | Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) | Calls | Gets
> | Reqs | Bytes | Offload |
>
> ======================================================================================================
> | 1.02 | 0.98 | 0.04 | 0.00 | 0.00 | 2 |
> 31621 | 228 | 206MB | 81.41% |
>
> ======================================================================================================
>
> SQL Plan Monitoring Details (Plan Hash Value=3386919479)
>
> =============================================================================================================================================================================================
> | Id | Operation | Name |
> Rows | Cost | Time | Start | Execs | Rows | Read | Read |
> Cell | Mem | Activity | Activity Detail |
> | | | |
> (Estim) | | Active(s) | Active | | (Actual) | Reqs | Bytes |
> Offload | (Max) | (%) | (# samples) |
>
> =============================================================================================================================================================================================
> | 0 | SELECT STATEMENT | |
> | | 1 | +2 | 1 | 421 | | |
> | | | |
> | 1 | WINDOW BUFFER | |
> 1 | 8150 | 1 | +2 | 1 | 421 | | |
> | 32768 | | |
> | 2 | HASH GROUP BY | |
> 1 | 8150 | 1 | +2 | 1 | 421 | | |
> | 1M | | |
> | 3 | FILTER | |
> | | 1 | +2 | 1 | 6985 | | |
> | | | |
> | 4 | MERGE JOIN ANTI NA | |
> 1 | 8149 | 1 | +2 | 1 | 6985 | | |
> | | | |
> | 5 | SORT JOIN | |
> 2 | 120 | 1 | +2 | 1 | 9626 | | |
> | 886K | | |
> | 6 | TABLE ACCESS BY INDEX ROWID | PBRF |
> 2 | 119 | 1 | +2 | 1 | 9626 | | |
> | | | |
> | 7 | INDEX SKIP SCAN | PBRF_IX1 |
> 4 | 118 | 1 | +2 | 1 | 9626 | | |
> | | | |
> | 8 | SORT UNIQUE | |
> 868K | 8029 | 2 | +1 | 9627 | 2641 | | |
> | | 100.00 | Cpu (1) |
> | 9 | VIEW | VW_NSO_1 |
> 868K | 635 | 1 | +2 | 1 | 1M | | |
> | | | |
> | 10 | HASH JOIN | |
> 868K | 635 | 1 | +2 | 1 | 1M | | |
> | 2M | | |
> | 11 | JOIN FILTER CREATE | :BF0000 |
> 19381 | 18 | 1 | +2 | 1 | 18647 | | |
> | | | |
> | 12 | TABLE ACCESS STORAGE FULL | MASTER |
> 19381 | 18 | 1 | +2 | 1 | 18647 | | |
> | | | |
> | 13 | JOIN FILTER USE | :BF0000 |
> 1M | 607 | 1 | +2 | 1 | 1M | | |
> | | | |
> | 14 | TABLE ACCESS STORAGE FULL | DETAIL |
> 1M | 607 | 1 | +2 | 1 | 1M | 228 | 206MB | 81.41%
> | 7M | | |
>
> =============================================================================================================================================================================================
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 3 - filter(TO_DATE(:B6,'MM/DD/YYYY')>=TO_DATE(:B5,'MM/DD/YYYY'))
> 7 - access("RF"."W_DATE">=TO_DATE(:B5,'MM/DD/YYYY') AND
> "RF"."F_GRP"=:B1 AND "RF"."W_DATE"<=TO_DATE(:B6,'MM/DD/YYYY'))
> filter("RF"."F_GRP"=:B1 AND ("RF"."F_CATG"=:B2 OR
> "RF"."F_CATG"=:B3))
> 8 - access(INTERNAL_FUNCTION("RF"."fattr3")=INTERNAL_FUNCTION("FL_NM")
> AND INTERNAL_FUNCTION("NVL(DTL.SID,'-')")=NVL("RF"."fattr2",'-'))
> filter(INTERNAL_FUNCTION("NVL(DTL.SID,'-')")=NVL("RF"."fattr2",'-') AND
> INTERNAL_FUNCTION("RF"."fattr3")=INTERNAL_FUNCTION("FL_NM"))
> 10 - access("REQ"."RID"="DTL"."RID")
> 12 - storage("REQ"."RTYP"=:B4 AND ("REQ"."RSTS"='FP' OR
> "REQ"."RSTS"='IP'))
> filter("REQ"."RTYP"=:B4 AND ("REQ"."RSTS"='FP' OR
> "REQ"."RSTS"='IP'))
> 14 - storage("DTL"."FL_TYP"='DP' AND
> SYS_OP_BLOOM_FILTER(:BF0000,"DTL"."RID"))
> filter("DTL"."FL_TYP"='DP' AND
> SYS_OP_BLOOM_FILTER(:BF0000,"DTL"."RID"))
>
>
>
>
>
>
>
>
> On Tue, Oct 12, 2021 at 2:31 AM Jonathan Lewis <jlewisoracle_at_gmail.com>
> wrote:
>
>>
>> The shape of the plan you're describing is one where the subquery is
>> unnested to a non-mergeable aggregate view.
>> To see if this is legal as far as the optimizer is concerned you could
>> add the hints /*+ unnest no_merge */ to the subquery.
>>
>> Regards
>> Jonathan Lewis
>>
>>
>> On Mon, 11 Oct 2021 at 20:50, Lok P <loknath.73_at_gmail.com> wrote:
>>
>>> Hello Listers, We have one database on version 11.2.0.4 of oracle. And
>>> below query is spending quite a lot of time while scanning table 'DETAIL'
>>> in nested loop path i.e. step-9 below. This table does have an index with
>>> the leading column as RID(which is joined column) but still it's going for
>>> a 'TABLE ACCESS STORAGE FULL FIRST ROWS' within a nested loop. Not sure if
>>> it's just because we are reading a lot of rows from that table or if we are
>>> hitting any optimizer restriction. So I want to understand if we can modify
>>> this query to make it go for one time full table scan, maybe with a hash
>>> join kind of operation, so that this can complete in a faster time?
>>>
>>> Table DETAIL having ~1.7million rows in it. And column RID having 31K
>>> distinct values in it. And table MASTER having ~34k in it.
>>>
>>> SELECT TRIM (rf.fattr3) ,TRIM (rf.fattr) , rf.W_DATE ,rf.CODE ,NVL (SUM
>>> (rf.txn_cnt), 0) AS cnt,NVL (SUM (DECODE (TTYP, 'S', 1, -1) * rf.amt),0)
>>> amt,count(*) over () count1
>>> FROM PBRF rf
>>> WHERE rf.F_GRP = :b1
>>> AND rf.F_CATG IN ( :b2, :b3)
>>> AND (rf.fattr3, NVL (rf.fattr, '-')) NOT IN
>>> (SELECT DISTINCT dtl.FL_NM, NVL (dtl.SID, '-')
>>> FROM MASTER req, DETAIL dtl
>>> WHERE req.RID = dtl.RID
>>> AND req.RSTS IN ('XX', 'YY')
>>> AND req.RTYP = :b4
>>> AND dtl.FL_TYP = 'DP')
>>> AND rf.W_DATE BETWEEN TO_DATE ( :b5,'MM/DD/YYYY') AND TO_DATE (
>>> :b6,'MM/DD/YYYY')
>>> GROUP BY rf.CODE,TRIM (rf.fattr),TRIM (rf.fattr3), rf.W_DATE;
>>>
>>> Global Information
>>> ------------------------------
>>> Status : DONE (ALL ROWS)
>>> Instance ID : 1
>>> SQL ID : 7yc5x3uzx7vzm
>>> SQL Execution ID : 16777216
>>> Execution Started : 10/11/2021 15:18:40
>>> First Refresh Time : 10/11/2021 15:18:40
>>> Last Refresh Time : 10/11/2021 15:21:03
>>> Duration : 143s
>>> Module/Action : SQL*Plus/-
>>> Program : sqlplus.exe
>>> Fetch Calls : 2
>>>
>>> Global Stats
>>>
>>> ===================================================================================================
>>> | Elapsed | Cpu | IO | Application | Other | Fetch | Buffer
>>> | Read | Read | Cell |
>>> | Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) | Calls | Gets
>>> | Reqs | Bytes | Offload |
>>>
>>> ===================================================================================================
>>> | 144 | 131 | 11 | 1.13 | 0.18 | 2 | 38M
>>> | 348K | 293GB | 80.58% |
>>>
>>> ===================================================================================================
>>>
>>> SQL Plan Monitoring Details (Plan Hash Value=3512223473)
>>>
>>> ==========================================================================================================================================================================================================================
>>> | Id | Operation | Name
>>> | Rows | Cost | Time | Start | Execs | Rows | Read | Read
>>> | Cell | Mem | Activity | Activity Detail |
>>> | | |
>>> | (Estim) | | Active(s) | Active | | (Actual) | Reqs | Bytes
>>> | Offload | (Max) | (%) | (# samples) |
>>>
>>> ==========================================================================================================================================================================================================================
>>> | 0 | SELECT STATEMENT |
>>> | | | 1 | +143 | 1 | 421 | |
>>> | | | | |
>>> | 1 | WINDOW BUFFER |
>>> | 1 | 125 | 1 | +143 | 1 | 421 | |
>>> | | 32768 | | |
>>> | 2 | HASH GROUP BY |
>>> | 1 | 125 | 142 | +2 | 1 | 421 | |
>>> | | 1M | | |
>>> | 3 | FILTER |
>>> | | | 142 | +2 | 1 | 6985 | |
>>> | | | | |
>>> | 4 | FILTER |
>>> | | | 142 | +2 | 1 | 9626 | |
>>> | | | | |
>>> | 5 | TABLE ACCESS BY INDEX ROWID | PBRF
>>> | 2 | 119 | 142 | +2 | 1 | 9626 | |
>>> | | | | |
>>> | 6 | INDEX SKIP SCAN | PBRF_IX1
>>> | 4 | 118 | 142 | +2 | 1 | 9626 | 1 |
>>> 8192 | | | | |
>>> | 7 | NESTED LOOPS |
>>> | 3 | 5 | 142 | +2 | 1712 | 532 | |
>>> | | | | |
>>> | 8 | NESTED LOOPS |
>>> | 3 | 5 | 142 | +2 | 1712 | 1417 | |
>>> | | | | |
>>> | 9 | TABLE ACCESS STORAGE FULL FIRST ROWS | DETAIL
>>> | 6436 | 2 | 144 | +1 | 1712 | 1417 | 348K |
>>> 293GB | 80.58% | 17M | 100.00 | enq: KO - fast object checkpoint (2) |
>>> | | |
>>> | | | | | | | |
>>> | | | | Cpu (124) |
>>> | | |
>>> | | | | | | | |
>>> | | | | reliable message (4) |
>>> | | |
>>> | | | | | | | |
>>> | | | | cell smart table scan (14) |
>>> | 10 | INDEX UNIQUE SCAN | MASTER_PK
>>> | 1 | | 142 | +2 | 1417 | 1417 | |
>>> | | | | |
>>> | 11 | TABLE ACCESS BY INDEX ROWID | MASTER
>>> | 1 | 1 | 142 | +2 | 1417 | 532 | |
>>> | | | | |
>>> ==========================================================================================================================================================================================================================
>>>
>>>
>>>
>>> Predicate Information (identified by operation id):
>>> ---------------------------------------------------
>>> 3 - filter( NOT EXISTS (SELECT 0 FROM "USER1"."DETAIL"
>>> "DTL","USER1"."MASTER" "REQ" WHERE "REQ"."RID"="DTL"."RID" AND
>>> "REQ"."RTYP"=:B4 AND
>>> ("REQ"."RSTS"='YY' OR "REQ"."RSTS"='XX') AND
>>> "DTL"."FL_TYP"='DP' AND LNNVL("DTL"."FL_NM"<>:B1) AND
>>> LNNVL(NVL(:B2,'-')<>NVL("DTL"."SID",'-'))))
>>> 4 - filter(TO_DATE(:B6,'MM/DD/YYYY')>=TO_DATE(:B5,'MM/DD/YYYY'))
>>> 6 - access("RF"."W_DATE">=TO_DATE(:B5,'MM/DD/YYYY') AND
>>> "RF"."F_GRP"=:B1 AND "RF"."W_DATE"<=TO_DATE(:B6,'MM/DD/YYYY'))
>>> filter("RF"."F_GRP"=:B1 AND ("RF"."F_CATG"=:B2 OR
>>> "RF"."F_CATG"=:B3))
>>> 9 - storage("DTL"."FL_TYP"='DP')
>>> filter("DTL"."FL_TYP"='DP' AND LNNVL("DTL"."FL_NM"<>:B1) AND
>>> LNNVL(NVL(:B2,'-')<>NVL("DTL"."SID",'-')))
>>> 10 - access("REQ"."RID"="DTL"."RID")
>>> 11 - filter("REQ"."RTYP"=:B4 AND ("REQ"."RSTS"='YY' OR
>>> "REQ"."RSTS"='XX'))
>>>
>>> Statistics
>>> ----------------------------------------------------------
>>> 1 recursive calls
>>> 0 db block gets
>>> 38469369 consistent gets
>>> 38459185 physical reads
>>> 52 redo size
>>> 14543 bytes sent via SQL*Net to client
>>> 472 bytes received via SQL*Net from client
>>> 2 SQL*Net roundtrips to/from client
>>> 1 sorts (memory)
>>> 0 sorts (disk)
>>> 421 rows processed
>>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Oct 12 2021 - 11:28:19 CEST

Original text of this message