Re: [External] Remote database table cardinality estimation

From: Laurentiu Oprea <laurentiu.oprea06_at_gmail.com>
Date: Wed, 2 Dec 2020 13:19:47 +0200
Message-ID: <CA+riqSUCdXE-Bdu0SF-2QhYnZxRr-BsN_tXWNt6ihFypuV0zOA_at_mail.gmail.com>



Hello Lok,

I was not able to find any reference that a function based index is not used for a remote table part of join so I tried to replicate the blog page referenced by Jonathan.

Indeed by default on my pdb 19c test db behaved the same, but with proper hinting the index was used: /*+ GATHER_PLAN_STATISTICS index(t2 t2_f1) use_nl(t1 t2)*/

This concludes that it is not a limitation but more of a plan selection issue.

I would say the best approaches for your situation is : 1. maybe discuss with the business and possibly remove the upper function, it might be the case that is not needed
2. if 1 not possible then create a function based index 3. if 1 and 2 not possible then introduce USE_HASH(CT) hint either in the code or using a sql patch

Hope this helps.

În mar., 1 dec. 2020 la 16:46, Lok P <loknath.73_at_gmail.com> a scris:

> The DISTINCT M_ID subquery estimation is deviating from ~2 estimated rows
> VS ~15 actual rows, So I am thinking if by some way we can make the
> estimation of that HASH JOIN at Line-25 accurate will help us here or the
> only option would be to consider adding the remote table REMOTE_TAB alias
> "CT" as HASH JOIN hint i.e. (USE_HASH(CT)) ?
>
> On Tue, Dec 1, 2020 at 2:24 PM Lok P <loknath.73_at_gmail.com> wrote:
>
>> Thank You very much.
>>
>> As you mentioned the function based index is not going to help(and also i
>> verified we don't have any function based index rather a simple btree index
>> on column NM_TXT), so basically the current "NESTED LOOP OUTER" path for
>> scanning REMOTE_TAB is not going to help us here. And driving_site hint may
>> not help as we have a lot of big tables that get scanned/fetched are part
>> of the local database here in this query. So what would be the possible way
>> to fix this query here?
>>
>> I am not able to fully confirm if its the estimation of the REMOTE_TAB is causing
>> the issue OR the HASH join at plan_line_id - 25 is the cause of the bad
>> path. As the remote_tab is holding ~400k rows and all of them have unique
>> NM_TXT and the estimation shows as ~1 (which is the minimum possible values
>> Oracle shows in the plan). But anyway, as i read your blogs, it says the
>> optimizer won't be using column histogram info which going through DB
>> link, (and i belive that means optimizer can use all other stats like
>> num_rows, num_distinct, num_nulls etc from the remote database for
>> evaluating execution plan) and in our case that column NM_TXT is all unique
>> , so i believe that issue should not affect us here. Please correct me if
>> I am wrong.
>>
>> Below is the second part of the UNION ALL from which the plan-line ~42 is
>> coming up. I have removed the exact object names and just kept the aliases, in
>> the below query. it uses a bind variable :b2 which is coming as a string
>> from Java and its coming as full string like '9999999' and no wild search
>> is happening here (say like '%999999%'), so not sure why the 'LIKE'
>> operator is used here, as it looks to me, it can be replaced by an equal
>> operator. But then I was trying to see the estimation of that subquery
>> part which is performing distinct M_ID as that is one of the inputs to
>> the HASH join of plan line id-25.
>>
>> I see with the Like operator with bind variable, the estimation goes way
>> higher, as i believe optimizer is doing it based on wild character search.
>> But with an equal operator it's coming as '1' and with literal the
>> estimation is coming as '2'. We do use bind variable here, but I think
>> optimizer is peeking at it at the run time and deciding the estimation
>> as "2". But I verified the actual count from this subquery is coming
>> ~15. I am not sure how I can i fix this one , and if it's really going
>> to put the table REMOTE_TAB in "HASH JOIn OUTER" irrespective of "NESTED
>> LOOP OUTER"?
>>
>>
>> SELECT ...
>>
>> FROM u, gc, ct, rol, mc, c, m
>>
>> WHERE gc.CT_ID = u.CT_ID(+)
>>
>> AND gc.AB_ID = u.AB_ID(+)
>>
>> AND u.NM_TXT = UPPER (ct.NM_TXT(+))
>>
>> AND u.RID = rol.RID(+)
>>
>> AND gc.M_CID = 'M'
>>
>> AND gc.C_TYP = 'X'
>>
>> AND gc.AB_ID = mc.AB_ID
>>
>> AND gc.CT_ID = mc.CT_ID
>>
>> AND mc.M_ID = m.M_ID
>>
>> AND m.CID = c.CID
>>
>> AND mc.M_ID IN
>>
>> (SELECT DISTINCT M_ID
>>
>> FROM M, C
>>
>> WHERE M.AB_ID = C.AB_ID
>>
>> AND M.CID = C.CID
>>
>> AND c.CID LIKE ( :B2)
>>
>> )
>>
>> ORDER BY 1;
>>
>>
>> SQL> SELECT DISTINCT M_ID
>>
>> 2 FROM M, C
>>
>> 3 WHERE M.AB_ID = C.AB_ID
>>
>> 4 AND M.CID = C.CID
>>
>> 5 AND c.CID LIKE ( :B2);
>>
>> Elapsed: 00:00:00.04
>>
>>
>> Execution Plan
>>
>> ----------------------------------------------------------
>>
>> Plan hash value: 2038584311
>>
>>
>>
>> -------------------------------------------------------------------------------------------
>>
>> | Id | Operation | Name | Rows | Bytes |TempSpc| Cost
>> (%CPU)| Time |
>>
>>
>> -------------------------------------------------------------------------------------------
>>
>> | 0 | SELECT STATEMENT | | 33712 | 1349K| | 6916 (1)|
>> 00:01:23 |
>>
>> | 1 | HASH UNIQUE | | 33712 | 1349K| 1728K| 6916 (1)|
>> 00:01:23 |
>>
>> |* 2 | HASH JOIN | | 33712 | 1349K| | 6555 (1)|
>> 00:01:19 |
>>
>> |* 3 | INDEX FAST FULL SCAN| C_PK | 20096 | 274K| | 817 (1)|
>> 00:00:10 |
>>
>> | 4 | TABLE ACCESS FULL | MRC | 638K| 16M| | 5732 (1)|
>> 00:01:09 |
>>
>>
>> -------------------------------------------------------------------------------------------
>>
>>
>> Predicate Information (identified by operation id):
>>
>> ---------------------------------------------------
>>
>>
>> 2 - access("M"."AB_ID"="C"."AB_ID" AND
>>
>> "M"."CID"="C"."CID")
>>
>> 3 - filter("C"."CID" LIKE :B2)
>>
>>
>>
>> SQL> SELECT DISTINCT M_ID
>>
>> 2 FROM M, C
>>
>> 3 WHERE M.AB_ID = C.AB_ID
>>
>> 4 AND M.CID = C.CID
>>
>> 5 AND c.CID = ( :B2);
>>
>> Elapsed: 00:00:00.04
>>
>>
>> Execution Plan
>>
>> ----------------------------------------------------------
>>
>> Plan hash value: 3581879523
>>
>>
>>
>> ----------------------------------------------------------------------------------------------
>>
>> | Id | Operation | Name | Rows | Bytes | Cost
>> (%CPU)| Time |
>>
>>
>> ----------------------------------------------------------------------------------------------
>>
>> | 0 | SELECT STATEMENT | | 1 | 41
>> | 8 (13)| 00:00:01 |
>>
>> | 1 | HASH UNIQUE | | 1 | 41
>> | 8 (13)| 00:00:01 |
>>
>> | 2 | NESTED LOOPS | | 1 | 41
>> | 7 (0)| 00:00:01 |
>>
>> | 3 | NESTED LOOPS | | 1 | 41
>> | 7 (0)| 00:00:01 |
>>
>> |* 4 | INDEX SKIP SCAN | C_PK | 1 | 14
>> | 4 (0)| 00:00:01 |
>>
>> |* 5 | INDEX RANGE SCAN | MRC_IDX | 1
>> | | 2 (0)| 00:00:01 |
>>
>> | 6 | TABLE ACCESS BY INDEX ROWID| MRC | 1 | 27
>> | 3 (0)| 00:00:01 |
>>
>>
>> ----------------------------------------------------------------------------------------------
>>
>>
>> Predicate Information (identified by operation id):
>>
>> ---------------------------------------------------
>>
>>
>> 4 - access("C"."CID"=:B2)
>>
>> filter("C"."CID"=:B2)
>>
>> 5 - access("M"."AB_ID"="C"."AB_ID" AND "M"."CID"=:B2)
>>
>>
>> SQL> SELECT DISTINCT M_ID
>>
>> 2 FROM M, C
>>
>> 3 WHERE M.AB_ID = C.AB_ID
>>
>> 4 AND M.CID = C.CID
>>
>> 5 AND c.CID like '99999' ;
>>
>> Elapsed: 00:00:00.04
>>
>>
>> Execution Plan
>>
>> ----------------------------------------------------------
>>
>> Plan hash value: 3581879523
>>
>>
>>
>> ----------------------------------------------------------------------------------------------
>>
>> | Id | Operation | Name | Rows | Bytes | Cost
>> (%CPU)| Time |
>>
>>
>> ----------------------------------------------------------------------------------------------
>>
>> | 0 | SELECT STATEMENT | | 2 | 82
>> | 8 (13)| 00:00:01 |
>>
>> | 1 | HASH UNIQUE | | 2 | 82
>> | 8 (13)| 00:00:01 |
>>
>> | 2 | NESTED LOOPS | | 2 | 82
>> | 7 (0)| 00:00:01 |
>>
>> | 3 | NESTED LOOPS | | 2 | 82
>> | 7 (0)| 00:00:01 |
>>
>> |* 4 | INDEX SKIP SCAN | C_PK | 1 | 14
>> | 4 (0)| 00:00:01 |
>>
>> |* 5 | INDEX RANGE SCAN | MRC_IDX | 1
>> | | 2 (0)| 00:00:01 |
>>
>> | 6 | TABLE ACCESS BY INDEX ROWID| MRC | 2 | 54
>> | 3 (0)| 00:00:01 |
>>
>>
>> ----------------------------------------------------------------------------------------------
>>
>>
>> Predicate Information (identified by operation id):
>>
>> ---------------------------------------------------
>>
>>
>> 4 - access("C"."CID"='99999')
>>
>> filter("C"."CID"='99999')
>>
>> 5 - access("M"."AB_ID"="C"."AB_ID" AND
>>
>> "M"."CID"="C"."CID")
>>
>> On Tue, Dec 1, 2020 at 3:40 AM Jonathan Lewis <jlewisoracle_at_gmail.com>
>> wrote:
>>
>>>
>>> Function-based index wouldn't help. (In fact there may already be such
>>> an index in place).
>>> The optimizer ignored function-based indexes on remote tables in
>>> distributed joins.
>>> https://jonathanlewis.wordpress.com/2018/05/04/fbi-limitation/
>>> The cardinality estimate for this operation might be one of the basic
>>> guesses (like 1% for "I haven't a clue but it's an equality")
>>>
>>>
>>> Regards
>>> Jonathan Lewis
>>>
>>>
>>> On Mon, 30 Nov 2020 at 20:46, Jeffrey Beckstrom <jbeckstrom_at_gcrta.org>
>>> wrote:
>>>
>>>> Your step 42 states:
>>>>
>>>> 42 - SELECT "NM_TXT","F_NM","L_NM","EM","A_LCK" FROM
>>>> "USER1"."REMOTE_TAB" "CT" WHERE :1=UPPER("NM_TXT") (accessing 'XXXXX.DBLNK'
>>>> )
>>>>
>>>> Since you are doing a "upper" on the column, do you have a have a
>>>> function index on that column?
>>>>
>>>> Jeffrey Beckstrom
>>>> Lead Database Administrator
>>>> Information Technology Department
>>>> Greater Cleveland Regional Transit Authority
>>>> 1240 W. 6th Street
>>>> Cleveland, Ohio 44113
>>>>
>>>> >>> Lok P <loknath.73_at_gmail.com> 11/30/20 3:40 PM >>>
>>>> Basically I was thinking if oracle make the cardinality estimation by
>>>> reading the statistics information from the remote database objects during
>>>> parsing of the query only? Or it apply some percentage logic to have the
>>>> cardinality estimation of the remote object which then can deviate from
>>>> actuals and thus may resulted into bad plan?
>>>>
>>>> On Tue, 1 Dec 2020, 1:36 am Lok P, <loknath.73_at_gmail.com> wrote:
>>>>
>>>>> Its version 11.2.0.4 of oracle database. We have below query spending
>>>>> quite a lot of time on "SQL*Net message from dblink" on step-42. And it
>>>>> seems it's because it's scanning that table in FULL, hundreds of times
>>>>> which seems to be because of some bad estimation. My thought was if line
>>>>> number -25 in the plan i.e the HASH JOIN is causing the overall estimation
>>>>> wrong and thus the REMOTE_TAB is getting joined as "Nested loop outer" but
>>>>> not "Hash Join outer". And possibly REMOTE_TAB would have joined as "hash
>>>>> join outer" making it finish in quick time. Is my understanding correct
>>>>> here? OR The estimation of remote object cardinality is posing a problem
>>>>> for optimizers here and is the cause?
>>>>> The query is a UNION ALL query, i have shared the predicate section.
>>>>> And another thing is column NM_TXT of REMOTE_TAB is having an index on it,
>>>>> but because of the UPPER function it's not getting used. So probably
>>>>> creating a new function based index on REMOTE_TAB.UPPER(NM_TXT) will make
>>>>> the query faster even with NESTED LOOP OUTER path. But I want to see if we
>>>>> can fix any table/column stats , so that the optimizer can by default
>>>>> follow the HASh JOIN OUTER path? Also that table REMOTE_TAB has ~427K rows
>>>>> and almost all the rows have distinct NM_TXT values.
>>>>> Below is the sql monitor but its not formatted well so i also Attached
>>>>> the same sql monitor of the query , i have replaced the exact object names
>>>>> with dummy ones.
>>>>>
>>>>> Global Information
>>>>> ------------------------------
>>>>> Status : EXECUTING
>>>>> Instance ID : 1
>>>>> SQL Execution ID : 16777456
>>>>> Execution Started : 11/23/2020 00:40:07
>>>>> First Refresh Time : 11/23/2020 04:13:17
>>>>> Last Refresh Time : 11/23/2020 04:38:00
>>>>> Duration : 14327s
>>>>> Program : JDBC Thin Client
>>>>>
>>>>> Global Stats
>>>>>
>>>>> =================================================================================
>>>>> | Elapsed | Cpu | IO | Concurrency | Other | Buffer | Read | Read |
>>>>> | Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) | Gets | Reqs |
>>>>> Bytes |
>>>>>
>>>>> =================================================================================
>>>>> | 14274 | 5.21 | 0.00 | 0.00 | 14268 | 67307 | 1 | 8192 |
>>>>>
>>>>> =================================================================================
>>>>>
>>>>> SQL Plan Monitoring Details (Plan Hash Value=1724585072)
>>>>>
>>>>> =============================================================================================================================================================================================
>>>>> | Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows |
>>>>> Mem | Activity | Activity Detail |
>>>>> | | | | (Estim) | | Active(s) | Active | | (Actual) | | (%) | (#
>>>>> samples) |
>>>>>
>>>>> =============================================================================================================================================================================================
>>>>> | 0 | SELECT STATEMENT | | | | | | 1 | | | | |
>>>>> | -> 1 | SORT ORDER BY | | 2 | 20772 | 1484 | +12790 | 1 | 0 | 2M | | |
>>>>> | -> 2 | UNION-ALL | | | | 1484 | +12790 | 1 | 6887 | | | |
>>>>> | 3 | NESTED LOOPS OUTER | | 1 | 2272 | 1 | +12790 | 1 | 152 | | | |
>>>>> | 4 | NESTED LOOPS OUTER | | 1 | 11 | 1 | +12790 | 1 | 152 | | | |
>>>>> | 5 | NESTED LOOPS OUTER | | 1 | 10 | 1 | +12790 | 1 | 152 | | | |
>>>>> | 6 | NESTED LOOPS | | 1 | 8 | 1 | +12790 | 1 | 152 | | | |
>>>>> | 7 | NESTED LOOPS | | 1 | 6 | 1 | +12790 | 1 | 152 | | | |
>>>>> | 8 | TABLE ACCESS BY INDEX ROWID | TAB1 | 1 | 5 | 1 | +12790 | 1 | 1
>>>>> | | | |
>>>>> | 9 | INDEX SKIP SCAN | TAB1_PK | 1 | 4 | 1 | +12790 | 1 | 1 | | | |
>>>>> | 10 | INDEX RANGE SCAN | G_CTCT_PK | 1 | 1 | 1 | +12790 | 1 | 152 | |
>>>>> | |
>>>>> | 11 | TABLE ACCESS BY INDEX ROWID | G_CTCT | 1 | 2 | 1 | +12790 | 152
>>>>> | 152 | | | |
>>>>> | 12 | INDEX UNIQUE SCAN | G_CTCT_PK | 1 | 1 | 1 | +12790 | 152 | 152
>>>>> | | | |
>>>>> | 13 | TABLE ACCESS BY INDEX ROWID | PU_PRFL | 1 | 2 | 1 | +12790 |
>>>>> 152 | 141 | | | |
>>>>> | 14 | INDEX UNIQUE SCAN | PU_PRFL_IDX | 1 | 1 | 1 | +12790 | 152 |
>>>>> 141 | | | |
>>>>> | 15 | TABLE ACCESS BY INDEX ROWID | P_RL | 1 | 1 | 1 | +12790 | 152 |
>>>>> 141 | | | |
>>>>> | 16 | INDEX UNIQUE SCAN | P_RL_PK | 1 | | 1 | +12790 | 152 | 141 | |
>>>>> | |
>>>>> | 17 | REMOTE | REMOTE_TAB | 1 | 2261 | 1 | +12790 | 152 | 73 | | | |
>>>>> | -> 18 | NESTED LOOPS OUTER | | 1 | 18500 | 1484 | +12790 | 1 | 6735
>>>>> | | | |
>>>>> | -> 19 | HASH JOIN | | 1 | 16239 | 1484 | +12790 | 1 | 6736 | 2M | | |
>>>>> | 20 | NESTED LOOPS OUTER | | 2 | 14795 | 1 | +12790 | 1 | 8356 | | | |
>>>>> | 21 | NESTED LOOPS OUTER | | 2 | 14793 | 1 | +12790 | 1 | 8356 | | | |
>>>>> | 22 | HASH JOIN | | 2 | 14789 | 1 | +12790 | 1 | 8356 | | | |
>>>>> | 23 | NESTED LOOPS | | 2 | 8903 | 1 | +12790 | 1 | 8356 | | | |
>>>>> | 24 | NESTED LOOPS | | 4 | 8903 | 1 | +12790 | 1 | 8356 | | | |
>>>>> | 25 | HASH JOIN | | 4 | 8895 | 1 | +12790 | 1 | 8356 | | | |
>>>>> | 26 | VIEW | VW_NSO_1 | 2 | 7 | 1 | +12790 | 1 | 15 | | | |
>>>>> | 27 | HASH UNIQUE | | 2 | | 1 | +12790 | 1 | 15 | | | |
>>>>> | 28 | NESTED LOOPS | | 2 | 7 | 1 | +12790 | 1 | 15 | | | |
>>>>> | 29 | NESTED LOOPS | | 2 | 7 | 1 | +12790 | 1 | 15 | | | |
>>>>> | 30 | INDEX SKIP SCAN | TAB1_PK | 1 | 4 | 1 | +12790 | 1 | 1 | | | |
>>>>> | 31 | INDEX RANGE SCAN | MRC_IDX | 1 | 2 | 1 | +12790 | 1 | 15 | | | |
>>>>> | 32 | TABLE ACCESS BY INDEX ROWID | MRC | 2 | 3 | 1 | +12790 | 15 |
>>>>> 15 | | | |
>>>>> | 33 | INDEX FULL SCAN | M_CTCT | 1M | 8862 | 1 | +12790 | 1 | 1M | |
>>>>> | |
>>>>> | -> 34 | INDEX UNIQUE SCAN | G_CTCT_PK | 1 | 1 | 1484 | +12790 | 8356
>>>>> | 8356 | | | |
>>>>> | 35 | TABLE ACCESS BY INDEX ROWID | G_CTCT | 1 | 2 | 1 | +12790 |
>>>>> 8356 | 8356 | | | |
>>>>> | 36 | TABLE ACCESS FULL | MRC | 639K | 5874 | 1 | +12790 | 1 | 641K |
>>>>> | | |
>>>>> | 37 | TABLE ACCESS BY INDEX ROWID | PU_PRFL | 1 | 2 | 1 | +12790 |
>>>>> 8356 | 8356 | | | |
>>>>> | 38 | INDEX UNIQUE SCAN | PU_PRFL_IDX | 1 | 1 | 1 | +12790 | 8356 |
>>>>> 8356 | | | |
>>>>> | 39 | TABLE ACCESS BY INDEX ROWID | P_RL | 1 | 1 | 1 | +12790 | 8356
>>>>> | 8352 | | | |
>>>>> | 40 | INDEX UNIQUE SCAN | P_RL_PK | 1 | | 1 | +12790 | 8356 | 8352 |
>>>>> | | |
>>>>> | -> 41 | TABLE ACCESS FULL | TAB1 | 394K | 1437 | 1484 | +12790 | 1 |
>>>>> 9033 | | | |
>>>>> | -> 42 | REMOTE | REMOTE_TAB | 1 | 2261 | 3302 | +11027 | 6736 | 379
>>>>> | | 100.00 | SQL*Net message from dblink (3248) |
>>>>>
>>>>> =============================================================================================================================================================================================
>>>>>
>>>>>
>>>>> Predicate Information (identified by operation id):
>>>>> ---------------------------------------------------
>>>>> 9 - access("C"."CID" LIKE :1) filter("C"."CID" LIKE :1)
>>>>> 10 - access("CC"."AB_ID"="C"."AB_ID" AND "CC"."CID"="C"."CID")
>>>>> 11 - filter(("GC"."M_CID"='C' AND "GC"."AB_ID" IS NOT NULL AND
>>>>> "GC"."C_TYP"='X' AND "GC"."AB_ID"="CC"."AB_ID"))
>>>>> 12 - access("GC"."CT_ID"="CC"."CT_ID")
>>>>> 13 - filter("GC"."AB_ID"="U"."AB_ID")
>>>>> 14 - access("GC"."CT_ID"="U"."CT_ID")
>>>>> 16 - access("U"."RID"="ROL"."RID")
>>>>> 19 - access("M"."CID"="C"."CID")
>>>>> 22 - access("MC"."M_ID"="M"."M_ID")
>>>>> 25 - access("MC"."M_ID"="M_ID")
>>>>> 30 - access("C"."CID" LIKE :2) filter("C"."CID" LIKE :2)
>>>>> 31 - access("M"."AB_ID"="C"."AB_ID" AND "M"."CID"="C"."CID")
>>>>> 34 - access("GC"."CT_ID"="MC"."CT_ID")
>>>>> 35 - filter(("GC"."AB_ID" IS NOT NULL AND "GC"."M_CID"='M' AND
>>>>> "GC"."C_TYP"='X' AND "GC"."AB_ID"="MC"."AB_ID"))
>>>>> 37 - filter("GC"."AB_ID"="U"."AB_ID")
>>>>> 38 - access("GC"."CT_ID"="U"."CT_ID")
>>>>> 40 - access("U"."RID"="ROL"."RID")
>>>>>
>>>>> Remote SQL Information (identified by operation id):
>>>>> ----------------------------------------------------
>>>>> 17 - SELECT "NM_TXT","F_NM","L_NM","EM","A_LCK" FROM
>>>>> "USER1"."REMOTE_TAB" "CT" WHERE :1=UPPER("NM_TXT") (accessing 'XXXXX.DBLNK'
>>>>> )
>>>>>
>>>>> 42 - SELECT "NM_TXT","F_NM","L_NM","EM","A_LCK" FROM
>>>>> "USER1"."REMOTE_TAB" "CT" WHERE :1=UPPER("NM_TXT") (accessing 'XXXXX.DBLNK'
>>>>> )
>>>>>
>>>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Dec 02 2020 - 12:19:47 CET

Original text of this message