Re: [External] Remote database table cardinality estimation

From: Lok P <loknath.73_at_gmail.com>
Date: Tue, 1 Dec 2020 20:14:59 +0530
Message-ID: <CAKna9VYHJgPOXjFDXbo1ujDRPhkAfDV-LK_4ykz5cbE=OqhgAQ_at_mail.gmail.com>



The DISTINCT M_ID subquery estimation is deviating from ~2 estimated rows VS ~15 actual rows, So I am thinking if by some way we can make the estimation of that HASH JOIN at Line-25 accurate will help us here or the only option would be to consider adding the remote table REMOTE_TAB alias "CT" as HASH JOIN hint i.e. (USE_HASH(CT)) ?

On Tue, Dec 1, 2020 at 2:24 PM Lok P <loknath.73_at_gmail.com> wrote:

> Thank You very much.
>
> As you mentioned the function based index is not going to help(and also i
> verified we don't have any function based index rather a simple btree index
> on column NM_TXT), so basically the current "NESTED LOOP OUTER" path for
> scanning REMOTE_TAB is not going to help us here. And driving_site hint may
> not help as we have a lot of big tables that get scanned/fetched are part
> of the local database here in this query. So what would be the possible way
> to fix this query here?
>
> I am not able to fully confirm if its the estimation of the REMOTE_TAB is causing
> the issue OR the HASH join at plan_line_id - 25 is the cause of the bad
> path. As the remote_tab is holding ~400k rows and all of them have unique
> NM_TXT and the estimation shows as ~1 (which is the minimum possible values
> Oracle shows in the plan). But anyway, as i read your blogs, it says the
> optimizer won't be using column histogram info which going through DB
> link, (and i belive that means optimizer can use all other stats like
> num_rows, num_distinct, num_nulls etc from the remote database for
> evaluating execution plan) and in our case that column NM_TXT is all unique
> , so i believe that issue should not affect us here. Please correct me if
> I am wrong.
>
> Below is the second part of the UNION ALL from which the plan-line ~42 is
> coming up. I have removed the exact object names and just kept the aliases, in
> the below query. it uses a bind variable :b2 which is coming as a string
> from Java and its coming as full string like '9999999' and no wild search
> is happening here (say like '%999999%'), so not sure why the 'LIKE'
> operator is used here, as it looks to me, it can be replaced by an equal
> operator. But then I was trying to see the estimation of that subquery
> part which is performing distinct M_ID as that is one of the inputs to
> the HASH join of plan line id-25.
>
> I see with the Like operator with bind variable, the estimation goes way
> higher, as i believe optimizer is doing it based on wild character search.
> But with an equal operator it's coming as '1' and with literal the
> estimation is coming as '2'. We do use bind variable here, but I think
> optimizer is peeking at it at the run time and deciding the estimation as
> "2". But I verified the actual count from this subquery is coming ~15. I
> am not sure how I can i fix this one , and if it's really going to put
> the table REMOTE_TAB in "HASH JOIn OUTER" irrespective of "NESTED LOOP
> OUTER"?
>
>
> SELECT ...
>
> FROM u, gc, ct, rol, mc, c, m
>
> WHERE gc.CT_ID = u.CT_ID(+)
>
> AND gc.AB_ID = u.AB_ID(+)
>
> AND u.NM_TXT = UPPER (ct.NM_TXT(+))
>
> AND u.RID = rol.RID(+)
>
> AND gc.M_CID = 'M'
>
> AND gc.C_TYP = 'X'
>
> AND gc.AB_ID = mc.AB_ID
>
> AND gc.CT_ID = mc.CT_ID
>
> AND mc.M_ID = m.M_ID
>
> AND m.CID = c.CID
>
> AND mc.M_ID IN
>
> (SELECT DISTINCT M_ID
>
> FROM M, C
>
> WHERE M.AB_ID = C.AB_ID
>
> AND M.CID = C.CID
>
> AND c.CID LIKE ( :B2)
>
> )
>
> ORDER BY 1;
>
>
> SQL> SELECT DISTINCT M_ID
>
> 2 FROM M, C
>
> 3 WHERE M.AB_ID = C.AB_ID
>
> 4 AND M.CID = C.CID
>
> 5 AND c.CID LIKE ( :B2);
>
> Elapsed: 00:00:00.04
>
>
> Execution Plan
>
> ----------------------------------------------------------
>
> Plan hash value: 2038584311
>
>
>
> -------------------------------------------------------------------------------------------
>
> | Id | Operation | Name | Rows | Bytes |TempSpc| Cost
> (%CPU)| Time |
>
>
> -------------------------------------------------------------------------------------------
>
> | 0 | SELECT STATEMENT | | 33712 | 1349K| | 6916 (1)|
> 00:01:23 |
>
> | 1 | HASH UNIQUE | | 33712 | 1349K| 1728K| 6916 (1)|
> 00:01:23 |
>
> |* 2 | HASH JOIN | | 33712 | 1349K| | 6555 (1)|
> 00:01:19 |
>
> |* 3 | INDEX FAST FULL SCAN| C_PK | 20096 | 274K| | 817 (1)|
> 00:00:10 |
>
> | 4 | TABLE ACCESS FULL | MRC | 638K| 16M| | 5732 (1)|
> 00:01:09 |
>
>
> -------------------------------------------------------------------------------------------
>
>
> Predicate Information (identified by operation id):
>
> ---------------------------------------------------
>
>
> 2 - access("M"."AB_ID"="C"."AB_ID" AND
>
> "M"."CID"="C"."CID")
>
> 3 - filter("C"."CID" LIKE :B2)
>
>
>
> SQL> SELECT DISTINCT M_ID
>
> 2 FROM M, C
>
> 3 WHERE M.AB_ID = C.AB_ID
>
> 4 AND M.CID = C.CID
>
> 5 AND c.CID = ( :B2);
>
> Elapsed: 00:00:00.04
>
>
> Execution Plan
>
> ----------------------------------------------------------
>
> Plan hash value: 3581879523
>
>
>
> ----------------------------------------------------------------------------------------------
>
> | Id | Operation | Name | Rows | Bytes | Cost
> (%CPU)| Time |
>
>
> ----------------------------------------------------------------------------------------------
>
> | 0 | SELECT STATEMENT | | 1 | 41
> | 8 (13)| 00:00:01 |
>
> | 1 | HASH UNIQUE | | 1 | 41
> | 8 (13)| 00:00:01 |
>
> | 2 | NESTED LOOPS | | 1 | 41
> | 7 (0)| 00:00:01 |
>
> | 3 | NESTED LOOPS | | 1 | 41
> | 7 (0)| 00:00:01 |
>
> |* 4 | INDEX SKIP SCAN | C_PK | 1 | 14
> | 4 (0)| 00:00:01 |
>
> |* 5 | INDEX RANGE SCAN | MRC_IDX | 1
> | | 2 (0)| 00:00:01 |
>
> | 6 | TABLE ACCESS BY INDEX ROWID| MRC | 1 | 27
> | 3 (0)| 00:00:01 |
>
>
> ----------------------------------------------------------------------------------------------
>
>
> Predicate Information (identified by operation id):
>
> ---------------------------------------------------
>
>
> 4 - access("C"."CID"=:B2)
>
> filter("C"."CID"=:B2)
>
> 5 - access("M"."AB_ID"="C"."AB_ID" AND "M"."CID"=:B2)
>
>
> SQL> SELECT DISTINCT M_ID
>
> 2 FROM M, C
>
> 3 WHERE M.AB_ID = C.AB_ID
>
> 4 AND M.CID = C.CID
>
> 5 AND c.CID like '99999' ;
>
> Elapsed: 00:00:00.04
>
>
> Execution Plan
>
> ----------------------------------------------------------
>
> Plan hash value: 3581879523
>
>
>
> ----------------------------------------------------------------------------------------------
>
> | Id | Operation | Name | Rows | Bytes | Cost
> (%CPU)| Time |
>
>
> ----------------------------------------------------------------------------------------------
>
> | 0 | SELECT STATEMENT | | 2 | 82
> | 8 (13)| 00:00:01 |
>
> | 1 | HASH UNIQUE | | 2 | 82
> | 8 (13)| 00:00:01 |
>
> | 2 | NESTED LOOPS | | 2 | 82
> | 7 (0)| 00:00:01 |
>
> | 3 | NESTED LOOPS | | 2 | 82
> | 7 (0)| 00:00:01 |
>
> |* 4 | INDEX SKIP SCAN | C_PK | 1 | 14
> | 4 (0)| 00:00:01 |
>
> |* 5 | INDEX RANGE SCAN | MRC_IDX | 1
> | | 2 (0)| 00:00:01 |
>
> | 6 | TABLE ACCESS BY INDEX ROWID| MRC | 2 | 54
> | 3 (0)| 00:00:01 |
>
>
> ----------------------------------------------------------------------------------------------
>
>
> Predicate Information (identified by operation id):
>
> ---------------------------------------------------
>
>
> 4 - access("C"."CID"='99999')
>
> filter("C"."CID"='99999')
>
> 5 - access("M"."AB_ID"="C"."AB_ID" AND
>
> "M"."CID"="C"."CID")
>
> On Tue, Dec 1, 2020 at 3:40 AM Jonathan Lewis <jlewisoracle_at_gmail.com>
> wrote:
>
>>
>> Function-based index wouldn't help. (In fact there may already be such
>> an index in place).
>> The optimizer ignored function-based indexes on remote tables in
>> distributed joins.
>> https://jonathanlewis.wordpress.com/2018/05/04/fbi-limitation/
>> The cardinality estimate for this operation might be one of the basic
>> guesses (like 1% for "I haven't a clue but it's an equality")
>>
>>
>> Regards
>> Jonathan Lewis
>>
>>
>> On Mon, 30 Nov 2020 at 20:46, Jeffrey Beckstrom <jbeckstrom_at_gcrta.org>
>> wrote:
>>
>>> Your step 42 states:
>>>
>>> 42 - SELECT "NM_TXT","F_NM","L_NM","EM","A_LCK" FROM
>>> "USER1"."REMOTE_TAB" "CT" WHERE :1=UPPER("NM_TXT") (accessing 'XXXXX.DBLNK'
>>> )
>>>
>>> Since you are doing a "upper" on the column, do you have a have a
>>> function index on that column?
>>>
>>> Jeffrey Beckstrom
>>> Lead Database Administrator
>>> Information Technology Department
>>> Greater Cleveland Regional Transit Authority
>>> 1240 W. 6th Street
>>> Cleveland, Ohio 44113
>>>
>>> >>> Lok P <loknath.73_at_gmail.com> 11/30/20 3:40 PM >>>
>>> Basically I was thinking if oracle make the cardinality estimation by
>>> reading the statistics information from the remote database objects during
>>> parsing of the query only? Or it apply some percentage logic to have the
>>> cardinality estimation of the remote object which then can deviate from
>>> actuals and thus may resulted into bad plan?
>>>
>>> On Tue, 1 Dec 2020, 1:36 am Lok P, <loknath.73_at_gmail.com> wrote:
>>>
>>>> Its version 11.2.0.4 of oracle database. We have below query spending
>>>> quite a lot of time on "SQL*Net message from dblink" on step-42. And it
>>>> seems it's because it's scanning that table in FULL, hundreds of times
>>>> which seems to be because of some bad estimation. My thought was if line
>>>> number -25 in the plan i.e the HASH JOIN is causing the overall estimation
>>>> wrong and thus the REMOTE_TAB is getting joined as "Nested loop outer" but
>>>> not "Hash Join outer". And possibly REMOTE_TAB would have joined as "hash
>>>> join outer" making it finish in quick time. Is my understanding correct
>>>> here? OR The estimation of remote object cardinality is posing a problem
>>>> for optimizers here and is the cause?
>>>> The query is a UNION ALL query, i have shared the predicate section.
>>>> And another thing is column NM_TXT of REMOTE_TAB is having an index on it,
>>>> but because of the UPPER function it's not getting used. So probably
>>>> creating a new function based index on REMOTE_TAB.UPPER(NM_TXT) will make
>>>> the query faster even with NESTED LOOP OUTER path. But I want to see if we
>>>> can fix any table/column stats , so that the optimizer can by default
>>>> follow the HASh JOIN OUTER path? Also that table REMOTE_TAB has ~427K rows
>>>> and almost all the rows have distinct NM_TXT values.
>>>> Below is the sql monitor but its not formatted well so i also Attached
>>>> the same sql monitor of the query , i have replaced the exact object names
>>>> with dummy ones.
>>>>
>>>> Global Information
>>>> ------------------------------
>>>> Status : EXECUTING
>>>> Instance ID : 1
>>>> SQL Execution ID : 16777456
>>>> Execution Started : 11/23/2020 00:40:07
>>>> First Refresh Time : 11/23/2020 04:13:17
>>>> Last Refresh Time : 11/23/2020 04:38:00
>>>> Duration : 14327s
>>>> Program : JDBC Thin Client
>>>>
>>>> Global Stats
>>>>
>>>> =================================================================================
>>>> | Elapsed | Cpu | IO | Concurrency | Other | Buffer | Read | Read |
>>>> | Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) | Gets | Reqs |
>>>> Bytes |
>>>>
>>>> =================================================================================
>>>> | 14274 | 5.21 | 0.00 | 0.00 | 14268 | 67307 | 1 | 8192 |
>>>>
>>>> =================================================================================
>>>>
>>>> SQL Plan Monitoring Details (Plan Hash Value=1724585072)
>>>>
>>>> =============================================================================================================================================================================================
>>>> | Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows |
>>>> Mem | Activity | Activity Detail |
>>>> | | | | (Estim) | | Active(s) | Active | | (Actual) | | (%) | (#
>>>> samples) |
>>>>
>>>> =============================================================================================================================================================================================
>>>> | 0 | SELECT STATEMENT | | | | | | 1 | | | | |
>>>> | -> 1 | SORT ORDER BY | | 2 | 20772 | 1484 | +12790 | 1 | 0 | 2M | | |
>>>> | -> 2 | UNION-ALL | | | | 1484 | +12790 | 1 | 6887 | | | |
>>>> | 3 | NESTED LOOPS OUTER | | 1 | 2272 | 1 | +12790 | 1 | 152 | | | |
>>>> | 4 | NESTED LOOPS OUTER | | 1 | 11 | 1 | +12790 | 1 | 152 | | | |
>>>> | 5 | NESTED LOOPS OUTER | | 1 | 10 | 1 | +12790 | 1 | 152 | | | |
>>>> | 6 | NESTED LOOPS | | 1 | 8 | 1 | +12790 | 1 | 152 | | | |
>>>> | 7 | NESTED LOOPS | | 1 | 6 | 1 | +12790 | 1 | 152 | | | |
>>>> | 8 | TABLE ACCESS BY INDEX ROWID | TAB1 | 1 | 5 | 1 | +12790 | 1 | 1 |
>>>> | | |
>>>> | 9 | INDEX SKIP SCAN | TAB1_PK | 1 | 4 | 1 | +12790 | 1 | 1 | | | |
>>>> | 10 | INDEX RANGE SCAN | G_CTCT_PK | 1 | 1 | 1 | +12790 | 1 | 152 | |
>>>> | |
>>>> | 11 | TABLE ACCESS BY INDEX ROWID | G_CTCT | 1 | 2 | 1 | +12790 | 152
>>>> | 152 | | | |
>>>> | 12 | INDEX UNIQUE SCAN | G_CTCT_PK | 1 | 1 | 1 | +12790 | 152 | 152 |
>>>> | | |
>>>> | 13 | TABLE ACCESS BY INDEX ROWID | PU_PRFL | 1 | 2 | 1 | +12790 | 152
>>>> | 141 | | | |
>>>> | 14 | INDEX UNIQUE SCAN | PU_PRFL_IDX | 1 | 1 | 1 | +12790 | 152 | 141
>>>> | | | |
>>>> | 15 | TABLE ACCESS BY INDEX ROWID | P_RL | 1 | 1 | 1 | +12790 | 152 |
>>>> 141 | | | |
>>>> | 16 | INDEX UNIQUE SCAN | P_RL_PK | 1 | | 1 | +12790 | 152 | 141 | | |
>>>> |
>>>> | 17 | REMOTE | REMOTE_TAB | 1 | 2261 | 1 | +12790 | 152 | 73 | | | |
>>>> | -> 18 | NESTED LOOPS OUTER | | 1 | 18500 | 1484 | +12790 | 1 | 6735 |
>>>> | | |
>>>> | -> 19 | HASH JOIN | | 1 | 16239 | 1484 | +12790 | 1 | 6736 | 2M | | |
>>>> | 20 | NESTED LOOPS OUTER | | 2 | 14795 | 1 | +12790 | 1 | 8356 | | | |
>>>> | 21 | NESTED LOOPS OUTER | | 2 | 14793 | 1 | +12790 | 1 | 8356 | | | |
>>>> | 22 | HASH JOIN | | 2 | 14789 | 1 | +12790 | 1 | 8356 | | | |
>>>> | 23 | NESTED LOOPS | | 2 | 8903 | 1 | +12790 | 1 | 8356 | | | |
>>>> | 24 | NESTED LOOPS | | 4 | 8903 | 1 | +12790 | 1 | 8356 | | | |
>>>> | 25 | HASH JOIN | | 4 | 8895 | 1 | +12790 | 1 | 8356 | | | |
>>>> | 26 | VIEW | VW_NSO_1 | 2 | 7 | 1 | +12790 | 1 | 15 | | | |
>>>> | 27 | HASH UNIQUE | | 2 | | 1 | +12790 | 1 | 15 | | | |
>>>> | 28 | NESTED LOOPS | | 2 | 7 | 1 | +12790 | 1 | 15 | | | |
>>>> | 29 | NESTED LOOPS | | 2 | 7 | 1 | +12790 | 1 | 15 | | | |
>>>> | 30 | INDEX SKIP SCAN | TAB1_PK | 1 | 4 | 1 | +12790 | 1 | 1 | | | |
>>>> | 31 | INDEX RANGE SCAN | MRC_IDX | 1 | 2 | 1 | +12790 | 1 | 15 | | | |
>>>> | 32 | TABLE ACCESS BY INDEX ROWID | MRC | 2 | 3 | 1 | +12790 | 15 | 15
>>>> | | | |
>>>> | 33 | INDEX FULL SCAN | M_CTCT | 1M | 8862 | 1 | +12790 | 1 | 1M | | |
>>>> |
>>>> | -> 34 | INDEX UNIQUE SCAN | G_CTCT_PK | 1 | 1 | 1484 | +12790 | 8356
>>>> | 8356 | | | |
>>>> | 35 | TABLE ACCESS BY INDEX ROWID | G_CTCT | 1 | 2 | 1 | +12790 | 8356
>>>> | 8356 | | | |
>>>> | 36 | TABLE ACCESS FULL | MRC | 639K | 5874 | 1 | +12790 | 1 | 641K |
>>>> | | |
>>>> | 37 | TABLE ACCESS BY INDEX ROWID | PU_PRFL | 1 | 2 | 1 | +12790 |
>>>> 8356 | 8356 | | | |
>>>> | 38 | INDEX UNIQUE SCAN | PU_PRFL_IDX | 1 | 1 | 1 | +12790 | 8356 |
>>>> 8356 | | | |
>>>> | 39 | TABLE ACCESS BY INDEX ROWID | P_RL | 1 | 1 | 1 | +12790 | 8356 |
>>>> 8352 | | | |
>>>> | 40 | INDEX UNIQUE SCAN | P_RL_PK | 1 | | 1 | +12790 | 8356 | 8352 | |
>>>> | |
>>>> | -> 41 | TABLE ACCESS FULL | TAB1 | 394K | 1437 | 1484 | +12790 | 1 |
>>>> 9033 | | | |
>>>> | -> 42 | REMOTE | REMOTE_TAB | 1 | 2261 | 3302 | +11027 | 6736 | 379 |
>>>> | 100.00 | SQL*Net message from dblink (3248) |
>>>>
>>>> =============================================================================================================================================================================================
>>>>
>>>>
>>>> Predicate Information (identified by operation id):
>>>> ---------------------------------------------------
>>>> 9 - access("C"."CID" LIKE :1) filter("C"."CID" LIKE :1)
>>>> 10 - access("CC"."AB_ID"="C"."AB_ID" AND "CC"."CID"="C"."CID")
>>>> 11 - filter(("GC"."M_CID"='C' AND "GC"."AB_ID" IS NOT NULL AND
>>>> "GC"."C_TYP"='X' AND "GC"."AB_ID"="CC"."AB_ID"))
>>>> 12 - access("GC"."CT_ID"="CC"."CT_ID")
>>>> 13 - filter("GC"."AB_ID"="U"."AB_ID")
>>>> 14 - access("GC"."CT_ID"="U"."CT_ID")
>>>> 16 - access("U"."RID"="ROL"."RID")
>>>> 19 - access("M"."CID"="C"."CID")
>>>> 22 - access("MC"."M_ID"="M"."M_ID")
>>>> 25 - access("MC"."M_ID"="M_ID")
>>>> 30 - access("C"."CID" LIKE :2) filter("C"."CID" LIKE :2)
>>>> 31 - access("M"."AB_ID"="C"."AB_ID" AND "M"."CID"="C"."CID")
>>>> 34 - access("GC"."CT_ID"="MC"."CT_ID")
>>>> 35 - filter(("GC"."AB_ID" IS NOT NULL AND "GC"."M_CID"='M' AND
>>>> "GC"."C_TYP"='X' AND "GC"."AB_ID"="MC"."AB_ID"))
>>>> 37 - filter("GC"."AB_ID"="U"."AB_ID")
>>>> 38 - access("GC"."CT_ID"="U"."CT_ID")
>>>> 40 - access("U"."RID"="ROL"."RID")
>>>>
>>>> Remote SQL Information (identified by operation id):
>>>> ----------------------------------------------------
>>>> 17 - SELECT "NM_TXT","F_NM","L_NM","EM","A_LCK" FROM
>>>> "USER1"."REMOTE_TAB" "CT" WHERE :1=UPPER("NM_TXT") (accessing 'XXXXX.DBLNK'
>>>> )
>>>>
>>>> 42 - SELECT "NM_TXT","F_NM","L_NM","EM","A_LCK" FROM
>>>> "USER1"."REMOTE_TAB" "CT" WHERE :1=UPPER("NM_TXT") (accessing 'XXXXX.DBLNK'
>>>> )
>>>>
>>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Dec 01 2020 - 15:44:59 CET

Original text of this message