Re: [External] Remote database table cardinality estimation

From: Lok P <loknath.73_at_gmail.com>
Date: Tue, 1 Dec 2020 14:24:40 +0530
Message-ID: <CAKna9VbE5oGXvg5ajinYyU8bykUf+uTUk3cdRXmSkt2_y=2MMg_at_mail.gmail.com>



Thank You very much.

As you mentioned the function based index is not going to help(and also i verified we don't have any function based index rather a simple btree index on column NM_TXT), so basically the current "NESTED LOOP OUTER" path for scanning REMOTE_TAB is not going to help us here. And driving_site hint may not help as we have a lot of big tables that get scanned/fetched are part of the local database here in this query. So what would be the possible way to fix this query here?

I am not able to fully confirm if its the estimation of the REMOTE_TAB is causing
the issue OR the HASH join at plan_line_id - 25 is the cause of the bad path. As the remote_tab is holding ~400k rows and all of them have unique NM_TXT and the estimation shows as ~1 (which is the minimum possible values Oracle shows in the plan). But anyway, as i read your blogs, it says the optimizer won't be using column histogram info which going through DB link, (and i belive that means optimizer can use all other stats like num_rows, num_distinct, num_nulls etc from the remote database for evaluating execution plan) and in our case that column NM_TXT is all unique , so i believe that issue should not affect us here. Please correct me if I am wrong.

Below is the second part of the UNION ALL from which the plan-line ~42 is coming up. I have removed the exact object names and just kept the aliases, in the below query. it uses a bind variable :b2 which is coming as a string from Java and its coming as full string like '9999999' and no wild search is happening here (say like '%999999%'), so not sure why the 'LIKE' operator is used here, as it looks to me, it can be replaced by an equal operator. But then I was trying to see the estimation of that subquery part which is performing distinct M_ID as that is one of the inputs to the HASH join of plan line id-25.

I see with the Like operator with bind variable, the estimation goes way higher, as i believe optimizer is doing it based on wild character search. But with an equal operator it's coming as '1' and with literal the estimation is coming as '2'. We do use bind variable here, but I think optimizer is peeking at it at the run time and deciding the estimation as "2". But I verified the actual count from this subquery is coming ~15. I am not sure how I can i fix this one , and if it's really going to put the table REMOTE_TAB in "HASH JOIn OUTER" irrespective of "NESTED LOOP OUTER"?

SELECT ...  FROM u, gc, ct, rol, mc, c, m

 WHERE gc.CT_ID = u.CT_ID(+)

      AND gc.AB_ID = u.AB_ID(+)

      AND u.NM_TXT = UPPER (ct.NM_TXT(+))

      AND u.RID = rol.RID(+)

      AND gc.M_CID = 'M'

      AND gc.C_TYP = 'X'

      AND gc.AB_ID = mc.AB_ID

      AND gc.CT_ID = mc.CT_ID

      AND mc.M_ID = m.M_ID

      AND m.CID = c.CID

      AND mc.M_ID IN

             (SELECT DISTINCT M_ID

                FROM M, C

               WHERE    M.AB_ID = C.AB_ID

                     AND M.CID = C.CID

                     AND c.CID LIKE ( :B2)

                )

ORDER BY 1; SQL> SELECT DISTINCT M_ID

 2                  FROM M, C

 3                 WHERE    M.AB_ID = C.AB_ID

 4                       AND M.CID = C.CID

 5                       AND c.CID LIKE ( :B2);

Elapsed: 00:00:00.04

Execution Plan


Plan hash value: 2038584311


| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |


| 0 | SELECT STATEMENT | | 33712 | 1349K| | 6916 (1)| 00:01:23 |

| 1 | HASH UNIQUE | | 33712 | 1349K| 1728K| 6916 (1)| 00:01:23 |

|* 2 | HASH JOIN | | 33712 | 1349K| | 6555 (1)| 00:01:19 |

|* 3 | INDEX FAST FULL SCAN| C_PK | 20096 | 274K| | 817 (1)| 00:00:10 |

| 4 | TABLE ACCESS FULL | MRC | 638K| 16M| | 5732 (1)| 00:01:09 |


Predicate Information (identified by operation id):


  2 - access("M"."AB_ID"="C"."AB_ID" AND

             "M"."CID"="C"."CID")   3 - filter("C"."CID" LIKE :B2)

  SQL> SELECT DISTINCT M_ID

 2                  FROM M, C

 3                 WHERE    M.AB_ID = C.AB_ID

 4                       AND M.CID = C.CID

 5                       AND c.CID = ( :B2);

Elapsed: 00:00:00.04

Execution Plan


Plan hash value: 3581879523


| Id | Operation                    | Name        | Rows | Bytes | Cost
(%CPU)| Time |

| 0 | SELECT STATEMENT | | 1 | 41 | 8 (13)| 00:00:01 |

|  1 | HASH UNIQUE                 |             |    1 |   41 |    8 (13)|
00:00:01 |

| 2 | NESTED LOOPS | | 1 | 41 | 7 (0)| 00:00:01 |

| 3 | NESTED LOOPS | | 1 | 41 | 7 (0)| 00:00:01 |

|* 4 | INDEX SKIP SCAN | C_PK | 1 | 14 | 4 (0)| 00:00:01 |

|* 5 | INDEX RANGE SCAN | MRC_IDX | 1 | | 2 (0)| 00:00:01 |

| 6 | TABLE ACCESS BY INDEX ROWID| MRC | 1 | 27 | 3 (0)| 00:00:01 |


Predicate Information (identified by operation id):


  4 - access("C"."CID"=:B2)

      filter("C"."CID"=:B2)

  5 - access("M"."AB_ID"="C"."AB_ID" AND "M"."CID"=:B2)

SQL> SELECT DISTINCT M_ID

 2                  FROM M, C

 3                 WHERE    M.AB_ID = C.AB_ID

 4                       AND M.CID = C.CID

 5                       AND c.CID like '99999' ;

Elapsed: 00:00:00.04

Execution Plan


Plan hash value: 3581879523


| Id | Operation                    | Name        | Rows | Bytes | Cost
(%CPU)| Time |

| 0 | SELECT STATEMENT | | 2 | 82 | 8 (13)| 00:00:01 |

|  1 | HASH UNIQUE                 |             |    2 |   82 |    8 (13)|
00:00:01 |

| 2 | NESTED LOOPS | | 2 | 82 | 7 (0)| 00:00:01 |

| 3 | NESTED LOOPS | | 2 | 82 | 7 (0)| 00:00:01 |

|* 4 | INDEX SKIP SCAN | C_PK | 1 | 14 | 4 (0)| 00:00:01 |

|* 5 | INDEX RANGE SCAN | MRC_IDX | 1 | | 2 (0)| 00:00:01 |

| 6 | TABLE ACCESS BY INDEX ROWID| MRC | 2 | 54 | 3 (0)| 00:00:01 |


Predicate Information (identified by operation id):


  4 - access("C"."CID"='99999')

      filter("C"."CID"='99999')

  5 - access("M"."AB_ID"="C"."AB_ID" AND

             "M"."CID"="C"."CID") On Tue, Dec 1, 2020 at 3:40 AM Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:

>
>  Function-based index wouldn't help. (In fact there may already be such an
> index in place).
> The optimizer ignored function-based indexes on remote tables in
> distributed joins.
> https://jonathanlewis.wordpress.com/2018/05/04/fbi-limitation/
> The cardinality estimate for this operation might be one of the basic
> guesses (like 1% for "I haven't a clue but it's an equality")
>
>
> Regards
> Jonathan Lewis
>
>
> On Mon, 30 Nov 2020 at 20:46, Jeffrey Beckstrom <jbeckstrom_at_gcrta.org>
> wrote:
>
>> Your step 42 states:
>>
>>   42 - SELECT "NM_TXT","F_NM","L_NM","EM","A_LCK" FROM
>> "USER1"."REMOTE_TAB" "CT" WHERE :1=UPPER("NM_TXT") (accessing 'XXXXX.DBLNK'
>> )
>>
>> Since you are doing a "upper" on the column, do you have a have a
>> function index on that column?
>>
>> Jeffrey Beckstrom
>> Lead Database Administrator
>> Information Technology Department
>> Greater Cleveland Regional Transit Authority
>> 1240 W. 6th Street
>> Cleveland, Ohio 44113
>>
>> >>> Lok P <loknath.73_at_gmail.com> 11/30/20 3:40 PM >>>
>> Basically I was thinking if oracle make the cardinality estimation by
>> reading the statistics information from the remote database objects during
>> parsing of the query only? Or it apply some percentage logic to have the
>> cardinality estimation of the remote object which then can deviate from
>> actuals and thus may resulted into bad plan?
>>
>> On Tue, 1 Dec 2020, 1:36 am Lok P, <loknath.73_at_gmail.com> wrote:
>>
>>> Its version 11.2.0.4 of oracle database. We have below query spending
>>> quite a lot of time on "SQL*Net message from dblink" on step-42. And it
>>> seems it's because it's scanning that table in FULL, hundreds of times
>>> which seems to be because of some bad estimation. My thought was if line
>>> number -25 in the plan i.e the HASH JOIN is causing the overall estimation
>>> wrong and thus the REMOTE_TAB is getting joined as "Nested loop outer" but
>>> not "Hash Join outer". And possibly REMOTE_TAB would have joined as "hash
>>> join outer" making it finish in quick time. Is my understanding correct
>>> here? OR The estimation of remote object cardinality is posing a problem
>>> for optimizers here and is the cause?
>>> The query is a UNION ALL query, i have shared the predicate section. And
>>> another thing is column NM_TXT of REMOTE_TAB is having an index on it, but
>>> because of the UPPER function it's not getting used. So probably creating a
>>> new function based index on REMOTE_TAB.UPPER(NM_TXT) will make the query
>>> faster even with NESTED LOOP OUTER path. But I want to see if we can fix
>>> any table/column stats , so that the optimizer can by default follow the
>>> HASh JOIN OUTER path? Also that table REMOTE_TAB has ~427K rows and almost
>>> all the rows have distinct NM_TXT values.
>>> Below is the sql monitor but its not formatted well so i also Attached
>>> the same sql monitor of the query , i have replaced the exact object names
>>> with dummy ones.
>>>
>>> Global Information
>>> ------------------------------
>>> Status : EXECUTING
>>> Instance ID : 1
>>> SQL Execution ID : 16777456
>>> Execution Started : 11/23/2020 00:40:07
>>> First Refresh Time : 11/23/2020 04:13:17
>>> Last Refresh Time : 11/23/2020 04:38:00
>>> Duration : 14327s
>>> Program : JDBC Thin Client
>>>
>>> Global Stats
>>>
>>> =================================================================================
>>> | Elapsed | Cpu | IO | Concurrency | Other | Buffer | Read | Read |
>>> | Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) | Gets | Reqs |
>>> Bytes |
>>>
>>> =================================================================================
>>> | 14274 | 5.21 | 0.00 | 0.00 | 14268 | 67307 | 1 | 8192 |
>>>
>>> =================================================================================
>>>
>>> SQL Plan Monitoring Details (Plan Hash Value=1724585072)
>>>
>>> =============================================================================================================================================================================================
>>> | Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows |
>>> Mem | Activity | Activity Detail |
>>> | | | | (Estim) | | Active(s) | Active | | (Actual) | | (%) | (#
>>> samples) |
>>>
>>> =============================================================================================================================================================================================
>>> | 0 | SELECT STATEMENT | | | | | | 1 | | | | |
>>> | -> 1 | SORT ORDER BY | | 2 | 20772 | 1484 | +12790 | 1 | 0 | 2M | | |
>>> | -> 2 | UNION-ALL | | | | 1484 | +12790 | 1 | 6887 | | | |
>>> | 3 | NESTED LOOPS OUTER | | 1 | 2272 | 1 | +12790 | 1 | 152 | | | |
>>> | 4 | NESTED LOOPS OUTER | | 1 | 11 | 1 | +12790 | 1 | 152 | | | |
>>> | 5 | NESTED LOOPS OUTER | | 1 | 10 | 1 | +12790 | 1 | 152 | | | |
>>> | 6 | NESTED LOOPS | | 1 | 8 | 1 | +12790 | 1 | 152 | | | |
>>> | 7 | NESTED LOOPS | | 1 | 6 | 1 | +12790 | 1 | 152 | | | |
>>> | 8 | TABLE ACCESS BY INDEX ROWID | TAB1 | 1 | 5 | 1 | +12790 | 1 | 1 |
>>> | | |
>>> | 9 | INDEX SKIP SCAN | TAB1_PK | 1 | 4 | 1 | +12790 | 1 | 1 | | | |
>>> | 10 | INDEX RANGE SCAN | G_CTCT_PK | 1 | 1 | 1 | +12790 | 1 | 152 | | |
>>> |
>>> | 11 | TABLE ACCESS BY INDEX ROWID | G_CTCT | 1 | 2 | 1 | +12790 | 152 |
>>> 152 | | | |
>>> | 12 | INDEX UNIQUE SCAN | G_CTCT_PK | 1 | 1 | 1 | +12790 | 152 | 152 |
>>> | | |
>>> | 13 | TABLE ACCESS BY INDEX ROWID | PU_PRFL | 1 | 2 | 1 | +12790 | 152
>>> | 141 | | | |
>>> | 14 | INDEX UNIQUE SCAN | PU_PRFL_IDX | 1 | 1 | 1 | +12790 | 152 | 141
>>> | | | |
>>> | 15 | TABLE ACCESS BY INDEX ROWID | P_RL | 1 | 1 | 1 | +12790 | 152 |
>>> 141 | | | |
>>> | 16 | INDEX UNIQUE SCAN | P_RL_PK | 1 | | 1 | +12790 | 152 | 141 | | | |
>>> | 17 | REMOTE | REMOTE_TAB | 1 | 2261 | 1 | +12790 | 152 | 73 | | | |
>>> | -> 18 | NESTED LOOPS OUTER | | 1 | 18500 | 1484 | +12790 | 1 | 6735 |
>>> | | |
>>> | -> 19 | HASH JOIN | | 1 | 16239 | 1484 | +12790 | 1 | 6736 | 2M | | |
>>> | 20 | NESTED LOOPS OUTER | | 2 | 14795 | 1 | +12790 | 1 | 8356 | | | |
>>> | 21 | NESTED LOOPS OUTER | | 2 | 14793 | 1 | +12790 | 1 | 8356 | | | |
>>> | 22 | HASH JOIN | | 2 | 14789 | 1 | +12790 | 1 | 8356 | | | |
>>> | 23 | NESTED LOOPS | | 2 | 8903 | 1 | +12790 | 1 | 8356 | | | |
>>> | 24 | NESTED LOOPS | | 4 | 8903 | 1 | +12790 | 1 | 8356 | | | |
>>> | 25 | HASH JOIN | | 4 | 8895 | 1 | +12790 | 1 | 8356 | | | |
>>> | 26 | VIEW | VW_NSO_1 | 2 | 7 | 1 | +12790 | 1 | 15 | | | |
>>> | 27 | HASH UNIQUE | | 2 | | 1 | +12790 | 1 | 15 | | | |
>>> | 28 | NESTED LOOPS | | 2 | 7 | 1 | +12790 | 1 | 15 | | | |
>>> | 29 | NESTED LOOPS | | 2 | 7 | 1 | +12790 | 1 | 15 | | | |
>>> | 30 | INDEX SKIP SCAN | TAB1_PK | 1 | 4 | 1 | +12790 | 1 | 1 | | | |
>>> | 31 | INDEX RANGE SCAN | MRC_IDX | 1 | 2 | 1 | +12790 | 1 | 15 | | | |
>>> | 32 | TABLE ACCESS BY INDEX ROWID | MRC | 2 | 3 | 1 | +12790 | 15 | 15
>>> | | | |
>>> | 33 | INDEX FULL SCAN | M_CTCT | 1M | 8862 | 1 | +12790 | 1 | 1M | | | |
>>> | -> 34 | INDEX UNIQUE SCAN | G_CTCT_PK | 1 | 1 | 1484 | +12790 | 8356 |
>>> 8356 | | | |
>>> | 35 | TABLE ACCESS BY INDEX ROWID | G_CTCT | 1 | 2 | 1 | +12790 | 8356
>>> | 8356 | | | |
>>> | 36 | TABLE ACCESS FULL | MRC | 639K | 5874 | 1 | +12790 | 1 | 641K | |
>>> | |
>>> | 37 | TABLE ACCESS BY INDEX ROWID | PU_PRFL | 1 | 2 | 1 | +12790 | 8356
>>> | 8356 | | | |
>>> | 38 | INDEX UNIQUE SCAN | PU_PRFL_IDX | 1 | 1 | 1 | +12790 | 8356 |
>>> 8356 | | | |
>>> | 39 | TABLE ACCESS BY INDEX ROWID | P_RL | 1 | 1 | 1 | +12790 | 8356 |
>>> 8352 | | | |
>>> | 40 | INDEX UNIQUE SCAN | P_RL_PK | 1 | | 1 | +12790 | 8356 | 8352 | |
>>> | |
>>> | -> 41 | TABLE ACCESS FULL | TAB1 | 394K | 1437 | 1484 | +12790 | 1 |
>>> 9033 | | | |
>>> | -> 42 | REMOTE | REMOTE_TAB | 1 | 2261 | 3302 | +11027 | 6736 | 379 |
>>> | 100.00 | SQL*Net message from dblink (3248) |
>>>
>>> =============================================================================================================================================================================================
>>>
>>>
>>> Predicate Information (identified by operation id):
>>> ---------------------------------------------------
>>> 9 - access("C"."CID" LIKE :1) filter("C"."CID" LIKE :1)
>>> 10 - access("CC"."AB_ID"="C"."AB_ID" AND "CC"."CID"="C"."CID")
>>> 11 - filter(("GC"."M_CID"='C' AND "GC"."AB_ID" IS NOT NULL AND
>>> "GC"."C_TYP"='X' AND "GC"."AB_ID"="CC"."AB_ID"))
>>> 12 - access("GC"."CT_ID"="CC"."CT_ID")
>>> 13 - filter("GC"."AB_ID"="U"."AB_ID")
>>> 14 - access("GC"."CT_ID"="U"."CT_ID")
>>> 16 - access("U"."RID"="ROL"."RID")
>>> 19 - access("M"."CID"="C"."CID")
>>> 22 - access("MC"."M_ID"="M"."M_ID")
>>> 25 - access("MC"."M_ID"="M_ID")
>>> 30 - access("C"."CID" LIKE :2) filter("C"."CID" LIKE :2)
>>> 31 - access("M"."AB_ID"="C"."AB_ID" AND "M"."CID"="C"."CID")
>>> 34 - access("GC"."CT_ID"="MC"."CT_ID")
>>> 35 - filter(("GC"."AB_ID" IS NOT NULL AND "GC"."M_CID"='M' AND
>>> "GC"."C_TYP"='X' AND "GC"."AB_ID"="MC"."AB_ID"))
>>> 37 - filter("GC"."AB_ID"="U"."AB_ID")
>>> 38 - access("GC"."CT_ID"="U"."CT_ID")
>>> 40 - access("U"."RID"="ROL"."RID")
>>>
>>> Remote SQL Information (identified by operation id):
>>> ----------------------------------------------------
>>> 17 - SELECT "NM_TXT","F_NM","L_NM","EM","A_LCK" FROM
>>> "USER1"."REMOTE_TAB" "CT" WHERE :1=UPPER("NM_TXT") (accessing 'XXXXX.DBLNK'
>>> )
>>>
>>> 42 - SELECT "NM_TXT","F_NM","L_NM","EM","A_LCK" FROM
>>> "USER1"."REMOTE_TAB" "CT" WHERE :1=UPPER("NM_TXT") (accessing 'XXXXX.DBLNK'
>>> )
>>>
>>>


--
http://www.freelists.org/webpage/oracle-l
Received on Tue Dec 01 2020 - 09:54:40 CET

Original text of this message