Optimizer estimated cardinality very low

From: Daniel Coello <coello.daniel_at_gmail.com>
Date: Tue, 6 Oct 2020 18:51:19 -0400
Message-ID: <CADX5pQUWAcweyzge3PjJaTkErQdn081A=5cDUqZn2k=kTMb61g_at_mail.gmail.com>



 Hello,
I am running into a performance issue where queries using EXISTS condition (throug a vpd policy). The resulting estimated cardinality is a very low number ( 1 in the example).
|   3 |    PX PARTITION HASH ALL       |               |     1 |    52
|       | 15194  (13)| 00:00:01 |     1 |     8 |  Q1,00 | PCWC
|            |
|*  4 |     HASH JOIN SEMI             |               |     1 |    52 |
500M| 15194  (13)| 00:00:01 |       |       |  Q1,00 | PCWP |            |

*Execution plan details below*

  • How is the join cardinality estimation defined?
  • What can I do to get a more accurate estimate?

Details

  • Database version is 12.1.0.2 Enterprise Edition on an Exadata environment, 4 nodes RAC.
  • The tables are partitioned by ORDR_RQST_DATE by monthly interval and subpartitioned by hash in ORDR_RQST_NUMBER. We have seen this issue mostly in latest months.
  • Every combination of ORDR_RQST_DATE,ORDR_RQST_NUMBER in the ORDER_REQST exists in ORDER_DETL (1-to-many)
  • Statistics are up to date, histograms are the same between months where the cardinality estimate is fine. Statistics have been gathered at subpartition, partition, global levels and for all.
  • I have tested with different dynamic sampling levels, the single table cardinality estimations change but the hash join semi estimated cardinality remains 1.
  • The EXISTS clause is created by the VPD policy which I can't modify the logic at this time. The issue is still present with simple queries with EXISTS clause and no VPD, like the example posted here.
  • I have tried using the CARDINALITY in the subquery " AND EXISTS ( SELECT /*+ CARDINALITY(ORDER_DETL 30000000) */ 1 ..." the estimate cardinality changes for the table but has no effect to the EXISTS estimated cardinality.
  • SQL profiles and hints (depending on queries) can help on one by one cases but having users doing adhoc queries this approach is not manageable .

*Table count info*

ORDER_REQST: 2810794270
ORDER_DETL: 4243601381 *July/2020 partition count:*

ORDER_REQST: 26895392
ORDER_DETL: 32314200 Column stats info:

*TABLE_NAME    COLUMN_NAME         NUM_DISTINCT*
ORDER_REQST   ORDR_RQST_DATE      2696
ORDER_REQST   ORDR_RQST_NUMBER    2719713280
ORDER_DETL    ORDR_RQST_DATE      2696

ORDER_DETL ORDR_RQST_NUMBER 2719662080 Column group stats:

*TABLE_NAME COL_GROUP
NUM_DISTINCT HISTOGRAM*
ORDER_REQST ("ORDR_RQST_NUMBER","ORDR_RQST_DATE") 2717362176 NONE
ORDER_DETL ("ORDR_RQST_LINE","ORDR_RQST_DATE","ORDR_RQST_NUMBER")  4243601381 NONE Query as it is does not represent an issue alone but when joins are added to other tables they are joined using Nested Loops and they practically don't finish.
We can create sql profiles but in an adhoc environment there are thousands of versions when joins to other tables are added.

Below query plan and extracts from 100053 traces I have collected:

Query:

Actual count: 26895392

    SELECT

        "ORDER_REQST"."ORDR_RQST_NUMBER"    "ORDR_RQST_NUMBER",
        "ORDER_REQST"."ORDR_RQST_DATE"      "ORDR_RQST_DATE",
        "ORDER_REQST"."ORDR_STATUS"         "ORDR_STATUS"
    FROM "ORDER_REQST" "ORDER_REQST"
    WHERE "ORDER_REQST"."ORDR_RQST_DATE" >= to_date('01JUL2020','DDMONYYYY')

    AND "ORDER_REQST"."ORDR_RQST_DATE" <= to_date('31JUL2020','DDMONYYYY')     AND EXISTS (

            SELECT 1
            FROM "ORDER_DETL" "ORDER_DETL"
            WHERE "ORDER_DETL"."ORDR_RQST_NUMBER" =
"ORDER_REQST"."ORDR_RQST_NUMBER"
                AND "ORDER_DETL"."ORDR_RQST_DATE" =
"ORDER_REQST"."ORDR_RQST_DATE"
    );
    Execution Plan
| Id  | Operation                      | Name          | Rows  | Bytes
|TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ
Distrib |
|   0 | SELECT STATEMENT               |               |     1 |    52
|       | 15194  (13)| 00:00:01 |       |       |        |
|            |
|   1 |  PX COORDINATOR                |               |       |
|       |            |          |       |       |        |
|            |
|   2 |   PX SEND QC (RANDOM)          | :TQ10000      |     1 |    52
|       | 15194  (13)| 00:00:01 |       |       |  Q1,00 | P->S | QC
(RAND)  |
|   3 |    PX PARTITION HASH ALL       |               |     *1* |    52
|       | 15194  (13)| 00:00:01 |     1 |     8 |  Q1,00 | PCWC
|            |
|*  4 |     HASH JOIN SEMI             |               |     *1* |    52
|   500M| 15194  (13)| 00:00:01 |       |       |  Q1,00 | PCWP
|            |
|   5 |      PX PARTITION RANGE SINGLE |               |    26M|
692M|       |  3376  (43)| 00:00:01 |   248 |   248 |  Q1,00 | PCWC
|            |
|*  6 |       TABLE ACCESS STORAGE FULL| ORDER_REQST   |    26M|
692M|       |  3376  (43)| 00:00:01 |  1977 |  1984 |  Q1,00 | PCWP
|            |
|   7 |      PX PARTITION RANGE SINGLE |               |    31M|
751M|       |  1937  (10)| 00:00:01 |   248 |   248 |  Q1,00 | PCWC
|            |
|*  8 |       INDEX FAST FULL SCAN     | ORDER_DETL_PK |    31M|
751M|       |  1937  (10)| 00:00:01 |  1977 |  1984 |  Q1,00 | PCWP
|            |
-----------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):

   4 -
access("ORDER_DETL"."ORDR_RQST_NUMBER"="ORDER_REQST"."ORDR_RQST_NUMBER" AND

              "ORDER_DETL"."ORDR_RQST_DATE"="ORDER_REQST"."ORDR_RQST_DATE")    6 - storage("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND

              "ORDER_REQST"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

       filter("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND

              "ORDER_REQST"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

   8 - storage("ORDER_DETL"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND

              "ORDER_DETL"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

       filter("ORDER_DETL"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND

              "ORDER_DETL"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note


  • dynamic statistics used: dynamic sampling (level=9)
  • automatic DOP: Computed Degree of Parallelism is 2
  • parallel scans affinitized for buffer cache

When getting the 10053 for the semi join calculation with extended stats I see it is using the extended stats created on both columns for each table :

    ...
    Join ColGroups for ORDER_REQST[ORDER_REQST] and ORDER_DETL[ORDER_DETL] : (#5, #3) -- Column groups listed above

    ...
    Join selectivity using 1 ColGroups: 6.0249e-08 (sel1 = 0.910080, sel2 = 0.000000)

    Semi Join Card: 1.000000 = outer (16597656.000000) * sel (6.0249e-08)     ...
When running using "_optimizer_enable_extended_stats"=FALSE to avoid using extended stats for the same interval then:

    ...
    Revised join sel: 3.0946e-08 = 1.000000 * (1/32314200.00) * (1/1.000000)     Capping Semi Join Card of HJ/SMJ by inner = 0.473558     Capping Semi Join Card of HJ/SMJ (no post filters) by inner = 0.473558     Capping Semi Join Card of NL by inner = 0.473558     Capping Semi Join Card of without POSQ by inner = 0.473558     Capping Semi Join Card of non adjusted NSQ by inner = 0.473558     ...

From a 10053 trace of a month interval, june/2020, where estimate cardinality is good:

    ...
    Join ColGroups for ORDER_REQST[ORDER_REQST] and ORDER_DETL[ORDER_DETL] : Using cdn sanity check

    ...
    ColGroup cardinality sanity check: ndv for ORDER_REQST[ORDER_REQST] = 17506471.000000 ORDER_DETL[ORDER_DETL] = 21814573.000000     Join selectivity using 1 ColGroups: 1.000000 (sel1 = 0.000000, sel2 = 1.9154e-08)

    Semi Join Card: 14482345.790323 = outer (14482345.790323) * sel (1.000000)

    Join Card - Rounded: 14482346 Computed: 14482345.790323     ...

Thank you in advance for your help

-- 
Daniel Coello Villacis

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Oct 07 2020 - 00:51:19 CEST

Original text of this message