Re: Optimizer estimated cardinality very low

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Wed, 7 Oct 2020 09:44:54 +0100
Message-ID: <CAGtsp8nTAcbAW4W2b3pqCQmjgaSWi912HH-z_xZ3=fixcbUJ6g_at_mail.gmail.com>



Two observations on infrastructure that might help.

  1. It looks as if your ordr_rqst_date is date-only, so "ordr_rqst_date <= 31st July 2020" is going to give the same result as "ordr_rqst_date < 1st Aug 2020". If this is correct then you should change the predicate to use the "strictly less than" approach. This would mean that the optimizer could recognise that you are requesting exactly the whole of the partition, and it would drop the date predicate from the execution plan. As it is, it recognises that you want data from within the "values less than 1st Aug" partition, but still has to check that it doesn't pick up rows like 31st July 00:01. Apart from saving CPU this will probably improve the arithmetic, as it should become partition-driven rather than table-driven.
  2. You have a column group on order_detl("ORDR_RQST_LINE","ORDR_RQST_DATE","ORDR_RQST_NUMBER"), but this won't help you when the join is on just ("ORDR_RQST_DATE","ORDR_RQST_NUMBER"), so the optimiser's join arithmetic is based on the wrong assumption. Add a column group for the two-column combination and gather stats for it. (I'm assuming you don't have indexes on the two combinations where you've reported column groups, and that these column groups were created to give Oracle the statistics that would have existed as distinct_keys on the indexes.)
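For the first point, the change would look something like this - a sketch based on the query quoted below, keeping the TO_DATE style already in use:

```sql
-- Original inclusive upper bound forces a residual check against the
-- "values less than 1st Aug" partition boundary:
--   AND ordr_rqst_date <= TO_DATE('31JUL2020','DDMONYYYY')
-- Strict upper bound covering exactly the whole July partition:
SELECT ordr_rqst_number, ordr_rqst_date, ordr_status
FROM   order_reqst
WHERE  ordr_rqst_date >= TO_DATE('01JUL2020','DDMONYYYY')
AND    ordr_rqst_date <  TO_DATE('01AUG2020','DDMONYYYY')
AND    EXISTS (
         SELECT 1
         FROM   order_detl
         WHERE  order_detl.ordr_rqst_number = order_reqst.ordr_rqst_number
         AND    order_detl.ordr_rqst_date   = order_reqst.ordr_rqst_date
       );
```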
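For the second point, here's a sketch of creating and gathering the two-column column group on the detail table. Object names are taken from the thread; check the owner and your stats preferences in your own environment:

```sql
-- Create the ("ORDR_RQST_DATE","ORDR_RQST_NUMBER") extension on ORDER_DETL;
-- the function returns the system-generated extension name (SYS_STU...).
SELECT dbms_stats.create_extended_stats(
         ownname   => user,
         tabname   => 'ORDER_DETL',
         extension => '("ORDR_RQST_DATE","ORDR_RQST_NUMBER")'
       ) AS extension_name
FROM   dual;

-- Gather stats so the new column group acquires its num_distinct figure.
BEGIN
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'ORDER_DETL',
    method_opt => 'for columns ("ORDR_RQST_DATE","ORDR_RQST_NUMBER") size 1'
  );
END;
/
```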

Regards
Jonathan Lewis

On Tue, 6 Oct 2020 at 23:52, Daniel Coello <coello.daniel_at_gmail.com> wrote:

> Hello,
> I am running into a performance issue where queries using EXISTS condition
> (throug a vpd policy). The resulting estimated cardinality is a very low
> number ( 1 in the example).
>
> | 3 | PX PARTITION HASH ALL | | 1 | 52 | | 15194 (13)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | |
> |* 4 | HASH JOIN SEMI | | 1 | 52 | 500M| 15194 (13)| 00:00:01 | | | Q1,00 | PCWP | |
>
> *Execution plan details below*
>
>
> - How is the join cardinality estimation defined?
>
> - What can I do to get a more accurate estimate?
>
> Details
>
> - Database version is 12.1.0.2 Enterprise Edition on an Exadata
> environment, 4 nodes RAC.
> - The tables are partitioned by ORDR_RQST_DATE with a monthly interval and
> subpartitioned by hash on ORDR_RQST_NUMBER. We have seen this issue mostly
> in recent months.
> - Every combination of ORDR_RQST_DATE,ORDR_RQST_NUMBER in the
> ORDER_REQST exists in ORDER_DETL (1-to-many)
> - Statistics are up to date, and histograms are the same as in months
> where the cardinality estimate is fine. Statistics have been gathered at
> subpartition, partition, and global levels.
> - I have tested with different dynamic sampling levels; the single-table
> cardinality estimates change but the hash join semi estimated
> cardinality remains 1.
> - The EXISTS clause is created by the VPD policy, whose logic I can't
> modify at this time. The issue is still present with simple queries with an
> EXISTS clause and no VPD, like the example posted here.
> - I have tried using the CARDINALITY hint in the subquery: " AND EXISTS (
> SELECT /*+ CARDINALITY(ORDER_DETL 30000000) */ 1 ...". The estimated
> cardinality changes for the table but this has no effect on the EXISTS
> estimated cardinality.
> - SQL profiles and hints (depending on the queries) can help on a
> case-by-case basis, but with users running ad-hoc queries this approach is
> not manageable.
>
>
> *Table count info*
>
> ORDER_REQST: 2810794270
> ORDER_DETL: 4243601381
>
> *July/2020 partition count:*
>
> ORDER_REQST: 26895392
> ORDER_DETL: 32314200
>
> Column stats info:
>
> *TABLE_NAME COLUMN_NAME NUM_DISTINCT*
> ORDER_REQST ORDR_RQST_DATE 2696
> ORDER_REQST ORDR_RQST_NUMBER 2719713280
> ORDER_DETL ORDR_RQST_DATE 2696
> ORDER_DETL ORDR_RQST_NUMBER 2719662080
>
>
> Column group stats:
>
> *TABLE_NAME   COL_GROUP                                                NUM_DISTINCT  HISTOGRAM*
> ORDER_REQST   ("ORDR_RQST_NUMBER","ORDR_RQST_DATE")                    2717362176    NONE
> ORDER_DETL    ("ORDR_RQST_LINE","ORDR_RQST_DATE","ORDR_RQST_NUMBER")   4243601381    NONE
>
>
> The query as it stands is not a problem by itself, but when joins to other
> tables are added they are joined using Nested Loops and practically never
> finish. We can create SQL profiles, but in an ad-hoc environment there are
> thousands of variations as joins to other tables are added.
>
>
> Below are the query plan and extracts from the 10053 traces I have collected:
>
> Query:
>
> Actual count: 26895392
> SELECT
> "ORDER_REQST"."ORDR_RQST_NUMBER" "ORDR_RQST_NUMBER",
> "ORDER_REQST"."ORDR_RQST_DATE" "ORDR_RQST_DATE",
> "ORDER_REQST"."ORDR_STATUS" "ORDR_STATUS"
> FROM "ORDER_REQST" "ORDER_REQST"
> WHERE "ORDER_REQST"."ORDR_RQST_DATE" >=
> to_date('01JUL2020','DDMONYYYY')
> AND "ORDER_REQST"."ORDR_RQST_DATE" <= to_date('31JUL2020','DDMONYYYY')
> AND EXISTS (
> SELECT 1
> FROM "ORDER_DETL" "ORDER_DETL"
> WHERE "ORDER_DETL"."ORDR_RQST_NUMBER" =
> "ORDER_REQST"."ORDR_RQST_NUMBER"
> AND "ORDER_DETL"."ORDR_RQST_DATE" =
> "ORDER_REQST"."ORDR_RQST_DATE"
> );
> Execution Plan
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes
> |TempSpc| Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ
> Distrib |
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | 1 | 52 | | 15194 (13)| 00:00:01 | | | | | |
> | 1 | PX COORDINATOR | | | | | | | | | | | |
> | 2 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 52 | | 15194 (13)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) |
> | 3 | PX PARTITION HASH ALL | | *1* | 52 | | 15194 (13)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | |
> |* 4 | HASH JOIN SEMI | | *1* | 52 | 500M| 15194 (13)| 00:00:01 | | | Q1,00 | PCWP | |
> | 5 | PX PARTITION RANGE SINGLE | | 26M| 692M| | 3376 (43)| 00:00:01 | 248 | 248 | Q1,00 | PCWC | |
> |* 6 | TABLE ACCESS STORAGE FULL| ORDER_REQST | 26M| 692M| | 3376 (43)| 00:00:01 | 1977 | 1984 | Q1,00 | PCWP | |
> | 7 | PX PARTITION RANGE SINGLE | | 31M| 751M| | 1937 (10)| 00:00:01 | 248 | 248 | Q1,00 | PCWC | |
> |* 8 | INDEX FAST FULL SCAN | ORDER_DETL_PK | 31M| 751M| | 1937 (10)| 00:00:01 | 1977 | 1984 | Q1,00 | PCWP | |
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> Predicate Information (identified by operation id):
> ---------------------------------------------------
> 4 -
> access("ORDER_DETL"."ORDR_RQST_NUMBER"="ORDER_REQST"."ORDR_RQST_NUMBER" AND
> "ORDER_DETL"."ORDR_RQST_DATE"="ORDER_REQST"."ORDR_RQST_DATE")
> 6 - storage("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
> 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
> "ORDER_REQST"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
> 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
> filter("ORDER_REQST"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
> 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
> "ORDER_REQST"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
> 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
> 8 - storage("ORDER_DETL"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
> 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
> "ORDER_DETL"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
> 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
> filter("ORDER_DETL"."ORDR_RQST_DATE">=TO_DATE(' 2020-07-01
> 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
> "ORDER_DETL"."ORDR_RQST_DATE"<=TO_DATE(' 2020-07-31
> 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
>
> Note
> -----
> - dynamic statistics used: dynamic sampling (level=9)
> - automatic DOP: Computed Degree of Parallelism is 2
> - parallel scans affinitized for buffer cache
>
> When getting the 10053 for the semi join calculation with extended stats I
> see it is using the extended stats created on both columns for each table :
> ...
> Join ColGroups for ORDER_REQST[ORDER_REQST] and ORDER_DETL[ORDER_DETL]
> : (#5, #3) -- Column groups listed above
> ...
> Join selectivity using 1 ColGroups: 6.0249e-08 (sel1 = 0.910080, sel2
> = 0.000000)
> Semi Join Card: 1.000000 = outer (16597656.000000) * sel (6.0249e-08)
> ...
> When running using "_optimizer_enable_extended_stats"=FALSE to avoid
> using extended stats for the same interval then:
> ...
> Revised join sel: 3.0946e-08 = 1.000000 * (1/32314200.00) *
> (1/1.000000)
> Capping Semi Join Card of HJ/SMJ by inner = 0.473558
> Capping Semi Join Card of HJ/SMJ (no post filters) by inner = 0.473558
> Capping Semi Join Card of NL by inner = 0.473558
> Capping Semi Join Card of without POSQ by inner = 0.473558
> Capping Semi Join Card of non adjusted NSQ by inner = 0.473558
> ...
>
> From a 10053 trace of a month interval, June 2020, where the estimated
> cardinality is good:
>
> ...
> Join ColGroups for ORDER_REQST[ORDER_REQST] and ORDER_DETL[ORDER_DETL]
> : Using cdn sanity check
> ...
> ColGroup cardinality sanity check: ndv for ORDER_REQST[ORDER_REQST] =
> 17506471.000000 ORDER_DETL[ORDER_DETL] = 21814573.000000
> Join selectivity using 1 ColGroups: 1.000000 (sel1 = 0.000000, sel2 =
> 1.9154e-08)
> Semi Join Card: 14482345.790323 = outer (14482345.790323) * sel
> (1.000000)
> Join Card - Rounded: 14482346 Computed: 14482345.790323
> ...
>
> Thank you in advance for your help
>
> --
> Daniel Coello Villacis
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Oct 07 2020 - 10:44:54 CEST
