Re: A question about huge difference in cardinality of a query with 3 predicates

From: Nirav A Shah <shivam71_at_hotmail.com>
Date: Sat, 28 Oct 2017 15:52:23 +0000
Message-ID: <SYXPR01MB095986DF55CCBD4433659A4EC15B0_at_SYXPR01MB0959.ausprd01.prod.outlook.com>



Hi All,

I am having an issue of a query getting a bad plan -which I have narrowed down to major mismatch between estimated and actual cardinality of one of the tables involved in the query. I need

help from experts on understanding why the cardinality estimate is wrong and what can be done to rectify. For this purpose I have created a test table with required columns and tried to

create same situation (of cardinality mismatch) and following are the details on it. (Oracle version is : 11.2.0.4 on Linux.)

Table structure: CREATE TABLE processes

(

   process_id NUMBER (12) NOT NULL PRIMARY KEY,

   clientid VARCHAR2 (20) NOT NULL,

   evt_type_id NUMBER (12) NOT NULL,

   status VARCHAR2 (20) NOT NULL

)

Following are the stats about the table (Thanks to the print_table utility of Tom Kyte Sir - the below is output from "select * from all_tables where table_name='PROCESSES'" from

which several default stuff is remove to keep this a little brief)

OWNER                         : TEST

TABLE_NAME                    : PROCESSES

PCT_FREE                      : 10

NUM_ROWS                      : 2231714

BLOCKS                        : 12137

AVG_ROW_LEN                   : 31

SAMPLE_SIZE                   : 2231714



Here is the query and test data :

variable v_coid varchar2(16);

variable v_status varchar2(16);

variable v_evt_type_id number;

exec :v_coid := 'G31MQQNM99ABCA9V';

exec :v_status := 'INP';

exec :v_event_type_id :=5014;

SQL> SELECT COUNT (*) FROM processes wpi WHERE WPI.CLIENTID = :v_coid AND wpi.status = :v_status AND WPI.evt_type_id = :v_event_type_id;

  COUNT(*)


     10404

1 row selected.

Following is the Execution plsn:

Plan hash value: 3304333532


| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |


| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.08 | 11406 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.08 | 11406 | |* 2 | TABLE ACCESS FULL| PROCESSES | 1 | 62 | 10404 |00:00:00.08 | 11406 |


Peeked Binds (identified by position):


   1 - (VARCHAR2(30), CSID=873): 'G31MQQNM99ABCA9V'    2 - (VARCHAR2(30), CSID=873): 'INP' PLAN_TABLE_OUTPUT


   3 - (NUMBER): 3004 Predicate Information (identified by operation id):


   2 - filter(("WPI"."CLIENTID"=:V_COID AND "WPI"."STATUS"=:V_STATUS AND

              "WPI"."EVT_TYPE_ID"=:V_EVT_TYPE_ID)) 29 rows selected.

Here we can see the Huge difference - the estimated cardinality of just 62 versus the acutal cardinality of 10404. Note that all these 3 columns involved in the query have frequency histograms

on them and following are details on it...and this is the issue - the difference between estimated cardinality of 62 versus actual value of 10404 that I need help to sort out.

exec print_table('select * from all_tab_col_statistics where owner=''TEST'' AND TABLE_NAME=''PROCESSES''');

OWNER                         : TEST -- same for all the columns below

TABLE_NAME                    : PROCESSES -- same for all the columns below

COLUMN_NAME                   : PROCESS_ID

NUM_DISTINCT                  : 2231714

LOW_VALUE                     : C50D01015C04

HIGH_VALUE                    : C5170B1A5218

DENSITY                       : .000000444209506971868

NUM_NULLS                     : 0

NUM_BUCKETS                   : 254

LAST_ANALYZED                 : 20-oct-2017 04:20:20 --for all columns below too...the stats are current.

SAMPLE_SIZE                   : 5548

AVG_COL_LEN                   : 7

HISTOGRAM                     : HEIGHT BALANCED


-----------------
OWNER : TEST TABLE_NAME : PROCESSES COLUMN_NAME : CLIENTID NUM_DISTINCT : 104 LOW_VALUE : 30324652425132433157563030303936 HIGH_VALUE : 47354D384A4D54394545484A51475859 DENSITY : .000000222104676853238 NUM_NULLS : 0 NUM_BUCKETS : 87 SAMPLE_SIZE : 5548 AVG_COL_LEN : 17 HISTOGRAM : FREQUENCY
-----------------
COLUMN_NAME : EVT_TYPE_ID NUM_DISTINCT : 50 LOW_VALUE : C102 HIGH_VALUE : C21F07 DENSITY : .000000222104676853238 NUM_NULLS : 0 NUM_BUCKETS : 38 SAMPLE_SIZE : 5548 AVG_COL_LEN : 4 HISTOGRAM : FREQUENCY
-----------------
COLUMN_NAME : STATUS NUM_DISTINCT : 3 LOW_VALUE : 434F4D HIGH_VALUE : 494E50 DENSITY : .000000222104676853238 NUM_NULLS : 0 NUM_BUCKETS : 3 SAMPLE_SIZE : 5548 AVG_COL_LEN : 4 HISTOGRAM : FREQUENCY
-----------------

Please let me know if I should provide any other details. One thing I found was that if I do dynamic sampling level 8 then the cardinality estimate improved to nearly 7000 (don't recall that exact figure),but I would not like to depend on it. The other thing is if I drop the histograms then the cardinality improved slightly - to 143 as shown in the plan below but that is still way off the actual value which is more than 10K (10404 to be exact).


| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |


| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.08 | 11406 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.08 | 11406 | |* 2 | TABLE ACCESS FULL| PROCESSES | 1 | 143 | 10404 |00:00:00.08 | 11406 |


Peeked Binds (identified by position):


   1 - (VARCHAR2(30), CSID=873): 'G31MQQNMH3TW7A9V'    2 - (VARCHAR2(30), CSID=873): 'INP' PLAN_TABLE_OUTPUT


   3 - (NUMBER): 3004 Predicate Information (identified by operation id):


   2 - filter(("WPI"."CLIENTID"=:V_COID AND "WPI"."EVT_TYPE_ID"=:V_EVT_TYPE_ID

              AND "WPI"."STATUS"=:V_STATUS)) Regards, Nirav

--
http://www.freelists.org/webpage/oracle-l
Received on Sat Oct 28 2017 - 17:52:23 CEST

Original text of this message