Re: Incorrect cardinality estimate

From: Wolfgang Breitling <breitliw_at_centrexcc.com>
Date: Wed, 24 Jan 2007 07:11:11 -0700
Message-Id: <20070124141009.386365B407C@turing.freelists.org>

I must have used reply instead of reply-all. Since Paul refers to my suggestion (which only he got), here it is

>Date: Tue, 23 Jan 2007 11:23:24 -0700
>To: paul.baumgartel_at_credit-suisse.com
>From: Wolfgang Breitling <breitliw_at_centrexcc.com>
>Subject: Re: Incorrect cardinality estimate
>
>Paul,
>
>the problem is that 12/27/2006 is not a popular value so the
>cardinality estimate didn't change much. Height balanced histograms
>are only good for predicates on popular values (I know, an
>oversimplification, but not by much). For non-popular values the
>cardinality of an equality predicate is num_rows * density =
>194447369 * 6.2921e-03 = 1223482.29.
>
>Ergo, in order to get the cardinality estimate down you have to
>change either num_rows or density. I suggest you whack density to
>2.5e-6 or even less. Because you did gather the histogram this won't
>affect any sql which go after popular values. It will affect sql
>where you have a join on the ods_process_date column.
>
>At 09:41 AM 1/23/2007, you wrote:
>>Following up on my post from Friday regarding an optimizer plan to
>>use an index join rather than a simple index lookup.
>>
>>Wolfgang Breitling pointed out that the optimizer estimated that
>>the query would return 1.3M rows. In fact the query returns 492
>>rows, so I took at look at data value distribution in the predicate
>>column (query is select trans_id from ods_execution where
>>ods_process_date='12/27/2006'). Distribution of ods_process_date
>>is definitely skewed (here are the last few rows of the count of each value):
>>
>>ODS_PROCESS_DATE COUNT(*)
>>------------------- -------
>>/16/2006 00:00:00 1544886
>>11/16/2006 08:53:52 1
>>11/17/2006 00:00:00 1226408
>>11/17/2006 15:50:12 1
>>11/17/2006 16:45:10 1
>>11/17/2006 16:46:00 1
>>11/17/2006 16:46:10 1
>>11/20/2006 00:00:00 12
>>12/27/2006 00:00:00 492
>>
>>So, I have created a histogram on the ods_process_date column, via
>>
>>exec dbms_stats.gather_table_stats('ODS','ODS_EXECUTION',-
>> > method_opt=>'FOR COLUMNS ODS_PROCESS_DATE SIZE 254',-
>> > stattab=>'ODS_STATS',statown=>'P_BAUMGA2',statid=>'NoHisto')
>>
>>Now the puzzle: after flushing the shared pool (to force a
>>re-parse), the optimizer's cardinality estimate, and thus its plan,
>>is not changing. A 10053 trace shows that the optimizer is aware
>>of the histogram:
>>
>>SINGLE TABLE ACCESS PATH
>>Column: ODS_PROCES Col#: 14 Table: ODS_EXECUTION Alias: ODS_EXECUTION
>> NDV: 772 NULLS: 0 DENS: 6.2921e-03
>> HEIGHT BALANCED HISTOGRAM: #BKT: 254 #VAL: 143
>> TABLE: ODS_EXECUTION ORIG CDN: 194447369 ROUNDED CDN:
>> 1223485 CMPTD CDN: 1223485
>>
>>
>>I am not sure what should be the next step--any suggestions welcome. Thanks!

Regards

Wolfgang Breitling
Centrex Consulting Corporation
www.centrexcc.com

This email has been scanned by the MessageLabs Email Security System. For more information please visit http://www.messagelabs.com/email

--
http://www.freelists.org/webpage/oracle-l

Received on Wed Jan 24 2007 - 08:11:11 CST