Re: DBMS_STATS gives wrong stats for some columns/indexes

From: ErikYkema <erik.ykema_at_gmail.com>
Date: Sun, 17 May 2009 22:29:37 -0700 (PDT)
Message-ID: <47ce4a8c-b58b-4262-a0eb-d402038a5459_at_r13g2000vbr.googlegroups.com>



On 18 mei, 02:49, ca111..._at_gmail.com wrote:
> Hi all,
>
> I noticed that DBMS_STATS gives wrong statistics for some columns/
> indexes - it looks like they forgot the multiplier to compensate for
> sample size.
>
> Example: a table has approx 3.8 million rows. This is what I get for a
> certain column:
> 5% sample size - 80,000 distinct values
> 10% sample size -149,000 distinct values
> 20% sample size - 286,000 distinct values
> 50% sample size - 704,000 distinct values
> 100% sample size - 1,357,000 distinct values
>
> DBMS_STATS.GATHER_TABLE_STATS uses SELECT ... SAMPLE, it is easy to
> see by enabling tracing before running DBMS_STATS.
>
> Result of SELECT ... SAMPLE needs to be multiplied by 1/(sample size)
> to get
> the actual value. For example, if a table has 1,000,000 rows and you
> run
> SELECT COUNT(*) SAMPLE 5% then it will return 50,000.
> You need to multiply by 1/(0.05) = 20 to get corret result.
>
> So it seems to me that in some cases DBMS_STATS "forgets" to
> multiply by 1/(sample size).
>
> If we do that then result will be much close:
> 5% sample size - 80,000x 20 = 1,600,000 distinct values
> 10% sample size -149,000 x 10 = 1,490,000 distinct values
> 20% sample size - 286,000 x 5 = 1,430,000 distinct values
> 50% sample size - 704,000 x 2 = 1,400,00 distinct values
> 100% sample size - 1,357,000 distinct values
>
> This was observed in Oracle 9.2.0.7 and 9.2.0.8 (Sun, AIX)

Hi,
Can you please post a verbatim sample of a dbms_stats command that you run?
And is there anything you can tell about the structure of the data in the table? Can there be any skew?
Regards, Erik Received on Mon May 18 2009 - 00:29:37 CDT

Original text of this message