DBMS_STATS gives wrong stats for some columns/indexes

From: <ca111026_at_gmail.com>
Date: Sun, 17 May 2009 17:49:59 -0700 (PDT)
Message-ID: <63910202-46fd-4ccb-8a1a-c7377e55db76_at_b6g2000pre.googlegroups.com>



Hi all,

I noticed that DBMS_STATS gives wrong statistics for some columns/ indexes - it looks like they forgot the multiplier to compensate for sample size.

Example: a table has approx 3.8 million rows. This is what I get for a certain column:
5% sample size - 80,000 distinct values

10% sample size -149,000 distinct values
20% sample size - 286,000 distinct values
50% sample size - 704,000 distinct values
100% sample size - 1,357,000 distinct values

DBMS_STATS.GATHER_TABLE_STATS uses SELECT ... SAMPLE, it is easy to see by enabling tracing before running DBMS_STATS.

Result of SELECT ... SAMPLE needs to be multiplied by 1/(sample size) to get
the actual value. For example, if a table has 1,000,000 rows and you run
SELECT COUNT(*) SAMPLE 5% then it will return 50,000. You need to multiply by 1/(0.05) = 20 to get corret result.

So it seems to me that in some cases DBMS_STATS "forgets" to multiply by 1/(sample size).

If we do that then result will be much close: 5% sample size - 80,000x 20 = 1,600,000 distinct values

10% sample size -149,000 x 10 = 1,490,000 distinct values
20% sample size - 286,000 x 5 = 1,430,000 distinct values
50% sample size - 704,000 x 2 = 1,400,00 distinct values
100% sample size - 1,357,000 distinct values

This was observed in Oracle 9.2.0.7 and 9.2.0.8 (Sun, AIX) Received on Sun May 17 2009 - 19:49:59 CDT

Original text of this message