DBMS_STATS gives wrong stats for some columns/indexes
Date: Sun, 17 May 2009 17:49:59 -0700 (PDT)
I noticed that DBMS_STATS gives wrong statistics for some columns/ indexes - it looks like they forgot the multiplier to compensate for sample size.
Example: a table has approx 3.8 million rows. This is what I get for a
5% sample size - 80,000 distinct values
10% sample size -149,000 distinct values 20% sample size - 286,000 distinct values 50% sample size - 704,000 distinct values100% sample size - 1,357,000 distinct values
DBMS_STATS.GATHER_TABLE_STATS uses SELECT ... SAMPLE, it is easy to see by enabling tracing before running DBMS_STATS.
Result of SELECT ... SAMPLE needs to be multiplied by 1/(sample size)
the actual value. For example, if a table has 1,000,000 rows and you run
SELECT COUNT(*) SAMPLE 5% then it will return 50,000. You need to multiply by 1/(0.05) = 20 to get corret result.
So it seems to me that in some cases DBMS_STATS "forgets" to multiply by 1/(sample size).
If we do that then result will be much close: 5% sample size - 80,000x 20 = 1,600,000 distinct values
10% sample size -149,000 x 10 = 1,490,000 distinct values 20% sample size - 286,000 x 5 = 1,430,000 distinct values 50% sample size - 704,000 x 2 = 1,400,00 distinct values100% sample size - 1,357,000 distinct values
This was observed in Oracle 22.214.171.124 and 126.96.36.199 (Sun, AIX) Received on Sun May 17 2009 - 19:49:59 CDT