Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Mailing Lists -> Oracle-L -> Calculating Distinct counts from a sample
How does oracle calculate distinct counts from a sample of data?
I have traced dbms_stats but it doesn't give much of a clue as to how it does this. Initially I thought I could just use a simple calculation based on the percentage of distincts in any sample set compared to the number of rows in the sample set. So if the distinct count for the sample set was 30% of the number of rows in a 10% sample, then I would just scale this up....but I now know how obviously flawed this method is.
The only real way I can see of making this work is to apply some sort of linear regression function to the natural log of a number of additional samples from the sample set - which would allow me to come up with a "reasonable" scaling factor.
Any mathematicians out there who could shed some light on this?
Many thanks,
Stephen Barr.
-- http://www.freelists.org/webpage/oracle-lReceived on Sun Jul 03 2005 - 09:51:45 CDT