Re: hmmm....

From: Noons <wizofoz2k_at_yahoo.com.au>
Date: Tue, 01 Mar 2011 21:33:48 +1100
Message-ID: <ikii1u$gct$1_at_news.eternal-september.org>



dombrooks wrote, on my timestamp of 1/03/2011 8:33 PM:

>
> Surely using any size sample, even auto, means that the actual rows
> sampled (ignoring sample size aspect) are different each time so in
> theory the statistical nature/distribution/etc of the data could be
> different each time, even in the same environment. That's why these
> plan stability features keep evolving.

I think you misread what I said. Likely my fault for not explaining it well. May I suggest you re-read the strange outcome described in my previous posts? There are no significant differences. That is the problem.

>> I've got tables with nearly the same number of rows and blocks, some go
>> num_rows=sample_size while others go num_rows=sample_size * .
>
> From http://download.oracle.com/docs/cd/E11882_01/server.112/e16638/stats.htm#PFGRF94721:
>
> "When the ESTIMATE_PERCENT parameter is manually specified, the
> DBMS_STATS gathering procedures may automatically increase the
> sampling percentage if the specified percentage did not produce a
> large enough sample."
> Maybe it can decrease as well? Too clever for its own good?

What's missing from that quotation is the actual end of that paragraph: "This ensures the stability of the estimated values by reducing fluctuations." (as well as any definition of what constitutes a "large enough sample").

I can't for the life of me fathom how introducing a random element can ensure stability of anything, but I'm sure that's just me. Yes indeed: too clever for its own good!

In fact, I still can't understand how they can tell me that I'm sampling 30% - or whatever % - of the rows of a table whose rows I have not counted: it's simply impossible to determine any percentage of an unknown quantity without first measuring the whole of said quantity! And doing so defeats the whole purpose of sampling only a percentage of it in the first place...

Unless of course we're talking histograms as well. But if I didn't ask for them, then just get out of my way and do a 100% scan every time, there is a good boy!
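
For what it's worth, about the only way I've found to see what a gather actually did with a requested percentage is to compare sample_size against num_rows afterwards. A rough sketch, assuming a test table called T (the name is purely for illustration):

  -- ask for 30%, then check what DBMS_STATS really used
  exec dbms_stats.gather_table_stats(user, 'T', estimate_percent => 30);

  select num_rows,
         sample_size,
         round(100 * sample_size / nullif(num_rows, 0), 1) as pct_actually_used
  from   user_tables
  where  table_name = 'T';

If pct_actually_used drifts away from the 30% you asked for, that's the "automatic increase" from the manual at work.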

> It reminds me a bit of dynamic sampling in SQL statements where it can
> discard the dynamic sampling results if it doesn't think it's
> representative.
> I can't remember what the entry looks like in a 10053 when it does
> this.

Exactly. The problem here is that without going through extensive profiling, fudging the stats manually, or using one of the other plan stability options, it is impossible to confidently state what plan the CBO will be considering between, say, acceptance and production! Now multiply that by thousands of tables and indexes and you've got the root of an unmanageable problem, grid or no grid, ASH or no ASH, 10053 or no 10053!
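
And just to spell out the "fudging the stats manually" bit: about the only way I know of guaranteeing the CBO sees the same numbers in acceptance and production is to carry the stats across and lock them. A minimal sketch, assuming a schema called APP and a stats table called MY_STATS_TAB (both names invented for the example):

  begin
    -- staging table to hold the exported statistics
    dbms_stats.create_stat_table(ownname => 'APP', stattab => 'MY_STATS_TAB');
    -- snapshot what the CBO saw in acceptance
    dbms_stats.export_schema_stats(ownname => 'APP', stattab => 'MY_STATS_TAB');
  end;
  /

  -- after copying MY_STATS_TAB across to production (datapump, db link, whatever):
  begin
    dbms_stats.import_schema_stats(ownname => 'APP', stattab => 'MY_STATS_TAB');
    -- and stop the next automatic gather from quietly replacing them
    dbms_stats.lock_schema_stats(ownname => 'APP');
  end;
  /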

This has been one of the losing battles between DBAs in the field, who have to put up with erratic performance, and RDBMS developers, always too eager to introduce "kewl" new features that serve no purpose whatsoever in normal production environments...

Let's hope one day they'll realize that performance in Oracle's RDBMS has never been a widespread issue. What is unacceptable is the erratic nature of said performance. And random behavior switches are NOT the way to resolve that shortcoming...

Look at the differences in auto_sample_size between 10g and 11gR2! One favors lower percentages, the other favors higher ones: talk about unpredictable outcomes!
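
Anyone who wants to see it for themselves can gather with auto_sample_size on each version and compare the implied percentages. Another rough sketch - the SCOTT schema is just a placeholder:

  exec dbms_stats.gather_schema_stats('SCOTT', estimate_percent => dbms_stats.auto_sample_size);

  -- run the same query on the 10g and the 11gR2 database and compare
  select table_name,
         num_rows,
         sample_size,
         round(100 * sample_size / nullif(num_rows, 0), 1) as approx_pct_sampled
  from   dba_tab_statistics
  where  owner = 'SCOTT'
  and    num_rows > 0
  order  by approx_pct_sampled;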
