Re: hmmm....

From: joel garry <joel-garry_at_home.com>
Date: Tue, 1 Mar 2011 09:23:45 -0800 (PST)
Message-ID: <1a8c511b-5d1a-44b9-a5fa-c8c64b3b3858_at_y36g2000pra.googlegroups.com>



On Mar 1, 2:33 am, Noons <wizofo..._at_yahoo.com.au> wrote:
> dombrooks wrote, on my timestamp of 1/03/2011 8:33 PM:
>
>
>
> > Surely using any size sample, even auto, means that the actual rows
> > sampled (ignoring sample size aspect) are different each time so in
> > theory the statistical nature/distribution/etc of the data could be
> > different each time, even in the same environment. That's why these
> > plan stability features keep evolving.
>
> I think you misread what I said.  Likely my fault for not explaining it well.
> May I suggest you re-read the strange outcome in my previous posts? There are no
> significant differences.  That is the problem.
>
> >> I've got tables with nearly the same number of rows and blocks, some go
> >> num_rows=sample_size while others go num_rows=sample_size * .
>
> > From http://download.oracle.com/docs/cd/E11882_01/server.112/e16638/stats....
>
> > "When the ESTIMATE_PERCENT parameter is manually specified, the
> > DBMS_STATS gathering procedures may automatically increase the
> > sampling percentage if the specified percentage did not produce a
> > large enough sample."
> > Maybe it can decrease as well? Too clever for its own good?
>
> What's missing from that quotation is the actual end of that paragraph:
> "This ensures the stability of the estimated values by reducing fluctuations."
> (and as well the definition of what is a "large enough sample")
>
> I can't for the life of me fathom how introducing a random element can ensure
> stability of anything, but I'm sure that's just me.
> Yes indeed: too clever for its own good!
>
> In fact, I still can't understand how they can tell me that I'm sampling 30% -
> or whatever % - of the rows of a table that I have not counted the rows of: it's
> simply impossible to determine any percentage of an unknown quantity without
> first measuring the whole of said quantity!  And doing so defeats the whole
> purpose of doing only a percentage of it in the first place...
>

Let's say you are Bugs Bunny, sitting in front of an assembly line of mortar shells. Every third one you hit with a hammer to be sure it is a dud. You don't know how many will come down the assembly line, but you are sampling 33% (for a while, anyway).
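
Incidentally, that's roughly what the SAMPLE clause does: each candidate row (or block) is kept with the stated probability, so no prior row count is needed. A minimal sketch, with an invented table name:

  -- Keep each row with ~30% probability; no COUNT(*) needed first.
  -- Repeated runs return different rows and slightly different counts.
  SELECT COUNT(*) FROM big_table SAMPLE (30);

  -- Block-level variant: cheaper I/O, but clustered data skews the sample.
  SELECT COUNT(*) FROM big_table SAMPLE BLOCK (30);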

> Unless of course we're talking histograms as well.  But if I didn't ask for
> them, then just get out of my way and do a 100% scan every time, there is a good
> boy!
>
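
For what it's worth, you can ask for exactly that combination - full compute, no histograms. A minimal sketch (owner and table invented):

  BEGIN
    -- estimate_percent => NULL means compute, i.e. a 100% scan;
    -- 'FOR ALL COLUMNS SIZE 1' suppresses histograms entirely.
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname          => 'SCOTT',
      tabname          => 'BIG_TABLE',
      estimate_percent => NULL,
      method_opt       => 'FOR ALL COLUMNS SIZE 1');
  END;
  /
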
> > It reminds me a bit of dynamic sampling in SQL statements where it can
> > discard the dynamic sampling results if it doesn't think it's
> > representative.
> > I can't remember what the entry looks like in a 10053 when it does
> > this.
>
> Exactly.  The problem here is that without going through extensive profiling or
> fudging the stats manually or one of the other plan stability options, it is
> impossible to confidently state what plan the CBO will be considering between,
> say, acceptance and production!  Now multiply that by thousands of tables and
> indexes and you got the root of an unmanageable problem, grid or no grid, ASH or
> no ASH, 10053 or no 10053!

No argument from me there. Like any utility, I expect it all to just work; I shouldn't need advanced training in reading traces just to fix random problems with someone breathing down my neck.
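
For the record, switching the trace on is the easy part - it's the reading that needs the advanced training. A session-level sketch:

  -- Dump CBO costing decisions for statements hard-parsed from here on;
  -- output lands in the session trace file (diag/udump directory).
  ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
  -- ... run the problem statement so it hard-parses ...
  ALTER SESSION SET EVENTS '10053 trace name context off';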

>
> This has been one of the losing battles between dbas in the field who have to
> put up with erratic performance and rdbms developers, always too eager to
> introduce "kewl" new features that serve no purpose whatsoever in normal
> production environments...
>
> Let's hope one day they'll realize that performance in Oracle's RDBMS has never
> been a widespread issue.  What is unacceptable is the erratic nature of said
> performance.  And random behavior switches are NOT the way to resolve that
> shortcoming...

Yes, erratic is the issue. But it does work most of the time, and tools exist to tell it what to do. There are a couple of different failure modes; you'd have to be advocating the RBO to address the problem of app developers not being as good as the CBO at figuring out what the best plan is. The erratic behavior in production is mostly boundary conditions: the CBO suddenly deciding to use an inappropriate plan. We only notice the failures there, not the vast majority of successes. (Of course I'm ignoring bind peeking and many other kewl things that are silly in retrospect - but known soon enough.)

So there is a legitimate purpose to the kewl features of the CBO, and that is to address the changing skew and distribution of data. The only thing that is really missing is a big red button to say "lock it down like so." There are too many little knobs and switches to do it simply, but you can. And you can't really ignore all the various ways a general purpose rdbms can be used and abused. Maybe it's too much to expect app vendors to tweak the knobs - no, make that definitely too much: they have no idea how the data for any customer is going to spread, and I'm sure most of us are no strangers to silliness in vendor requirements. That's the customer dba's responsibility. Bespoke developers should be expected to be able to do that, but we all know that's too much to ask.
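
A couple of those knobs, sketched for 11g with placeholder identifiers (the sql_id and plan hash below are made up):

  -- Freeze the stats so the nightly auto job can't move the goalposts.
  EXEC DBMS_STATS.LOCK_TABLE_STATS('SCOTT', 'BIG_TABLE');

  -- Capture a known-good plan from the cursor cache as an accepted
  -- SQL plan baseline (11g SQL Plan Management).
  DECLARE
    n PLS_INTEGER;
  BEGIN
    n := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
           sql_id          => 'abcd1234efgh5',
           plan_hash_value => 1234567890);
    DBMS_OUTPUT.PUT_LINE(n || ' plan(s) loaded');
  END;
  /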

>
> Look at the differences between auto_sample_size between 10g and 11gr2!
> One favors lower percentages, the other favors higher ones: talk about
> unpredictable outcome!

Good example, but I'm not convinced that the general trend isn't towards "better." The only real problem I see is Oracle not addressing the full range of needs: some production environments value stability, predictability and quality more than others.
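
One way to see which way a given release actually jumped is to look at what the gather job recorded - a sketch, adjust the owner filter to taste:

  -- sample_size = num_rows means a full scan was done; a large gap
  -- means a small sample was taken (or scaled up) for that table.
  SELECT table_name, num_rows, sample_size,
         ROUND(100 * sample_size / NULLIF(num_rows, 0), 1) AS pct_sampled
  FROM   dba_tables
  WHERE  owner = 'SCOTT'
  AND    num_rows IS NOT NULL
  ORDER  BY table_name;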

We had a power outage Sunday night. The power-protected server had to be bounced twice in order to fix the NFS mounts that got screwed up (there's probably some way to get around that, but it's so rare the sysadmins haven't figured it out yet), which bothered Oracle not at all; but a SQL Server database on another server was corrupted, which of course was something visible corporate-wide the next morning (Active Directory or something).

jg

--
_at_home.com is bogus.
What happens with bad email admin:
http://www.signonsandiego.com/news/2011/feb/25/e-mail-reignites-ucsd-racial-tension/