Re: hmmm....

From: Noons <wizofoz2k_at_gmail.com>
Date: Tue, 1 Mar 2011 19:05:07 -0800 (PST)
Message-ID: <e485cded-9715-4f31-9512-c8879cc673e2_at_r19g2000prm.googlegroups.com>



On Mar 2, 4:23 am, joel garry <joel-ga..._at_home.com> wrote:

> Let's say you are Bugs Bunny, sitting in front of an assembly line of
> mortar shells.  Every third one you hit with a hammer to be sure it is
> a dud.  You don't know how many will come down the assembly line, but
> you are sampling 33% (for a while, anyway).

I'd hate to be Bugsy if estimate_percent is set to 0.000001%... ;)

According to the doco it can be done, though: do you know anyone who has?
Oh hang on, someone will now butt in claiming it is essential and we're "bad dbas" for not sampling at that value... :)

Ever wondered why "dba" is an anagram of "bad"? <g,d&r>
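For the curious, the doco does list 0.000001 as the bottom of the valid range for estimate_percent. Something along these lines would be "legal" (schema and table names made up, obviously):

  -- minimum documented estimate_percent: sample next to nothing
  BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname          => 'HR',          -- example schema, not yours
      tabname          => 'EMPLOYEES',   -- example table, not yours
      estimate_percent => 0.000001);     -- legal per the doco; sensible? hardly
  END;
  /

Whether the resulting stats bear any resemblance to the data is another matter entirely...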

> No argument from me there.  Like any utility, I expect it all to just
> work, I shouldn't need advanced training in reading traces just to fix
> random problems with someone breathing down my neck.

Exactly. The problem is not a generalized one. It's usually the one SQL statement in 90,000 that makes the payroll calc take 7 days instead of 5 minutes, or the financial report for upstairs take 3 hours instead of a few seconds.
Hardly the time to start polishing off 10053s: bosses don't grok "traces"...
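(For anyone who hasn't had the pleasure: a 10053 is just an event set on the session, roughly like this - the statement and session are of course your own:)

  -- dump the CBO's costing decisions for statements parsed in this session
  ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
  -- ... re-run (and re-parse) the offending statement here ...
  ALTER SESSION SET EVENTS '10053 trace name context off';
  -- the trace file turns up in the session's diagnostic trace directory

And then you get to read it. While the boss waits.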

> Yes, erratic is the issue.  But it does work most of the time, and
> tools exist to tell it what to do.  There are a couple of different
> failure modes, you'd have to be advocating RBO to address the problem
> of app developers not being good as good as CBO at figuring out what
> the best plan is.  The erratic behavior in production is mostly
> boundary conditions, CBO suddenly deciding to use an inappropriate
> plan.  We only notice the fails there, not the vast majority of
> successes.  (Of course I'm ignoring bind peeking and many other kewl
> things that are silly in retrospect - but known soon enough).
>
> So there is a legitimate purpose to the kewl features of CBO, and that
> is to address the changing skew and distribution of data.  The only
> thing that is really missing is a big red button to say "lock it down
> like so."  There's too many little knobs and switches to do that, but
> you can.  And you can't really ignore all the various ways a general
> purpose rdbms can be used and abused.  Maybe it's too much to expect
> app vendors to tweak the knobs - no, make that definitely, they have
> no idea how the data for any customer is going to spread, and I'm sure
> most of us are no stranger to silliness in vendor requirements.
> That's the customer dba's responsibility.  Bespoke developers should
> be expected to be able to do that, but we all know that's too much to
> ask.

Absolutely. I said it here before, not long ago: the performance problems nowadays - and it's been like that for a long time - are not "generalized". I can't remember the last time I saw a site with atrocious performance across the board, other than the odd dinosaur who persists in running critical systems on hardware and software that were old last millennium...

That is due in no small part to the CBO and its new features, but also to long-running campaigns to convince developers to start using bind variables and such.
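(The usual before/after, with made-up table and column names:)

  -- literals: every distinct value is a brand new statement to hard parse
  SELECT pay_total FROM payroll WHERE emp_id = 1234;
  SELECT pay_total FROM payroll WHERE emp_id = 1235;

  -- bind variable: one cursor, parsed once, reused
  VARIABLE v_emp_id NUMBER
  EXEC :v_emp_id := 1234
  SELECT pay_total FROM payroll WHERE emp_id = :v_emp_id;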

What we have now is the odd SQL that takes off on a tangent. In a sense, that's perceived as worse by the "don't grok" brigade: they don't like surprises and they are now used to a certain good level of performance. When a single instance of bad performance happens, it stands out like a sore thumb.

What rattles me is that, short of associating a profile with every statement or fudging and locking down stats, we have no mechanism to keep things quiet and avoid these blowouts.
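(By "locking down stats" I mean the likes of this - owner and table names are just placeholders:)

  -- freeze the stats so the nightly gather job can't pull the rug
  EXEC DBMS_STATS.LOCK_TABLE_STATS(ownname => 'FIN', tabname => 'GL_BALANCES')
  -- and unlock only when we actually *want* to regather
  EXEC DBMS_STATS.UNLOCK_TABLE_STATS(ownname => 'FIN', tabname => 'GL_BALANCES')

Workable for a handful of tables, hardly a strategy for a whole application.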

Big red button indeed: that is sorely needed!
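(11g's SQL plan baselines are the closest thing I've seen to that button - roughly like this, where the sql_id and plan hash are whatever V$SQL gives you for the plan you want kept; the values below are invented:)

  -- pin the plan we know behaves, straight from the cursor cache
  DECLARE
    n PLS_INTEGER;
  BEGIN
    n := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
           sql_id          => 'abcd1234efgh567',   -- from V$SQL, made up here
           plan_hash_value => 1234567890);         -- the plan to keep
  END;
  /

Still one statement at a time, though. Not exactly "big", and not exactly "red".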

But we also need a way of ensuring that once tested in acceptance, a new release won't go into production and blow out in unexpected places. I've got it clamped down now to the odd SQL in the odd month or so. But I still get differences in execution, even with everything the same, including sysstats and undocumented params. Someone needs to find a way of allowing us to do reliable and consistent releases to production where not a single SQL blows out in execution. I think consistent and reliable stats are a very big and essential part of that. But getting them is not easy when the mechanism to do it decides to pull the rug, because it can...
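(The closest I've got is carrying the stats across myself - stat table, schema and owner names below are placeholders only:)

  -- in acceptance: snapshot the stats the release was actually tested against
  EXEC DBMS_STATS.CREATE_STAT_TABLE(ownname => 'DBADMIN', stattab => 'REL_STATS')
  EXEC DBMS_STATS.EXPORT_SCHEMA_STATS(ownname => 'APPOWNER', stattab => 'REL_STATS', statown => 'DBADMIN')

  -- in production: ship the stat table across (datapump, db link, whatever)
  -- then import, so the CBO sees exactly what it saw in acceptance
  EXEC DBMS_STATS.IMPORT_SCHEMA_STATS(ownname => 'APPOWNER', stattab => 'REL_STATS', statown => 'DBADMIN')

It works, but it's plumbing I shouldn't have to build and babysit myself.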

> Good example, but I'm not convinced that the general trend is not
> towards "better."  The only real problem I see is Oracle not
> addressing the range of desire, some production environments value
> stability, predictability and quality more than others.

Indeed. I've been on both sides of the fence. Sites that do extensive development are usually more open to the odd blowout: they understand the complexities involved and get involved in the fixing. Sites that do little or sporadic development are at the opposite end: they generally can't afford extended test/release cycles and want everything to go in first time and work first go. Not easy at all, even with 11g.

> We had a power outage Sunday night, the power-protected server had to
> be bounced twice in order to fix the NFS mounts that got screwed up
> (there's probably some way to get around that, but it's so rare the
> sysadmins haven't figured it out yet) which bothered Oracle not at
> all, but a SS database on another server was corrupted, which of
> course was something visible corporate wide the next morning (active
> directory or something).

Aye! Yeah, I know exactly that feeling... Very good points, jgarr. Thanks for the feedback.