RE: When stats trash your performance (third school of thought)

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Wed, 7 Dec 2005 15:43:52 -0500
Message-ID: <KNEIIDHFLNJDHOOCFCDKEEINHFAA.mwf@rsiz.com>

Third school of thought:

Premises:

The stability of your system is a priority.
Your system's stable operation justifies significant infrastructure cost to protect.

Then, one way to achieve that is to create a repository pairing all significant queries parsed and the resulting plan. This is populated dynamically and kept up to date.
You also build a clone of production that can be periodically refreshed as a physical standby so that storage details are identical. If you're up to a release where system statistics are kept those have to be supplied as well.

Now, instead of gathering statistics or whatever the name of that process is on your release version, you gather on the clone. Then you drive a parse of every single query in your repository and spit out the ones that have different plans from current production. If all the differences are good, then you take the stats from the clone and slap them in under your change management procedures onto production, knowing that you've now got stability for your current/previous queries and the best chance for the CBO to do a good job on never before seen queries. If there is no change at all, you still slap the new stats in to give the CBO the best chance to do a good job on never before seen queries. The tricky part is when some of the changed plans are bad enough to cause a significant problem in production. (You don't really care if a query run once a day that takes 10 seconds becomes 20 minutes. You probably care if something you can't avoid running 10's of thousands of times a day degrades by a few percent.)

Then you either need to prepare to hint or outline the plans you need to stabilize before you slap in the new stats, or figure out better histograms or percents to make the significant bad plans go away. You do all this work on a recent clone of production (full size or else why bother, and not reorganized or cleaned up in any way unless you're planning to do exactly the same on production, or else, again, why bother?)

Once such a configuration is instantiated and process and change controls are in place, this works really well. You can only judge whether the infrastructure and startup costs are rational in particular situations based on the responsible corporate officer's assessment of the dollar value of the risks, etc., etc., etc.

Regards,

mwf
  -----Original Message-----
  From: oracle-l-bounce_at_freelists.org
[mailto:oracle-l-bounce_at_freelists.org]On Behalf Of Bobak, Mark   Sent: Wednesday, December 07, 2005 2:17 PM   To: stephenbooth.uk_at_gmail.com; oracle-l_at_freelists.org   Subject: RE: When stats trash your performance

Hi Stephen,

I'm not sure I can point you to specific documentation, however, for a stable, running, production system, you want your execution plans to be stable. So, the only "safe" time to update statistics is when updating them will have little or no effect. But if updating them will have little or no effect, then why update them in the first place?

Ultimately, I think there are two schools of thought here: 1.) If the system is stable, don't analyze objects. Leave the stats where they are, and Oracle will continue to supply stable execution plans. This is the "if it ain't broke, don't fix it" school of thought.

2.) Analyze early and often. The CBO is far smarter than I ever will be. The better my stats are, the better off my performance will be. This is the "I have faith in the church of Oracle. Oracle is the one true light, and the way." school of thought.

The reality is probably somewhere in between.

In my shop, we are fortunate enough to have a full preproduction environment which is equal in size to production. Every 3-6 months, we clone from prod into preprod, so we have fresh data. Then, in preprod, along with a software release, we analyze objects, and do our performance testing in preprod w/ the fresh stats. When we roll our release into prod, we export stats from preprod and import to prod. In this way, we have some controlled method of moving (relatively) fresh stats into prod on a semi-regular basis.

I know, this doesn't completely answer your question....just some thoughts on the subject from my perspective. I hope it's helpful.

Finally, I gotta ask: Why the heck are you bouncing for nightly backups on a production server? Hot backups are a good thing! ;-)

-Mark

--

  From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of stephen booth
  Sent: Wednesday, December 07, 2005 1:59 PM   To: oracle-l_at_freelists.org
  Subject: When stats trash your performance

I've heard mention in a number of talks that you shouldn't gather stats very often, even not at all once a system is bedded in, as it can cause your database to start performing really badly. I don't recall any one ever saying, or ever reading, an explanation as to why this might happen. Perhaps I'm looking at it from the wrong perspective but it seems axiomic that the fresher your stats the better decisions CBO is likely to make.

I've Googled and can't really find anything that seems relevant other than some references to gathering stats invalidating existing plans so meaning Oracle has to do a hard parse.

Can anyone point me to any documentation that might explain why this might happen. It's not something I've really had to deal with in the past (getting the system running and occasionally trying to tune the server to reduce/eliminate waits before it goes live has been most of what I've been involved in, we don't write code in house so SQL tuning hasn't really featured). Something with pointers as to how to resolve such problems would be very useful as well.

What has triggered this question is I've just been hit by a situation where one of our systems was running OK (performance wasn't great but was acceptable and wasn't noticably worse than when the system was put in) then one of the sysadmins (not DBA) was told by our FM supplier to analyze a half dozen tables and associated indexes that were used by a new module that's due to go live soon and had been loaded with some historical information that was required. He did so and, so he claims, the system started running noticably slower shortly after. He then decided to analyze all the rest of the tables and their indexes. Performance got worse. Four days later they notified me (for political reasons I have no involvement with this system or any other system that has gone live) and basically said "it's running slow, fix it.". I don't know if there were any existing stats which got over written (I've asked, still waiting for a response), if there weren't I'm considering just deleting all the stats and seeing if that helps.

This system is shutdown every night for backup so the plans get wiped out anyway.

Stephen

--
It's better to ask a silly question than to make a silly assumption.

--

http://www.freelists.org/webpage/oracle-l Received on Wed Dec 07 2005 - 14:49:15 CST