Re: Strictly Enforcing Query Plan Stability

From: Greg Stark <greg-spare-1_at_mit.edu>
Date: Fri, 13 Oct 2000 18:26:38 GMT
Message-ID: <87wvfctzwx.fsf@HSE-MTL-ppp62507.qc.sympatico.ca>

frank <fbortel_at_home.nl> writes:

> I have never ever seen that happen, or -probably more important- heard about
> such an event.
>
> > I'm very serious. This seems like a major flaw in Oracle that makes it
> > fundamentally unsuitable for mission critical OLTP applications. It means the
> > operations people cannot guarantee that they know what Oracle is doing at all
> > time and cannot guarantee that the application will continue to work as the
> > data changes.
>
> Well, who come there are some many of them around? All of these designers
> and users must have less high standard than you have.

This sounds like a typical scenario where Oracle is being used for purposes that it hasn't in the past. Perhaps you've been working exclusively on "traditional" database applications like data warehouses and environments with very tightly controlled code and very rigorous testing regimen. However the world has changed and Oracle is scrambling to catch up.

> > You make a fundamental assumption that the optimizer is infallible. Have you
> > actually tried to use the damned thing?
>
> No, I'm not. That's why hints were invented. And yes, I have used it.
>
> > It's very helpful for initially finding good query plans in development
> > but it's not appropriate to be using it at run-time.
>
> Why would that be?

So you don't think the plan stability features introduced in 8i are useful for anything? If the optimizer changes behaviour it's infallibly because the new plan is better than the old one? If a new query is introduced and it's checked with one data set then it's undoubtedly going produce the same plan or better on the production system?

Perhaps some concrete examples would be clearer:

I write a web site involving basic queries involving some joins etc. I test it out on a sample database, I test it out on a QA system, including checking the plans and disk i/o etc. It goes into production.

It gradually slows down as more users sign on, that's expected, but it's basically satisfactory. I'm monitoring the response time planning to do some tuning well before the response time becomes too slow. All of the sudden everything grinds to a complete halt, Oracle has decided to start using full table scans because the tables have passed some arbitrary size that the optimizer decided was worth doing full table scans. It turns out Oracle was overeager in switching to full table scans and they were actually slightly slower. Had it stuck with the existing plans at least I could have continued to monitor the response time and react in plenty of time to deal with the performance degradation.

That scenario is handled by the plan stability features in 8i. However the next scenario is what I'm actually complaining about:

Now I have one of the developers implement a new web page. I check the code and test it on the QA system. We establish stored outlines for every query with an acceptable execution plan with acceptable response times. However either due to a fault in my testing procedure or perhaps due to some obscure corner case that wasn't anticipated at all, there's a query in the new code that I don't verify. It turns out that this query does a full table scan on a large table. When this code is placed into production and receives hundreds of hits per second that full table scan completely hammers Oracle saturating the disk i/o and blocking every process doing the same query. The entire application is killed by an execution plan that Oracle devised ad hoc without having been approved.

This is the scenario I would like to guarantee won't occur. The development people can do their job with testing but they can't guarantee that bugs will never occur. But from an operations point of view we should be able to guarantee that a bug causes an immediate error rather than having Oracle just try to do its best to come up with an ad hoc execution plan and possibly kill the entire application.

-- 
greg

Received on Fri Oct 13 2000 - 13:26:38 CDT