Re: How to calibrate the CBO

From: Rick Denoire <100.17706_at_germanynet.de>
Date: Mon, 09 Feb 2004 00:27:56 +0100
Message-ID: <95ed2053a3ada7g79n7m3lg14npej6omhm@4ax.com>

"Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote:

>On the plus side, if you have been told of specific jobs,
>you could enable 10046 level 8 on those jobs. and see
>where most the time is going. There is a bit of a bug
>in 9.2, though, which means that sql_trace can really
>wreck performnace, so an initiall step might be to
>do a snapshot of the session's wait times (through v$session_event)
>and resource usage (through v$sesstat or v$mystat) before
>you do anything more expensive. These may give you a
>couple of cheap clues.

Yes, I was aware of this bug, I think it didn't hit me here.

>
>> The developers see this as a DB Admin issue, they did not change the
>> code and it ran well prior to the migration after all.
>>
>
>Of course, there is the change to the 20M lines or so
>of Oracle code - but that doesn't really count as a
>code change ;)

Well, people are so candid to assume that changes *must* be positive.

>Look at the Row Source Operation outputs from tkprof,
>and compare these with the theoretical plans from explain
>plan - then check where the ROWS in the row source
>operation diverges wildly from the CARDINALITY in
>explain plan. In theory, the ROWS is the number of
>rows that came out of a row source, and the CARDINALITY
>is the predicted number of rows.

I have examined quite a few of tkprof output files coping with this issue. I must admit that I still miss your point. The way from the numbers reported by tkprof to the solution is a little bit too long for me to comprehend.

>One of the problems of upgrading is that Oracle fixes bugs in
>the optimizer so that costs change. I have a query that has
>a cost of:
> 34 in 8.1.7.4
> 34,034 in 9.2.0.4
> 238 in 10.1.0.1
>
>The first is obviously wrong, the second looks as if it
>is arithmetically correct for the way the plan is reported,
>and the third is correct, given what actually happens.

I won't dare to tell my coleagues: "Hey, your code was running well in 8i just because it was written so bad, now you can't take any advantage of the bugs". They will demand a proof.

>There are options for adjusting some fairly significant assumptions
>in the CBO's strategy - check www.evdbt.com for Tim Gorman's
>article on optimizer_index_cost_adj (and optimizer_index_caching).
>I think Richard Foote would agree that it is vastly better than Don
>Burleson's article. (The title is something like "The search for
>Intelligent life in the CBO")

I played a bit with these parameters. They seemed to change cost calculations somewhat. But I would need to actually run the query in order to know.

>Bear in mind that Oracle 9 uses 'system statistics', and I think
>that the default behaviour in 9.2.0.4 is to invent some values
>if none have been gathered. Check dbms_stats, in particular
>the procedure gather_system_stats. This gives the CBO some
>figures about machine resource costs and timings, and allows
>it to generate more appropriate execution paths for the current
>working environment.

I was faster than your advice. I have been working hard on this issue and came to the following:

optimizer_max_permutations changed from 80000 (8i) to 2000 (9i). In view of the number of tables involved and the complexity of the query, a much larger number of combinations is possible and the CBO perhaps hit the top of 2000 before considering the optimal one.
I read some papers about the new features of the CBO in 9i and realized that system statistics were missing. They are being gathered now. Surprisingly, the execution times reported in the tkprof output are enormous since then; they seem to be reported in microseconds now (why?).

By the way, according to my modest knowledge, if system statistics are missing, the CBO does not invent substitute values but falls back to the 8i formula (which is a very simplistic one).

3) I considered gathering histograms but found some warnings that doing so could adversely affect performance. I will have to consolidate my knowledge on this point yet. At least in 9i, using method_opt=>'for all columns skewonly' should allow Oracle to take the decision by itself.

4) I changed the dynamic_sampling system setting to a value of 3, just in case. As a result, the queries suddenly could not be *parsed* any more (namely the analyzing statement, as shown by tkprof). Now, every time the query is tarted, tkprof shows some statistics gathering activity as a *first* step. This strikes me a bit, since all relevant objects are actually analyzed. I takes only a couple of seconds to execute, anyway. I had to flush the shared_pool to solve the parse problem.

5) I changed the session parameter cursor_sharing to "similar" (was "force"). The CBO should be allowed some freedom to reconsider a new execution plan when something but the SQL code itself changes.

Unfortunately, the query (which is rather a stored procedure composed of several complex statements) can't be run completely to the end now. I get the error "resource required with nowait" or something, and I suspect that some OEM session from an office PC keeps one table locked, while I am sitting at home now. But a small portion of the procedure that was actually executed runs much faster now...

6) I have set "use_stored_outline=false" for the time being, otherwise all other changes would be ignored.

7) The most valuable piece of information seems to be that I got hit by some kind of misbehavior of Oracle 9i related to the use of the parameters pga_aggregate_target and workarea_size_policy=auto. According to other reports about a similar situation when migrating from 8i to 9i, if disabling those parameters and setting sort_area_size manually, the problem disappears.

Unfortunately, this can't be done without recycling the instance, and I will have to announce and prepare this step (the DB is used round the clock).

8) One more thought of mine: If nothing helps, setting compatible='8.1.7' could help me out until the situation clarifies.

9) If time permits, I will give the "Outline Manager" delivered with the OEM a chance. One can edit the execution plan by changing the access paths and join type!

Sorry for the lengthy reply.

Thanks a lot
Rick Denoire Received on Sun Feb 08 2004 - 17:27:56 CST