Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 


Re: How to calibrate the CBO

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Mon, 9 Feb 2004 09:57:07 +0000 (UTC)
Message-ID: <c07lhj$br7$1@sparta.btinternet.com>

Notes in-line:

-- 
Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

  The educated person is not the person
  who can answer the questions, but the
  person who can question the answers -- T. Schick Jr


Next public appearances:
 March 2004 Hotsos Symposium - The Burden of Proof
 March 2004 Charlotte NC OUG - CBO Tutorial
 April 2004 Iceland


One-day tutorials:
http://www.jlcomp.demon.co.uk/tutorial.html


Three-day seminar:
see http://www.jlcomp.demon.co.uk/seminar.html
  UK: February
  UK: June


The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html


"Rick Denoire" <100.17706_at_germanynet.de> wrote in message
news:95ed2053a3ada7g79n7m3lg14npej6omhm_at_4ax.com...

> >Look at the Row Source Operation outputs from tkprof,
> >and compare these with the theoretical plans from explain
> >plan - then check where the ROWS in the row source
> >operation diverges wildly from the CARDINALITY in
> >explain plan. In theory, the ROWS is the number of
> >rows that came out of a row source, and the CARDINALITY
> >is the predicted number of rows.
>
> I have examined quite a few tkprof output files dealing with this
> issue. I must admit that I still miss your point. The way from the
> numbers reported by tkprof to the solution is a little bit too long
> for me to comprehend.
>
Sorry, your posts are sufficiently well-formed that I've got into the habit of not filling in all the details. The point of comparing the ROWS to the CARDINALITY is that this can tell you where changes in the optimizer calculations have pushed the optimizer into a different plan; and this may give you some clues about which new feature is causing a problem.
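As a minimal sketch of that comparison (the table names and trace file name are made up for illustration):

```sql
-- Trace the real execution; the trace file appears in user_dump_dest
alter session set timed_statistics = true;
alter session set sql_trace = true;

select /* trace_me */ count(*)
from   orders o, customers c          -- hypothetical tables
where  o.cust_id = c.cust_id;

alter session set sql_trace = false;

-- From the OS prompt, format the trace file:
--   tkprof ora_12345.trc out.txt
-- and compare the ROWS column of the Row Source Operation section
-- with the CARDINALITY predicted by explain plan:
explain plan for
select /* trace_me */ count(*)
from   orders o, customers c
where  o.cust_id = c.cust_id;

select id, operation, options, object_name, cardinality
from   plan_table;
```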
> 1) optimizer_max_permutations changed from 80000 (8i) to 2000 (9i). In
> view of the number of tables involved and the complexity of the query,
> a much larger number of combinations is possible and the CBO perhaps
> hit the top of 2000 before considering the optimal one.
>
It's a little unlikely, but possible. You need to optimize the query with event 10053 set and check the trace output for lines starting with the words "Join Order". I have read an Oracle note saying that the effect of changing the parameter is to change the way in which the join order permutations take place. But in a recent (very simple) test involving 12 tables, I found:

    optimizer_max_permutations = 80,000 -> join orders examined = 2938
    optimizer_max_permutations =  2,000 -> join orders examined =  666

And the 666 join orders were simply the first 666 of the 2938 - contrary to the note.
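For reference, a sketch of enabling the 10053 trace for one session (the query itself is just a placeholder):

```sql
-- Dump the CBO's costing trace for the next hard parse
alter session set events '10053 trace name context forever, level 1';

-- The statement must be hard-parsed for the trace to appear, so add
-- a unique comment if a cached cursor would otherwise be reused:
-- select /* demo_10053 */ ... your 12-table join here ...

alter session set events '10053 trace name context off';

-- Then search the trace file in user_dump_dest for "Join order"
-- lines to count the permutations examined.
```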
> 2) I read some papers about the new features of the CBO in 9i and
> realized that system statistics were missing. They are being gathered
> now. Surprisingly, the execution times reported in the tkprof output
> are enormous since then; they seem to be reported in microseconds now
> (why?).
It's an enhancement - when a CPU running at 20MHz was fast, a granularity of 1/100th of a second was considered adequate; now that fast means 3.2 GHz, the granularity needs to be improved. You've probably noticed, though, that while the elapsed times are reported to the microsecond, the CPU times (on most platforms) are still accurate only to 1/100th of a second.
>
> By the way, according to my modest knowledge, if system statistics are
> missing, the CBO does not invent substitute values but falls back to
> the 8i formula (which is a very simplistic one).
>
Correct, my error - it's 10g that misbehaves. (And even that's not quite an accurate statement; it's just that when I try to use dbms_stats.delete_system_stats, Oracle deletes the statistics, then puts something back, and carries on using CPU costing).
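For completeness, a sketch of gathering and removing system statistics in 9i (the 30-minute window is arbitrary):

```sql
-- Gather workload system statistics over a 30-minute window
begin
  dbms_stats.gather_system_stats(
    gathering_mode => 'INTERVAL',
    interval       => 30
  );
end;
/

-- Inspect what was captured
select * from sys.aux_stats$;

-- And to remove them again (the 10g behaviour described above):
begin
  dbms_stats.delete_system_stats;
end;
/
```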
> 3) I considered gathering histograms but found some warnings that
> doing so could adversely affect performance. I will have to
> consolidate my knowledge on this point yet. At least in 9i, using
> method_opt=>'for all columns skewonly' should allow Oracle to take the
> decision by itself.
>
Again, the 'ROWS vs CARDINALITY' check applies, but there isn't much difference between 8 and 9 in this area. Except that 9i uses bind variable peeking, so it could use the full feature set of histograms for the first optimization of a query with bind variables - and therefore produce a different path from 8i in the same circumstances.
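A sketch of the skewonly option mentioned above (the table name is made up; note the full syntax is 'size skewonly'):

```sql
-- Let Oracle decide which columns deserve histograms (9i and later)
begin
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'ORDERS',          -- hypothetical table
    method_opt => 'for all columns size skewonly',
    cascade    => true
  );
end;
/

-- Columns with num_buckets > 1 ended up with a histogram
select column_name, num_buckets
from   user_tab_columns
where  table_name = 'ORDERS';
```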
> 4) I changed the dynamic_sampling system setting to a value of 3, just
> in case. As a result, the queries suddenly could not be *parsed* any
> more (namely the analyzing statement, as shown by tkprof). Now, every
> time the query is started, tkprof shows some statistics gathering
> activity as a *first* step. This strikes me a bit, since all relevant
> objects are actually analyzed. It takes only a couple of seconds to
> execute, anyway. I had to flush the shared_pool to solve the parse
> problem.
>
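On point 4: at level 3 and above, dynamic sampling can kick in even for analyzed objects, which would explain the sampling recursive SQL appearing as a first step in the trace. A sketch (the table and predicate are made up):

```sql
-- Session-wide setting, as described in point 4
alter session set optimizer_dynamic_sampling = 3;

-- Or per-query, without touching the session/system setting:
select /*+ dynamic_sampling(t 3) */ count(*)
from   orders t                      -- hypothetical table
where  status = 'OPEN';
```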

> 5) I changed the session parameter cursor_sharing to "similar" (was
> "force"). The CBO should be allowed some freedom to reconsider a new
> execution plan when something but the SQL code itself changes.
>
This may be a clue to why paths have changed - see the bind variable peeking comment above. If your code uses the 'literal values' mechanism, but has been patched by using the 'cursor_sharing = force' option, then bind-variable peeking will apply, and every statement will be optimized according to the first set of values that were actually used. However, once you change to 'cursor_sharing = similar', Oracle changes literals to binds, checks whether the relevant columns have histograms and, if so, re-optimizes. This may explain why some specific statements suddenly change their access paths. As a completely different issue, though: if you needed 'cursor_sharing = force' to avoid stress on the library cache latches, you may have re-introduced that stress.
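A sketch of the difference (the query and table are made up; check v$sql to see how many child cursors each setting produces):

```sql
-- With FORCE, both literal versions collapse to one shared cursor,
-- optimized for whichever value was peeked first:
alter session set cursor_sharing = force;
select count(*) from orders where status = 'OPEN';
select count(*) from orders where status = 'CLOSED';

-- With SIMILAR, Oracle may keep separate child cursors when a
-- histogram on STATUS makes the literal value plan-relevant:
alter session set cursor_sharing = similar;
select count(*) from orders where status = 'OPEN';
select count(*) from orders where status = 'CLOSED';

-- Count the cursors produced under each setting:
select sql_text, count(*) child_cursors
from   v$sql
where  sql_text like 'select count(*) from orders%'
group  by sql_text;
```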
> Unfortunately, the query (which is rather a stored procedure composed
> of several complex statements) can't be run completely to the end now.
> I get the error "resource required with nowait" or something, and I
> suspect that some OEM session from an office PC keeps one table
> locked, while I am sitting at home now. But a small portion of the
> procedure that was actually executed runs much faster now...
>
> 6) I have set "use_stored_outlines=false" for the time being, otherwise
> all other changes would be ignored.
>
> 7) The most valuable piece of information seems to be that I got hit
> by some kind of misbehavior of Oracle 9i related to the use of the
> parameters pga_aggregate_target and workarea_size_policy=auto.
> According to other reports about a similar situation when migrating
> from 8i to 9i, if disabling those parameters and setting
> sort_area_size manually, the problem disappears.
>
> Unfortunately, this can't be done without recycling the instance, and
> I will have to announce and prepare this step (the DB is used round
> the clock).
>
> 8) One more thought of mine: If nothing helps, setting
> compatible='8.1.7' could help me out until the situation clarifies.
>
I think you probably mean optimizer_features_enable. I've not played with these, but the compatible option is, I believe, about physical structures (e.g. the format of segment header blocks), and in some cases you may find that you can't set the value back if you've used a new feature.

A slightly less drastic, but more time-consuming, option is to do a 10053 trace against just one query on both the 8.1.7 and the 9.2 that you have running, and get the list of optimizer-related parameters from the two trace files. Look for lines that look like:

    _feature_enabled = true/false

Then check the tkprof output files where you've found the most significant changes in costs, and see whether the new (bad) plans look like the result of specific new or enabled features. For example, I think you'll find in 8.1 something like:

    _unnest_subquery = false

This changes to true in 9.2; but there have been people who've had specific queries that performed very badly with unnesting. Switching off one or two features is strategically a better move than switching off everything - and you could raise a TAR with Oracle to let them know of your example where the specific feature performs badly.
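The two session-level switches just discussed look like this (as always with hidden parameters, check with Oracle support before relying on them):

```sql
-- Roll the whole optimizer back to 8.1.7 behaviour for one session:
alter session set optimizer_features_enable = '8.1.7';

-- Or switch off just the one suspect feature; underscore parameters
-- must be double-quoted:
alter session set "_unnest_subquery" = false;
```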
> 9) If time permits, I will give the "Outline Manager" delivered with
> the OEM a chance. One can edit the execution plan by changing the
> access paths and join type!
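On point 9: the same mechanism can be driven from SQL without the OEM tool. A sketch, with a made-up outline name, category, and statement:

```sql
-- Capture the current plan of one statement as a stored outline:
create or replace outline ord_rpt
for category app_outlines
on select count(*) from orders where status = 'OPEN';

-- Then have sessions use (only) that category:
alter session set use_stored_outlines = app_outlines;
```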
>
> Sorry for the lengthy reply.
>
ditto -
> Thanks a lot
> Rick Denoire
>
Received on Mon Feb 09 2004 - 03:57:07 CST
