Re: Cost vs. Rule based optimizer

From: Jonathan Lewis <Jonathan_at_jlcomp.demon.co.uk>
Date: 1995/04/20
Message-ID: <798414167snz_at_jlcomp.demon.co.uk>


In article <1995Apr20.021624.22182_at_cs.uno.edu>
           kcutress_at_cs.uno.edu "kevin patrick cutress" writes:

: For example, I have a table with 705,000 records. Approximately 12,000 records
: have a commodity code equal to 33440. Approximately 28,000 records have a
: state code equal to AK. The query select sum(tons) from detail94 where
: commodity = 33440 and state = 'AK' consistently takes 3 to 5 minutes using the
: cost based optimizer and less than 30 seconds using the rule based optimizer.
: Both fields in the where clause are indexed.
 

: Does anyone have any ideas on why there should be such a difference? Can I
: make the cost based optimizer work or should I abandon the effort and make
: rule based optimization the default?

My theory, aired publicly so that it can be shot at:

Assumption 1:
The prime directive for the Cost Based Optimiser is to minimise the number of actual visits to disk, as these are the slowest thing around.

Assumption 2:
The CBO estimates from the gathered statistics the total number of blocks it will have to hit to satisfy the query (NB this is not the same as the number of logical I/Os, as it will use the cluster factor to cater for multiple relevant rows being in a single block).

Assumption 3:
The CBO assumes that every block required will have to be fetched from disc and does not take into account the historic hit-rate to determine a more realistic number of visits to disk.

Assumption 4:
The oral history tells us that visiting a disk for a single block takes the same amount of time as visiting a disk to pick up a number of consecutive blocks.
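
For reference, the raw figures the CBO has to play with are visible in the dictionary once you have analysed; a quick look, assuming the 7.x views (column names as I remember them):

    select num_rows, blocks
    from   user_tables
    where  table_name = 'DETAIL94';

    select index_name, distinct_keys, clustering_factor
    from   user_indexes
    where  table_name = 'DETAIL94';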

Applying this to your example:

   705,000 records, of which 12,000 will be selected by commodity code,
   or 28,000 will be selected by state code.

   Guessing that commodity codes and state codes are distributed evenly
   throughout the table, that your row size is around 120 bytes,
   and that your database is on 4K blocks, we end up with:

      ca. 30 rows per block,
      ca. 1 row in 60 for the given commodity code,
      ca. 1 row in 25 for the given state code.

On this basis, using the commodity code index (the better of the two) will result in Oracle picking up every other block of the table, one at a time. On the other hand, if Oracle does a tablescan it can get the whole lot 8 blocks at a time (a typical db_file_multiblock_read_count), which means that the tablescan uses about 1/4 of the visits of the indexed access.
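
The block arithmetic, as a runnable sanity check (every figure here is one of the estimates above, not a measurement):

    select round(705000/30)    table_blocks,  -- ca. 23,500 blocks in the table
           round(705000/30/2)  index_visits,  -- every other block, one at a time
           round(705000/30/8)  scan_visits    -- 8 blocks per multiblock read
    from   dual;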

And, of course, from your perspective the answer is wrong because:

  1. you probably have a lot of the required blocks in the buffer already and
  2. when the tablescan is hammering away, there is probably a huge amount of contention for the same set of disc heads.
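
In passing: if you want to stay with the CBO in general but force the indexed path for this one statement, a hint will do it. A sketch only -- the index name here is hypothetical:

    select /*+ INDEX(detail94 detail94_commodity_idx) */ sum(tons)
    from   detail94
    where  commodity = 33440
    and    state = 'AK';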

Using the stats above, and assuming a fairly typical 20 millisecs per disk read, the time for the indexed access would be about 240 seconds -- so a persistent hit ratio of 90% would give you your 30-second performance. On the other hand, the tablescan should show a result after about 60 seconds: try it some time when the system is otherwise inactive to see if it comes close.
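
Again as runnable arithmetic, using the visit counts from the earlier block (the 20 millisec figure is an assumption, as above):

    select round(11750 * 0.020)        index_secs,       -- ca. 235s via the index
           round(11750 * 0.1 * 0.020)  index_90pct_secs, -- ca. 24s at a 90% hit ratio
           round(2938 * 0.020)         scan_secs         -- ca. 59s for the tablescan
    from   dual;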

Two other points:
1) I've recently been talking to someone about their use of the CBO,
   and at 7.1.3.2 they find that the ANALYZE command blows itself
   apart (in some unspecified way) after about 100,000 rows: did
   yours really survive to 700,000?
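
   If a full ANALYZE does fall over, a sampled estimate may survive
   where a full compute does not; a sketch, with an arbitrary sample size:

       analyze table detail94 estimate statistics sample 10 percent;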

2) If your data is smoothly distributed across all values, and if
   your query is run regularly, then it would probably be sensible
   to have an index on (commodity, state) to bring the average
   response down to sub 3 seconds.
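
   For what it's worth, that index would look like this (the index
   name is my invention):

       create index detail94_comm_state on detail94 (commodity, state);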

NB: If anyone can produce a definitive judgement on the 4 assumptions at the top, then I would like to hear it.

-- 
Jonathan Lewis