Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Mailing Lists -> Oracle-L -> RE: dba_tables.num_rows is less than dba_indexes.num_rows (why is compute out of the question?)

RE: dba_tables.num_rows is less than dba_indexes.num_rows (why is compute out of the question?)

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Tue, 9 Aug 2005 05:20:51 -0400
Message-ID: <KNEIIDHFLNJDHOOCFCDKGEDPGMAA.mwf@rsiz.com>


Wolfgang and Lex really answered your primary question. I'm curious about a combination of comments in your question:

>>

I've even used estimate_percent of 50 and still getting lower numbers for the table.

The table is 30G in size and growing, so a COMPUTE is out of the question.

>>

<snip>

So - you already know approximately the time the statistics COMPUTE would run (2 times 50 percent estimate). Whether or not it is out of the question depends on your operational situation. (You didn't mention how long it took for a 50 percent estimate, but that is only a curious aside not related to the analysis of the situation.)

I contend that the real question is whether you have wasteful plans* with the current statistics and whether you can make appreciable savings with better* statistics.

Wasteful plans -

    abstractly - suboptimal. That is, another plan would resolve the query and return the required results with some combination of less real time and with less resources consumed.

    If interactive users or a sequential stream of batch operations in a limited operational window is the important factor of the business situation, then less real time is likely to be more important.

    If no one or no thing is "waiting" for the results in a way that matters, then less resources consumed may be more important (if you are near a margin where the percentage consumption of the throughput capacity of some component of your system is driving you toward a "knee curve" performance degradation). If you have many concurrent users and/or competing parallel batch streams, this may be an important consideration.

Better statistics:

  1. As in a compute is a "better" statistic than a 10 percent estimate.
  2. Most recently collected.
  3. I almost typed histograms - read entire long threads from this forums on the good, bad, and indifferent about which histograms and when depending on your data texture and the queries you write against the data.

Now, IF you were to test your important queries against a static copy of your database, and you got the same plans with, say, a 10 percent estimate and a COMPUTE, then even though the COMPUTE has "better" statistics, it makes no difference to you except that it possibly cost* more to collect the COMPUTEd statistics. (Cost*, sigh, well if you already bought the resource and no one and no thing is waiting to use it otherwise, you might reasonably consider it free.)

On the other hand, if COMPUTE gets you better plans such that the cost of collecting COMPUTE minus the cost of x percent estimate minus the sum of the savings in the queries is less than zero, then COMPUTE makes sense. (Of course between 10 percent and COMPUTE there may be a lesser number where the plans reliably stabilize as the "best" plans.)

I've seen several "thumbrules" posted about the average best practice, and I won't even quote them. What matters to you is what happens with your data. Okay, I guess the cost of research matters too.

Now, on the second point: Most recently collected.

The same decision matrix applies: Will more recently collected statistics affect your plans? If you have histograms that are useful in getting better plans that have time components in the keys, and your time components drift with the calendar, then very recent statistics are probably important to you. If you have relatively unskewed data the just grows proportionally, then old statistics may well generate the same plans as new statistics. Knowledge about the texture and change characteristics of your data can help you make a rational GUESS if you know a lot about the particular vintage of the CBO you are using. Well designed tests will give you the answer (and confirm or deny your GUESS, if you made one.) Some people know enough that a GUESS might be described as a scientific prediction based on a theory of operations, and the test usually confirms the GUESS in that case.

If your plans don't change with new statistics collections, but they do change with COMPUTE instead of 50 percent, then an infrequent collection to the COMPUTE level on a shadow system that you then import (statistics) is probably better.

All of these things are sliding scales, and just because something might run long in real time does not mean it is not worthwhile. That depends on the benefits.

Sorry for the length of this note. "I didn't have time to make it shorter." (Argue amongst yersefs on that quote.)

And it really didn't directly answer anything you asked, but I hope it was useful to you.

Regards,

mwf

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Aug 09 2005 - 04:22:18 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US