Histogram worthwhile?

From: Roberto Veiga <roberto_veiga_at_yahoo.com>
Date: Tue, 24 Feb 2009 06:01:10 -0800 (PST)
Message-ID: <603000.35916.qm_at_web35604.mail.mud.yahoo.com>

We have another performance problem to solve. The performance that was good, now is bad. Very bad with an unacceptable performance and nobody makes any changes to the system. Nor the analyst neither the dba. Why the performance is not ok anymore?

I think the answer is the data. The data is changing every day, every hour, every minute and that query that always was ok is taking a long time to finish. It is running slow and we have to have an explanation for this situation.

Oracle has a very good feature named histogram. And I think the histogram was created just for this. When the data change, the histogram can see that change and send information the the optimizer that uses that information to build a better access plan. So where is the problem?

First lets talk about bind variables. We always have heard that there are only benefits in using bind variables. This reduce the hard parses, relieve the memory consumption and let a clean code.

But when a column doesnt have values with a balanced distribution, bind variables could be a problem. Oracle has a phase before the access plan could be generated that is "bind variable peeking". The first value received by the query is the one that is going to be used by the optimizer to decide for the best access plan and this plan is going to be used for all future executions. But what if the second value can retrieve a different amount of data than the first one?

If my data has a skewed distribution and i am using bind variables, the optimizer is not going to rebuild the access plan every time the value changes even if I had collected histograms. For one specific value is better use ,for example, the index idx1, for another value the best performance is achieved using the idx2 and for another value the best choice would be do a full table scan.

I dont know if this is a problem or if it is just a optimizer's behavior, but I am doing a research to find some alternatives to this problem. And I find some (thanks Tom Kyte):

.use bind variables but dont use histograms. so the plan is going to be the same for every single execution. one size fits all approach
.dont use bind variables but use histograms. so the optimizer will evaluate the plan every execution and discover the best plan
.use bind variables, use histogram but change the code. if the value is in this range, execute this query. if not, execute another query. i need to know distribution data. using the width_bucket function could be a good way to discover how is the distribution

Anyone have another idea?



      Veja quais são os assuntos do momento no Yahoo! +Buscados http://br.maisbuscados.yahoo.com

Received on Tue Feb 24 2009 - 08:01:10 CST

Original text of this message