
Re: 10g optimizer and dynamic sampling

From: Ronald Rood <devnull_at_ronr.nl>
Date: Sun, 4 Jun 2006 12:09:00 +0200
Message-ID: <0001HW.C0A87DDC00717597F0284530@news.individual.net>


On Fri, 2 Jun 2006 21:17:11 +0200, p.santos000_at_gmail.com wrote (in article <1149275831.679162.135990_at_u72g2000cwu.googlegroups.com>):

> 
> Ronald Rood wrote:
>> On Fri, 2 Jun 2006 16:42:46 +0200, p.santos000_at_gmail.com wrote
>> (in article <1149259366.409877.203280_at_i39g2000cwa.googlegroups.com>):
>> 

>>> to execute with
>>> dynamic sampling of 4. We turn this on at the session level via
>>> " alter session set optimizer_dynamic_sampling=4"
>>> SELECT ... FROM TABLE where product_id = xyz;
>>>
>> <snip>

>>> The table has 57 million records, and product_id = xyz only returns
>>> 2.3 million records. That is, this particular product is a very small
>>> percentage of all the rows in the table.
>> <snip>

>>>
>>> So my question is:
>>> When the optimizer uses dyn sampling other than 2, and it's querying
>>> a very small
>>> percentage of the overall table data, it appears to get it wrong...
>>> Is there any configuration
>>> or setting that can tell the optimizer the following:
>>> " If you sample and get n results, default to dyn sampling of 2"...
>>> OR
>>> How can I increase the accuracy of the sampling ... other than
>>> going up to higher levels?
>>> Probably wishful thinking on my part, but I figure I'd ask.
>> 
>> Why not take care of correct statistics and forget about dynamic
>> sampling? In my opinion, dynamic sampling just adds extra overhead to
>> the query, increasing response times.
>> 
>> 
>> --
>> With kind regards / met vriendelijke groeten,
>> Ronald
>> 
>> http://ronr.blogspot.com/
>> http://ronr.nl/unix-dba
> 
> Well, that's a good point, but here is the difficult thing about this
> system. Our customers upload data at any point of the day. So in the
> morning COL1='X' could return 100 rows, but the same query in the
> evening could yield a million rows.
> 
> Most of the queries executed on our system take anywhere from 5 minutes
> to 30-45 minutes, so a couple of extra seconds for the sampling is
> insignificant.

Do you need a different plan for these different situations? If all distributions vary by that amount, it could still be possible to use the same plan. If you do need new plans, why not gather stats as part of the load process?
I agree that with those runtimes the few seconds for sampling do not add significant time to the total runtime. If scalability is not a problem, that's fine. I would go for gathering statistics as part of the load.
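Gathering stats as part of the load could look roughly like this (a sketch only; the table and column names stand in for yours, and the histogram bucket count is an assumption, not a recommendation):

```sql
-- Sketch: refresh statistics right after a customer upload completes.
-- CUSTOMER_DATA and PRODUCT_ID are placeholders for the real names.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,
    tabname          => 'CUSTOMER_DATA',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    -- a histogram on the skewed column lets the optimizer see that some
    -- product_id values match millions of rows while others match only a few
    method_opt       => 'FOR COLUMNS PRODUCT_ID SIZE 254',
    cascade          => TRUE  -- refresh index statistics as well
  );
END;
/
```

With fresh stats and a histogram on the skewed column, the optimizer should not need dynamic sampling at all for this query.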

--
With kind regards / met vriendelijke groeten,
Ronald

http://ronr.blogspot.com/
http://ronr.nl/unix-dba
Received on Sun Jun 04 2006 - 05:09:00 CDT
