Oracle FAQ Your Portal to the Oracle Knowledge Grid

Re: 10g optimizer and dynamic sampling

From: Steve Howard <stevedhoward_at_gmail.com>
Date: 2 Jun 2006 15:21:43 -0700
Message-ID: <1149286903.038251.130750@y43g2000cwc.googlegroups.com>

p.santos000_at_gmail.com wrote:
> Steve Howard wrote:
> > p.santos000_at_gmail.com wrote:
> > > Folks,
> > > I thought I'd run this by you.
> > > For operational reasons we have forced particular customer queries
> > > to execute with
> > > dynamic sampling of 4. We turn this on at the session level via
> > > " alter session set optimizer_dynamic_sampling=4"
> > >
> > > This setting works great on 90% of the customer queries, but I
> > > noticed that every Thursday the
> > > customer queries don't run optimally because the optimizer gets
> > > the estimates wrong...here is why.
> > >
> > > On Thursdays the customer has queries like this:
> > >
> > > SELECT ... FROM TABLE where product_id = xyz;
> > >
> > > The table has 57 million records, and product_id = xyz only returns
> > > 2.3 million records. That is,
> > > this particular product is a very small percentage of all the rows
> > > in the table.
> > >
> > > So what happens is that when I execute this query with sampling=4,
> > > the optimizer samples about 4758
> > > blocks and doesn't find any matches, so it thinks very few rows
> > > exist...causing a poor plan. Here
> > > is the actual 10053 sampling information:
> > >
> > > ==================================
> > > level: 4
> > > sample pct. : 0.008178
> > > actual sample size : 4728
> > > filtered sample card. : 0
> > > orig. card : 56794895
> > > ....
> > > min. sel. est. : .04116057
> > > ** Using single table dynamic sel. est. : .00014659
> > >
> > > TABLE: PRODUCT_11231 Alias: PRODUCT_11231
> > > Original Card: 56794895 Rounded: 8326
> > > ==================================
> > >
> > > So the sampling here appears to get it wrong. It estimated that the
> > > query
> > > returns about 8326 records, when in reality it returns 2.3 million.
> > >
> > > I ran this with the default sampling of 2 and the optimizer does get
> > > it right. However, I'm
> > > not considering changing the sampling from 4 to 2 because it would
> > > affect 90% of the customers'
> > > other queries.
> > >
> > > So my question is:
> > > When the optimizer uses dynamic sampling at a level other than 2,
> > > and it's querying a very small
> > > percentage of the overall table data, it appears to get it wrong...
> > > Is there any configuration
> > > or setting that can tell the optimizer the following:
> > > "If you sample and get n results, default to dynamic sampling of 2"...
> > > OR
> > > How can I increase the accuracy of the sampling...other than
> > > going up to higher levels?
> > > Probably wishful thinking on my part, but I figured I'd ask.
> > >
> > > - Any input would be greatly appreciated.
> > > -peter
> >
> > Out of curiosity, what performance impact did you see from going
> > higher?
> >
> > Regards,
> >
> > Steve

> I'm not sure what you mean by going higher, but by default in 10g
> sampling is set to 2 at the database level...which is fine. Level 4
> just samples a higher percentage of blocks, which may or may not allow
> for better cardinality estimates...in my case I haven't really seen an
> improvement, but I haven't tested higher levels...obviously, the higher
> the level, the longer the sampling phase is going to take.
>
> I haven't noticed any specific impact with going higher, I'm just not
> sure if that's the best approach.
>
> -peter

Thanks, Peter.

Levels above 4 increase the number of blocks sampled exponentially: level 5 samples twice the default sample size, level 6 four times, and so on. I am curious to see whether the higher sample sizes will improve your plans without a noticeable performance penalty.
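As a rough back-of-the-envelope sketch of why a few thousand sampled blocks can still report zero matches for a value that is 4% of the rows: if the product's 2.3 million rows were bulk-loaded into a tight cluster of blocks (a plausible scenario for a weekly load, though nothing in the trace confirms it), a uniform block sample can easily miss the whole cluster. The block counts below are entirely hypothetical and chosen only for illustration:

```python
# Hypothetical numbers for illustration only -- not from the 10053 trace.
# Assume the 57M-row table occupies ~570,000 blocks and the sample pulls
# roughly the 4,728 blocks reported in the trace.
TOTAL_BLOCKS = 570_000
SAMPLED_BLOCKS = 4_728

def prob_sample_misses(cluster_blocks: int) -> float:
    """Probability that a uniform random block sample contains none of
    the blocks holding the target product. Sampling is without
    replacement, but independent draws are a close approximation here
    because the sample is small relative to the table."""
    return (1 - cluster_blocks / TOTAL_BLOCKS) ** SAMPLED_BLOCKS

# Rows spread across 23,000 blocks (~4% of the table): essentially
# impossible for the sample to miss them all.
spread = prob_sample_misses(23_000)

# Rows packed into just 50 blocks: the sample misses the value most of
# the time, so the optimizer sees zero matches and produces a tiny
# cardinality estimate.
packed = prob_sample_misses(50)

print(f"miss probability, spread out: {spread:.2e}")
print(f"miss probability, clustered:  {packed:.2f}")
```

Under the clustered assumption the sample misses the product about two thirds of the time, consistent with the `filtered sample card. : 0` line in the trace. Note that doubling the sample size squares the miss probability (0.66 becomes about 0.44, then 0.19, ...), so each extra level does help, just gradually.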

Regards,

Steve

Received on Fri Jun 02 2006 - 17:21:43 CDT

