
Re: 10g optimizer and dynamic sampling

From: <p.santos000_at_gmail.com>
Date: 2 Jun 2006 13:34:51 -0700
Message-ID: <1149280491.234262.215660@h76g2000cwa.googlegroups.com>


Steve Howard wrote:
> p.santos000_at_gmail.com wrote:
> > Folks,
> > I thought I'd run this by you.
> > For operational reasons we have forced particular customer queries to
> > execute with dynamic sampling of 4. We turn this on at the session
> > level via "alter session set optimizer_dynamic_sampling=4".
> >
> > This setting works great on 90% of the customer queries, but I noticed
> > that every Thursday the customer queries don't run optimally because
> > the optimizer gets it wrong... here is why.
> >
> > On Thursdays the customer has queries like this:
> >
> > SELECT ... FROM TABLE where product_id = xyz;
> >
> > The table has 57 million records, and product_id = xyz only returns
> > 2.3 million records. That is, this particular product is a very small
> > percentage of all the rows in the table.
> >
> > So what happens is that when I execute this query with sampling=4, the
> > optimizer samples about 4758 blocks and doesn't find any matches, so it
> > thinks very few rows exist... causing a poor plan. Here is the actual
> > 10053 sampling information:
> >
> > ==================================
> > level: 4
> > sample pct. : 0.008178
> > actual sample size : 4728
> > filtered sample card. : 0
> > orig. card : 56794895
> > ....
> > min. sel. est. : .04116057
> > ** Using single table dynamic sel. est. : .00014659
> >
> > TABLE: PRODUCT_11231 Alias: PRODUCT_11231
> > Original Card: 56794895 Rounded: 8326
> > ==================================
> >
> > So the sampling here appears to get it wrong. It estimated that the
> > query returns about 8326 records, when in reality it returns 2.3
> > million.
> >
> > I ran this with default sampling of 2, and the optimizer does get it
> > right. However, I'm not considering changing the sampling from 4 to 2,
> > because it will affect 90% of the customers' other queries.
> >
> > So my question is:
> > When the optimizer uses dyn sampling other than 2, and it's querying a
> > very small percentage of the overall table data, it appears to get it
> > wrong... Is there any configuration or setting that can tell the
> > optimizer the following:
> > "If you sample and get n results, default to dyn sampling of 2"...
> > OR
> > How can I increase the accuracy of the sampling... other than going up
> > to higher levels?
> > Probably wishful thinking on my part, but I figured I'd ask.
> >
> > - Any input would be greatly appreciated.
> > -peter

>
> Out of curiosity, what performance impact did you see from going
> higher?
>
> Regards,
>
> Steve
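
For context, the setup under discussion boils down to roughly this sketch
(the table name comes from the quoted 10053 trace, and "xyz" stands in for
the real product value):

  ALTER SESSION SET optimizer_dynamic_sampling = 4;

  SELECT ... FROM product_11231 WHERE product_id = xyz;
  -- really matches ~2.3 million of ~57 million rows, but the
  -- optimizer's dynamic-sampling estimate is only ~8K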

I'm not sure what you mean by going higher, but by default in 10g sampling is
set to 2 at the database level... which is fine. Level 4 just samples a higher
percentage of blocks, which may or may not allow for better cardinality
estimates... in my case I haven't really seen an improvement, but I haven't
tested higher levels. Obviously, the higher the level, the longer the sampling
phase is going to take.
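
For illustration only: the sampling level can also be pinned per statement
with the standard DYNAMIC_SAMPLING hint instead of session-wide (the alias,
level, and query shape below are placeholders, assuming the problem SQL can
carry a hint):

  SELECT /*+ dynamic_sampling(p 2) */ ...
    FROM product_11231 p
   WHERE product_id = xyz;

That would leave the session-level setting at 4 for the other 90% of queries.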

I haven't noticed any specific impact from going higher; I'm just not sure
whether that's the best approach.
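
For what it's worth, the numbers in the quoted 10053 trace hang together as
simple arithmetic:

  0.00014659 * 56,794,895 ~= 8,326       (the dynamic sel. est. produces the
                                           "Rounded" cardinality)
  0.04116057 * 56,794,895 ~= 2,340,000   (the min. sel. est. is very close to
                                           the real 2.3 million rows)

So it is the sampled selectivity overriding the larger minimum estimate that
produces the 8326-row guess.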

-peter

Received on Fri Jun 02 2006 - 15:34:51 CDT
