Re: CBO rows estimate way out in 2 table join - what to do?
"joel garry" <joel-garry_at_home.com> wrote in message
news:1193959828.744083.306450_at_i38g2000prf.googlegroups.com...
wrote:
> "stephen O'D" <stephen.odonn..._at_gmail.com> wrote in message
>
> > SQL> explain plan for
> > SELECT
> > --count(*)
> > cpd.customer_ref
> > FROM custproductdetails cpd, ACCOUNT a
> > WHERE cpd.account_num = a.account_num
> > AND cpd.customer_ref = a.customer_ref
> > AND a.currency_code = 'GBP'
> > > AND a.invoicing_co_id = 1;
>
>
> begin
> dbms_stats.gather_table_stats({owner},'custproductdetails',
> cascade=>true, method_opt=>'for all columns size 1');
> dbms_stats.gather_table_stats({owner},'account', cascade=>true,
> method_opt=>'for all columns size 1');
>
> dbms_stats.gather_table_stats({owner},'account',
> method_opt=>'for columns currency_code, invoicing_co_id size 254');
>
> end;
>
This seems a bit strange to me, given the cascade=>true: 10K rows, skewed, out of 70M? Wouldn't it be better to use a larger percent on the table, no cascade, and separately completely analyze the index (and any others)? Yes, it would take hours, but wouldn't it likely be worth it to get a better execution time? Am I missing an elephant in my blind flailing?
jg
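A rough sketch of the alternative being described here (a larger sample on the table itself with cascade turned off, then the indexes computed separately) might look like the following; the 25 percent sample and the index name placeholder are illustrative assumptions, not values from the thread:

begin
        -- larger sample on the table itself, no index gathering in this pass
        dbms_stats.gather_table_stats(
                ownname          => '{owner}',
                tabname          => 'account',
                estimate_percent => 25,        -- illustrative sample size
                cascade          => false,
                method_opt       => 'for all columns size 1'
        );

        -- then compute each index completely in its own (long-running) pass
        dbms_stats.gather_index_stats(
                ownname          => '{owner}',
                indname          => '{account_index}',   -- hypothetical index name
                estimate_percent => 100
        );
end;
/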
Order of quote:
Steve's SQL
My suggested stats collection
Joel's question
Joel,
The first step of the execution plan predicts 855 rows instead of nearly 10,000, and we are told that the data is skewed on the two filter columns.
So step one in finding a solution (since I'm not on site to run a few data sampling queries) is to generate stats that are typically appropriate for this sort of situation, viz:
    compute for small table and its indexes
    small sample for large table and its indexes
    histograms on known skewed columns
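Spelled out as dbms_stats calls, that recipe might look roughly like the following; which table is the small one, and the 1 percent sample size, are assumptions for illustration:

begin
        -- compute for the small table and its indexes
        dbms_stats.gather_table_stats('{owner}', '{small_table}',
                estimate_percent => 100, cascade => true,
                method_opt => 'for all columns size 1');

        -- small sample for the large table and its indexes
        dbms_stats.gather_table_stats('{owner}', '{large_table}',
                estimate_percent => 1, cascade => true,    -- 1% is illustrative
                method_opt => 'for all columns size 1');

        -- histograms on the known skewed filter columns
        dbms_stats.gather_table_stats('{owner}', '{large_table}',
                method_opt => 'for columns currency_code, invoicing_co_id size 254');
end;
/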
What I want to see is if the histograms will make the first table in the plan report something closer to 10,000 rows.
If the join columns between the two then have a uniform distribution, the join cardinality should be much better if the driving cardinality is nearly correct.
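A quick way to check that, once the new stats are in place, is to repeat the explain plan and read the Rows estimate for the first step; a sketch using the query from the thread:

explain plan for
select cpd.customer_ref
from   custproductdetails cpd, account a
where  cpd.account_num   = a.account_num
and    cpd.customer_ref  = a.customer_ref
and    a.currency_code   = 'GBP'
and    a.invoicing_co_id = 1;

select * from table(dbms_xplan.display);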
STEP 2: examine the data distribution of the join columns and consider the need to generate histograms at both ends of the join - but you don't generate histograms unless you really need to. As part of STEP 2, I would also suggest reading Alberto Dell'Era's terrific document about "joins over histograms"
http://www.adellera.it/investigations/join_over_histograms/index.html
I've also referenced it from my blog:
http://jonathanlewis.wordpress.com/2007/04/25/histograms-and-joins/
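As a starting point for the first part of STEP 2, a query along these lines gives a feel for one join column's distribution (sketched here for account.account_num; the same check applies at the custproductdetails end and for the customer_ref columns):

-- a max far above the average rows per value suggests skew worth a histogram
select  count(*)        as num_distinct_values,
        sum(cnt)        as num_rows,
        min(cnt)        as min_rows_per_value,
        max(cnt)        as max_rows_per_value,
        round(avg(cnt)) as avg_rows_per_value
from    (
        select  account_num, count(*) cnt
        from    account
        group by account_num
        );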
--
Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com

Author: Cost Based Oracle: Fundamentals
http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html

Received on Fri Nov 02 2007 - 02:54:27 CDT