Re: is SIZE AUTO ever wrong? or am I wrong?

From: Kellyn Pedersen <kjped1313_at_yahoo.com>
Date: Fri, 20 Nov 2009 09:34:24 -0800 (PST)
Message-ID: <168875.36687.qm_at_web32002.mail.mud.yahoo.com>



Is SIZE AUTO ever wrong?  I think that's a relative question.  Is the design of the table incorrect for it's usage?  Are the joins incorrect in the code that are accessing the table and indexes involved?  Are your parameters and statistics collection set up optimally for the environment? There are so many variables when it comes to how, how often, what and why statistics are collected.
 

First question-  Have you tested a hint for the query to change the execution path and did it enhance performance as you thought it would or is it an assumption that the query would perform better if it lead by clm_id?
 

Secondly-  Oracle's statistics collection is not an exact science and I do believe some DBA involvement should occur.  There are going to be approximately 5% performance issues that will require a change in the statistics collection choices in any environment (and in large environments, it's a guarantee, as far as my opinion.) 
 

If you hinted the query, tested it and found that there would be better performance if the statstics were changed, then change the method_opt in your dbms_stats collection for this table.  Find out which one works the best for this query, but if it is not an impact on the system, I would not concern myself with it, to be at all honest. 

Kellyn Pedersen
Multi-Platform DBA
I-Behavior Inc.
http://www.linkedin.com/in/kellynpedersen
 

"Go away before I replace you with a very small and efficient shell script..."

  • On Fri, 11/20/09, neil kodner <nkodner_at_gmail.com> wrote:

From: neil kodner <nkodner_at_gmail.com> Subject: is SIZE AUTO ever wrong? or am I wrong? To: "oracle-l List" <oracle-l_at_freelists.org> Date: Friday, November 20, 2009, 10:05 AM

Does Oracle ever make the wrong decision when using FOR COLUMN column_name SIZE AUTO?

I ask because I have table letter_bin with 4778159 rows, claims with 487842 rows. 

letter_bin is indexed by both clm_id and prty_id

Here are the distributions of the counts of the columns:

table claims column clmt_prty_id

     count        min        max        avg     stddev
---------- ---------- ---------- ---------- ----------
    319156          1         29 1.52855657 .939775607

table letter_bin column prty_id

     count        min        max        avg     stddev
---------- ---------- ---------- ---------- ----------
    234326          1    1712371 20.3912157 3542.44185

table letter_bin column clm_id

     count        min        max        avg     stddev
---------- ---------- ---------- ---------- ----------
    470855          1       1424  10.147916 7.88601772

I just now realized that the numbers for letter_bin include NULLs.

excluding NULLs, I get

table letter_bin column clm_id

     count        min        max        avg     stddev
---------- ---------- ---------- ---------- ----------
    470859          1       1424 10.1464345 7.82825864

table letter_bin column prty_id

     count        min        max        avg     stddev
---------- ---------- ---------- ---------- ----------
    234326          1      42314 13.0835759 188.911964

letter_bin is typically queried by both clm_id and prty_id at the same time.  although both columns are NULL able,  when the table is queried in a certain fashion, both values are used, one as a literal and one as a result of a join.  prty_id is supplied, clm_id is the result of querying by clmt_prty_id

I analyzed letter bin using for columns prty_id size auto, and for columns clm_id size auto.  both 100% samples, execute separately.

When I compute statistics on column prty_id, I get 254 buckets.which I expect.  When I compute statistics on clm_id, I get no histogram although I think I should be expecting one.  To me, the concept of 'skewed' is somewhat nebulous but I think I ought to get a histogram here.

I prefer letter_bin to by clm_id and not by prty_id.  I say this because the average number of rows in letter_bin for a clm_id should be 10 with a lower SD

an example query would be along the lines of select xyz from letter_bin a,claims b where a.prty_id = 123 and a.clm_id=b.clm_id and b.clmt_prty_id=456. The query typically uses access path of letter_bin-party_id and then join to claims.  I would expect it to be the other way around.

select count(*) "count",count(distinct &2)"distinct",min(cnt)"min", max(cnt)"max", avg(cnt)"avg", stddev(cnt)"stddev"
  from ( select /*+ PARALLEL (a 6) */ &2, count(*) cnt from &1 a group by &2 );

Now here's where the plot thickens:  This query is executed thousands of times in a batch job.  I think it would be more efficient for letter_bin to be accessed by clm_id, and its not.   It's getting accessed by prty_id and HJ to claims.  Am I being unreasonable, or are my stats no good?

--
http://www.freelists.org/webpage/oracle-l





      
--
http://www.freelists.org/webpage/oracle-l
Received on Fri Nov 20 2009 - 11:34:24 CST

Original text of this message