is SIZE AUTO ever wrong? or am I wrong?
Date: Fri, 20 Nov 2009 12:05:13 -0500
Message-Id: <A7D6B815-CF33-4EF0-B03C-06C920775A3A_at_gmail.com>
Does Oracle ever make the wrong decision when using FOR COLUMN column_name SIZE AUTO?
I ask because I have table letter_bin with 4778159 rows, claims with 487842 rows.
letter_bin is indexed by both clm_id and prty_id
Here are the distributions of the counts of the columns:
table claims column clmt_prty_id
count min max avg stddev
---------- ---------- ---------- ---------- ----------
319156 1 29 1.52855657 .939775607
table letter_bin column prty_id
count min max avg stddev
---------- ---------- ---------- ---------- ----------
234326 1 1712371 20.3912157 3542.44185
table letter_bin column clm_id
count min max avg stddev
---------- ---------- ---------- ---------- ----------
470855 1 1424 10.147916 7.88601772
I just now realized that the numbers for letter_bin include NULLs.
excluding NULLs, I get
table letter_bin column clm_id
count min max avg stddev
---------- ---------- ---------- ---------- ----------
470859 1 1424 10.1464345 7.82825864
table letter_bin column prty_id
count min max avg stddev
---------- ---------- ---------- ---------- ----------
234326 1 42314 13.0835759 188.911964
letter_bin is typically queried by both clm_id and prty_id at the same time. although both columns are NULL able, when the table is queried in a certain fashion, both values are used, one as a literal and one as a result of a join. prty_id is supplied, clm_id is the result of querying by clmt_prty_id
I analyzed letter bin using for columns prty_id size auto, and for columns clm_id size auto. both 100% samples, execute separately.
When I compute statistics on column prty_id, I get 254 buckets.which I expect. When I compute statistics on clm_id, I get no histogram although I think I should be expecting one. To me, the concept of 'skewed' is somewhat nebulous but I think I ought to get a histogram here.
I prefer letter_bin to by clm_id and not by prty_id. I say this because the average number of rows in letter_bin for a clm_id should be 10 with a lower SD
an example query would be along the lines of select xyz from letter_bin a,claims b where a.prty_id = 123 and a.clm_id=b.clm_id and b.clmt_prty_id=456. The query typically uses access path of letter_bin-party_id and then join to claims. I would expect it to be the other way around.
select count(*) "count",count(distinct &2)"distinct",min(cnt)"min", max(cnt)"max", avg(cnt)"avg", stddev(cnt)"stddev" from ( select /*+ PARALLEL (a 6) */ &2, count(*) cnt from &1 a group by &2 );
Now here's where the plot thickens: This query is executed thousands of times in a batch job. I think it would be more efficient for letter_bin to be accessed by clm_id, and its not. It's getting accessed by prty_id and HJ to claims. Am I being unreasonable, or are my stats no good?
-- http://www.freelists.org/webpage/oracle-lReceived on Fri Nov 20 2009 - 11:05:13 CST