Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Mailing Lists -> Oracle-L -> Re: histogram oddity: Height-balanced histograms where NDV < num_buckets

Re: histogram oddity: Height-balanced histograms where NDV < num_buckets

From: Charles Schultz <sacrophyte_at_gmail.com>
Date: Fri, 11 Aug 2006 12:07:40 -0500
Message-ID: <7b8774110608111007k619b04f4vfdd97350f7f18af7@mail.gmail.com>


Further insights:

The system view (dba_tab_col_statistics) is built upon a couple other nested system views. After digging in, I came across this piece of code from the base view (dba_tab_cols):

       case when nvl(h.row_cnt,0) = 0 then 'NONE'
            when (h.bucket_cnt > 255
                  or
                  (h.bucket_cnt > h.distcnt and h.row_cnt = h.distcnt
                   and h.density*h.bucket_cnt*2 <= 1))
                then 'FREQUENCY'
            else 'HEIGHT BALANCED'
       end,

In order by this column to be "HEIGHT BALANCED", it has to meet the following criteria:

After checking off the first two items for a particular example (table INDPART$, column TS#, bucket_cnt = 217), I looked at the 3rd. I found out that bucket_cnt > h.distcnt. So that explains why the view reported this information, but why is bucket_cnt > h.distcnt? In working through this example, I also found cases where bucket_cnt > 254, which encouraged me to check for the largest:
SQL > select max(bucket_cnt) from sys.hist_head$;

MAX(BUCKET_CNT)


        2631537

Wow. I guess I really do not know what bucket_cnt means - I thought it was a count of histogram buckets, which I understaood to be limited to 254.

But back to the example (INDPART$.TS#) for a second. SQL > select count(*), count(distinct TS#) from sys.INDPART$;

  COUNT(*) COUNT(DISTINCTTS#)

---------- --------------------
       236                    1

To make matters even more confusing
SQL > select ENDPOINT_NUMBER,ENDPOINT_VALUE from dba_histograms where table_name = 'INDPART$' and column_name = 'TS#';

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------

            217 79

Color me confused. In some ways, I hope I am missing something obvious. *grin*

PS - some day I need to get Wolfgang or Jonathan to further explain the HEIGHT BALANCED Density formula. I am having trouble with "the sum of the squared frequencies fo all non-popular values divided by the sum of the frequencies of all non-popular values times the count of rows with not null values". I would love to see those complex examples that you mention in writing, and I regret missing the presentations to see in person.

On 8/11/06, Charles Schultz < sacrophyte_at_gmail.com> wrote:
>
> Oracle 10.2.0.2, Solaris 8
>
> Wondering if anyone could explain this. We have 105 SYS-owned tables where
> num_distinct < num_buckets according to dba_tab_col_statistics (see
> following query). Even weirder, some columns have 217 buckets for only 1
> distinct value (verified in dba_tab_histograms). What kind of sense does
> that make? Similarly, we have one application partition that exhibits the
> same symptom.
>
> select distinct table_name, column_name, num_distinct, num_buckets
> from dba_tab_col_statistics where histogram = 'HEIGHT BALANCED' and
> num_distinct < num_buckets
> order by num_distinct,num_buckets
> /
>

-- 
Charles Schultz

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Aug 11 2006 - 12:07:40 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US