Histogram on VARCHAR2 column -- 32 characters or 32 bytes limit ?

From: Chitale, Hemant K <Hemant-K.Chitale_at_sc.com>
Date: Wed, 19 Feb 2014 14:20:37 +0800
Message-ID: <0BDF2A25A09ADD40908745EEFC0A0FB6016D29D9_at_HKMGAXMB103A.zone1.scb.net>


For a Histogram on a VARCHAR2 column, the limit of reading the first 32 characters by characters or bytes ? I believe that it is 32bytes so becoming a constraint in a multi-byte characterset.

Quoting from Note#212809.1
* Histograms only store the first 32 characters of a character string (5 characters pre 8.1.6 See Bug:598799 <https://support.oracle.com/epmos/faces/BugDisplay?id=598799> ). If histograms are gathered on character columns that have data that is longer than 31 characters and the first 31 characters are identical then column histograms may not reflect the cardinality of these columns correctly as these values will all be treated as if they are identical.

In my test below, USER_TAB_COL_STATISTICS shows a Height Balanced Histogram with 69 buckets for 69 distinct values. But USER_TAB_HISTOGRAMS shows only 1 bucket. So there is a discrepancy between these two views.

SQL>l
  1* create table hkc_test_hist (id_column number(5), data_col_1 varchar2(50))
SQL>/ Table created.

SQL>insert into hkc_test_hist
  2 select rownum, rpad ('X',32,'Y') || dbms_random.string('u',10)   3 from dual connect by level < 70;

69 rows created.

SQL>
SQL>select count(distinct(substr(data_col_1,1,32))) from hkc_test_hist;

COUNT(DISTINCT(SUBSTR(DATA_COL_1,1,32)))


                                       1

1 row selected.

SQL>select count(distinct(substr(data_col_1,1,49))) from hkc_test_hist;

COUNT(DISTINCT(SUBSTR(DATA_COL_1,1,49)))


                                      69

1 row selected.

SQL>
SQL>commit;

Commit complete.

SQL>exec
dbms_stats.gather_table_stats(USER,'HKC_TEST_HIST',method_opt=>'FOR ALL COLUMNS SIZE 254'); PL/SQL procedure successfully completed.

SQL>
SQL>l
  1 select column_name, num_distinct, histogram, num_buckets   2 from user_tab_col_statistics
  3* where table_name = 'HKC_TEST_HIST'
SQL>/

COLUMN_NAME                    NUM_DISTINCT HISTOGRAM       NUM_BUCKETS

------------------------------ ------------ --------------- -----------
ID_COLUMN 69 HEIGHT BALANCED 69 DATA_COL_1 69 HEIGHT BALANCED 69

2 rows selected.

SQL>
SQL>select count(*) from user_tab_histograms   2 where table_name = 'HKC_TEST_HIST'
  3 and column_name = 'DATA_COL_1'
  4 /

  COUNT(*)


         1

1 row selected.

SQL>
SQL>l
  1 select count(*) from user_tab_histograms   2 where table_name = 'HKC_TEST_HIST'
  3* and column_name = 'ID_COLUMN'
SQL>/   COUNT(*)


        69

1 row selected.

SQL> Hemant K Chitale

This email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please delete all copies and notify the sender immediately. You may wish to refer to the incorporation details of Standard Chartered PLC, Standard Chartered Bank and their subsidiaries at https://www.sc.com/en/incorporation-details.html.

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Feb 19 2014 - 07:20:37 CET

Original text of this message