RE: re "CBO - A Configuration Roadmap" -- Histograms on Non-Indexed Columns

From: Christian Antognini <Christian.Antognini_at_trivadis.com>
Date: Fri, 29 Dec 2006 07:35:20 +0100
Message-ID: <F2C9CCA71510B442AF71446CAE8AEBAF8B594A@MSXVS04.trivadis.com>

Hi Hemant

You probably already got enough information from Mladen and especially from Wolfgang... Anyway, let me show you the demo I eventually show when I present that paper...

> Can you explain the second sentence about Histograms on non-indexed
> columns as well ?
> How would they be useful ? I would understand Histograms on
> non-indexed columns
> as providing _me_ information about the data in those columns and
> which could allow
> me to make a judgement call as to whether I should index the
> columns. How does the
> optimizer use Histograms on non-indexed columns ?

First of all let's create a table with a primary key (ID) and two additional columns (COL1 and COL2). For the purpose of this demo especially COL1 is interesting... In fact it contains one time all the values from 1 up to 500 and 500 times the value 666.

SQL> CREATE TABLE t (id, col1, col2, CONSTRAINT t1_pk PRIMARY KEY (id))   2 AS
  3 SELECT rownum, CASE WHEN rownum>500 THEN 666 ELSE rownum END, rownum
  4 FROM dba_objects WHERE rownum <= 1000;

Then let's gather statistics without histograms.

SQL> BEGIN
2 dbms_stats.gather_table_stats(

  3     ownname=>user,
  4     tabname=>'T',
  5     cascade=>TRUE,
  6     method_opt=>'for all columns size 1');

7 END;
8 /

Now let's check the query optimizer estimations when COL1 is referenced in the WHERE clause... Since no histograms are available and that the query optimizer "only" knows min/max value and number of distinct values, its estimation for the predicate "t2.col1 = 666" (see line 2 of the execution plan) is that two rows are returned. Why two? Basically because it multiply the number of rows store in the table (1000) by the selectivity of the predicate (1/num_distinct = 1/501 = 0.002). The interesting thing to notice is that this estimation, i.e. the high selectivity, leads to a nested loop.

SQL> SET AUTOTRACE TRACE EXP
SQL> SELECT * FROM t t1 JOIN t t2 USING (id) WHERE t2.col1 = 666;

Execution Plan

Plan hash value: 2076359606

| Id  | Operation                    | Name  | Rows  |
------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     2 |
|   1 |  NESTED LOOPS                |       |     2 |
|*  2 |   TABLE ACCESS FULL          | T     |     2 |
|   3 |   TABLE ACCESS BY INDEX ROWID| T     |     1 |
|*  4 |    INDEX UNIQUE SCAN         | T1_PK |     1 |
------------------------------------------------------

Predicate Information (identified by operation id):

2 - filter("T2"."COL1"=666)
4 - access("T1"."ID"="T2"."ID")

Then let's gather a histogram on COL1...

SQL> BEGIN
2 dbms_stats.gather_table_stats(

  3     ownname=>user,
  4     tabname=>'T',
  5     cascade=>TRUE,
  6     method_opt=>'for columns id size 1, col1 size 254');

7 END;
8 /

Now the estimations have changed. In fact with the histogram on COL1 the query optimizer knows that the predicate "t2.col1 = 666" has a much higher selectivity (circa 0.5). And since the number of returned rows is much higher now the query optimizer decides to use a hash join...

SQL> SELECT * FROM t t1 JOIN t t2 USING (id) WHERE t2.col1 = 666;

Execution Plan

Plan hash value: 2135975663

|   0 | SELECT STATEMENT   |      |   506 |
|*  1 |  HASH JOIN         |      |   506 |
|*  2 |   TABLE ACCESS FULL| T    |   506 |
|   3 |   TABLE ACCESS FULL| T    |  1000 |
-------------------------------------------

Predicate Information (identified by operation id):

1 - access("T1"."ID"="T2"."ID")
2 - filter("T2"."COL1"=666)

So, the aim of this demo is to show that the query optimizer, thanks to a histogram, doesn't change the access path for a table but the join method.

Best regards,
Chris

--
http://www.freelists.org/webpage/oracle-l

Received on Fri Dec 29 2006 - 00:35:20 CST