Re: The 20% rule

From: DA Morgan <damorgan_at_psoug.org>
Date: Fri, 25 Jul 2008 13:37:24 -0700
Message-ID: <1217018240.320675@bubbleator.drizzle.com>


Charles Hooper wrote:
> On Jul 24, 8:29 pm, DA Morgan <damor..._at_psoug.org> wrote:

>> Charles Hooper wrote:

>
>>> Every one of my plans looked like the above.  There are a couple very
>>> good reasons why a full table scan was selected for the above SQL
>>> statement.  Just a few:
>>> * DB_FILE_MULTIBLOCK_READ_COUNT is set to 128 (and MBRC, used by the
>>> cost based optimizer, is set to 100), so Oracle believes that it is
>>> able to read 100 (or 128) blocks in a single IO call.
>>> * The rows are very narrow, so a very large number of rows will fit in
>>> each block.
>>> * I had one row in the table when the SQL statement was hard parsed.
>>> * Gathering statistics on tables and indexes does not immediately
>>> invalidate SQL statements accessing those objects on Oracle 10.2.0.2,
>>> so I would expect to see exactly the same *predicted* plan from
>>> dbms_xplan unless the SQL statement was hard parsed every time it was
>>> executed.
>>> This topic came up at least once before, and I had unfortunately
>>> poorly paraphrased one of Tom Kyte's books in the thread.
>>> Fortunately, Tom saw the thread and offered some clarification:
>>> http://groups.google.com/group/comp.databases.oracle.server/browse_th...
>>> Charles Hooper
>>> IT Manager/Oracle DBA
>>> K&M Machine-Fabricating, Inc.
>> Here are my parameters
>> db_file_multiblock_read_count = 128
>> optimizer_index_caching = 0
>> optimizer_index_cost_adj = 100
>>
>> I ran several versions of it before posting what I did and could
>> reproduce the behavior at will.
>>
>> Both my result and yours demonstrate that the 20% rule is no more a
>> rule than any other fabricated Oracle rule. The only rule that means
>> anything is to test:
>> 1. With your data
>> 2. With your hardware
>> 3. With your version
>> 4. With your workload
>> --
>> Daniel A. Morgan
>> Oracle Ace Director & Instructor
>> University of Washington
>> damor..._at_x.washington.edu (replace x with u to respond)
>> Puget Sound Oracle Users Group www.psoug.org

>
> The links posted by Joel are very useful. There is a great deal of
> detail in Richard Foote's blog entries.
>
> I think that we have to be a little careful with the plans created by
> the optimizer when determining what is the best execution plan - the
> tipping point when an index should or should not be used may be
> difficult to determine by cost alone. A little more experimentation:
> CREATE TABLE t (
> testcol NUMBER,
> bcol NUMBER);
>
> CREATE INDEX ix_t
> ON t (testcol);
>
> INSERT INTO t VALUES (1,11);
>
> COMMIT;
>
> EXEC
> DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T',CASCADE=>TRUE);
>
> EXPLAIN PLAN FOR
> SELECT bcol
> FROM t
> WHERE testcol = 1;
>
> SELECT * FROM TABLE(dbms_xplan.display);
>
> Now the SQL statement has been hard parsed with a single row in the
> table.
> --------------------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes | Cost |
> --------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | 1 | 6 | 2 |
> | 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 6 | 2 |
> | 2 | INDEX RANGE SCAN | IX_T | 1 | | 1 |
> --------------------------------------------------------------------
>
> From the 10053 level 1 trace:
> Table Stats::
> Table: T Alias: T
> #Rows: 1 #Blks: 5 AvgRowLen: 6.00
> Index Stats::
> Index: IX_T Col#: 1
> LVLS: 0 #LB: 1 #DK: 1 LB/K: 1.00 DB/K: 1.00 CLUF: 1.00
> Access path analysis for T
> ***************************************
> SINGLE TABLE ACCESS PATH
> Single Table Cardinality Estimation for T[T]
> Table: T Alias: T
> Card: Original: 1.000000 Rounded: 1 Computed: 1.00 Non
> Adjusted: 1.00
> Access Path: TableScan
> Cost: 3.00 Resp: 3.00 Degree: 0
> Cost_io: 3.00 Cost_cpu: 35827
> Resp_io: 3.00 Resp_cpu: 35827
> Access Path: index (AllEqRange)
> Index: IX_T
> resc_io: 2.00 resc_cpu: 14613
> ix_sel: 1.000000 ix_sel_with_filters: 1.000000
> Cost: 2.00 Resp: 2.00 Degree: 1
> Best:: AccessPath: IndexRange
> Index: IX_T
> Cost: 2.00 Degree: 1 Resp: 2.00 Card: 1.00 Bytes: 0
>
> The index contains a single block - the root block. With a single row
> in the table, the cost is 2: 1 for the index block access and 1 for
> the table block access. There are 5 blocks in the table below the
> high water mark. The cost of the tablescan is calculated at 3. For
> the cost based optimizer, the lowest cost plan wins, thus the table
> access by index range scan is selected.
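The arithmetic can be checked against the widely published textbook approximation of the CBO's I/O cost for an index range scan plus table access by rowid (blevel + ceil(leaf_blocks × sel) + ceil(clustering_factor × sel)). This is a sketch of that approximation, not Oracle's actual code:

```python
import math

def index_range_io_cost(blevel, leaf_blocks, clustering_factor, sel):
    """Textbook approximation of the optimizer's I/O cost for an index
    range scan followed by table access by rowid (not Oracle internals)."""
    return (blevel
            + math.ceil(leaf_blocks * sel)
            + math.ceil(clustering_factor * sel))

# Stats from the 10053 trace above: LVLS 0, #LB 1, CLUF 1.00, ix_sel 1.0
print(index_range_io_cost(0, 1, 1, 1.0))  # 2, matching resc_io: 2.00
```

The approximation lands exactly on the traced resc_io of 2.00: one leaf block visit plus one table block visit, with a zero-level (root-only) index contributing nothing.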
>
> Now for the experimentation:
> INSERT INTO T
> SELECT
> MOD(ROWNUM,5),
> ROWNUM
> FROM
> DUAL
> CONNECT BY
> LEVEL<=300000;
>
> Now we have 300,001 rows in the table.
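The MOD(ROWNUM,5) generator makes the row distribution easy to predict; a quick sanity check in plain Python, mirroring the INSERT above:

```python
from collections import Counter

# testcol = MOD(ROWNUM, 5) for ROWNUM 1..300000, plus the original (1, 11) row
counts = Counter(rownum % 5 for rownum in range(1, 300001))
counts[1] += 1  # the single row inserted before the bulk load

print(counts[1])           # 60001 rows with testcol = 1
print(counts[1] / 300001)  # ~0.2 of the table, close to the ix_sel seen later
```

So roughly a fifth of the 300,001 rows now satisfy testcol = 1, which lines up with the ix_sel of 0.204148 that appears in the later 10053 trace.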
>
> EXPLAIN PLAN FOR
> SELECT bcol
> FROM t
> WHERE testcol = 1;
>
> SELECT * FROM TABLE(dbms_xplan.display);
>
> The statistics have not yet been re-gathered, so the optimizer still
> believes the table holds a single row and the predicted plan is
> unchanged.
> --------------------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes | Cost |
> --------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | 1 | 6 | 2 |
> | 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 6 | 2 |
> | 2 | INDEX RANGE SCAN | IX_T | 1 | | 1 |
> --------------------------------------------------------------------
>
> I am not sure that the index range scan is appropriate here.
>
> COMMIT;
>
> EXEC
> DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T',CASCADE=>TRUE);
>
> ALTER SYSTEM FLUSH SHARED_POOL;
>
> EXPLAIN PLAN FOR
> SELECT bcol
> FROM t
> WHERE testcol = 1;
>
> SELECT * FROM TABLE(dbms_xplan.display);
>
> ----------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes | Cost |
> ----------------------------------------------------------
> | 0 | SELECT STATEMENT | | 62129 | 424K| 112 |
> | 1 | TABLE ACCESS FULL| T | 62129 | 424K| 112 |
> ----------------------------------------------------------
>
> Now that we forced a hard parse, Oracle switched to a full tablescan.
> From the 10053 level 1 trace:
> Table Stats::
> Table: T Alias: T
> #Rows: 300001 #Blks: 622 AvgRowLen: 7.00
> Index Stats::
> Index: IX_T Col#: 1
> LVLS: 2 #LB: 779 #DK: 5 LB/K: 155.00 DB/K: 528.00 CLUF:
> 2640.00
> SINGLE TABLE ACCESS PATH
> Single Table Cardinality Estimation for T[T]
> Column (#1):
> NewDensity:0.097746, OldDensity:0.000002 BktCnt:5545, PopBktCnt:
> 5545, PopValCnt:5, NDV:5
> Table: T Alias: T
> Card: Original: 300001.000000 Rounded: 61245 Computed: 61244.57
> Non Adjusted: 61244.57
> Access Path: TableScan
> Cost: 112.28 Resp: 112.28 Degree: 0
> Cost_io: 109.00 Cost_cpu: 65654636
> Resp_io: 109.00 Resp_cpu: 65654636
> Access Path: index (AllEqRange)
> Index: IX_T
> resc_io: 701.00 resc_cpu: 27652779
> ix_sel: 0.204148 ix_sel_with_filters: 0.204148
> Cost: 702.38 Resp: 702.38 Degree: 1
> Best:: AccessPath: TableScan
> Cost: 112.28 Degree: 1 Resp: 112.28 Card: 61244.57 Bytes:
> 0
>
> Now there are 622 blocks below the high water mark in the table and
> there are 779 blocks in the index with an index clustering factor of
> 2640. The cost of the full tablescan is calculated at 112.28 and the
> calculated cost of the index access is 702.38. The least expensive
> calculated plan cost is the full tablescan.
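Plugging the traced statistics into the same textbook index-cost approximation (blevel + ceil(#LB × sel) + ceil(CLUF × sel); a sketch of the widely published formula, not Oracle's actual code) reproduces the traced resc_io:

```python
import math

# Stats from the 10053 trace above: LVLS 2, #LB 779, CLUF 2640, ix_sel 0.204148
blevel, leaf_blocks, cluf, sel = 2, 779, 2640, 0.204148

index_io = blevel + math.ceil(leaf_blocks * sel) + math.ceil(cluf * sel)
print(index_io)  # 701, matching resc_io: 701.00 -- far above the tablescan's 109
```

With a fifth of the leaf blocks and, more importantly, a fifth of the 2640-block clustering factor to visit, the index path cannot compete with a 109 I/O-cost tablescan.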
>
> The above tests were performed with MBRC manually set to 128.
>
> I agree, it is about testing to see what is appropriate for a specific
> situation. Maybe a 10046 trace at level 8 or a DBMS_XPLAN showing
> actual execution statistics would be a bit more helpful when testing
> performance.
>
> If in my previous test earlier today, the initial hard parse was
> performed when there was a little more data in the table, the plan
> very well could have been a full table scan as the cost of the two
> methods was very close.
>
> Charles Hooper
> IT Manager/Oracle DBA
> K&M Machine-Fabricating, Inc.

One of the issues I have with the demo I posted (and, as I indicated, I only posted it to prove the rule is mythology) is that the bytes for an index row are greater than the bytes for a table row. One can easily reverse that with something like this:

CREATE TABLE index_demo (
  person_id NUMBER(10),
  gender    VARCHAR2(1),
  state     VARCHAR2(2),
  textcol   VARCHAR2(2000));

DECLARE
  g index_demo.gender%TYPE := 'F';
BEGIN
   FOR i IN 1 .. 250000
   LOOP

     INSERT INTO index_demo
     (person_id, gender, state, textcol)
     VALUES
     (i, g, 'WA', RPAD('x', 1799, 'x'));

     IF g = 'F' THEN
       g := 'M';
     ELSE
       g := 'F';
     END IF;

   END LOOP;
   COMMIT;

   -- person_id is a NUMBER, so LIKE forces an implicit TO_CHAR;
   -- each UPDATE below matches ids ending in the given digit
   UPDATE index_demo
   SET state = 'OR'
   WHERE person_id LIKE '%1';

   UPDATE index_demo
   SET state = 'CA'
   WHERE person_id LIKE '%2';

   UPDATE index_demo
   SET state = 'ID'
   WHERE person_id LIKE '%3';

   UPDATE index_demo
   SET state = 'NY'
   WHERE person_id LIKE '%4';

   UPDATE index_demo
   SET state = 'MA'
   WHERE person_id LIKE '%5';

   UPDATE index_demo
   SET state = 'MN'
   WHERE person_id LIKE '%6';

   UPDATE index_demo
   SET state = 'VA'
   WHERE person_id LIKE '%7';

   UPDATE index_demo
   SET state = 'NC'
   WHERE person_id LIKE '%8';

   UPDATE index_demo
   SET state = 'MI'
   WHERE person_id LIKE '%9';

   COMMIT;
END;
/
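Since the UPDATEs key off the last digit of person_id, the resulting distribution is easy to predict; a quick sketch in Python mirroring the PL/SQL above (just a model of the loop, not a database test):

```python
from collections import Counter

# Mirror the PL/SQL loop: ids 1..250000, gender alternating F/M starting
# with 'F', state re-assigned by the last digit of person_id (digit 0
# keeps the original 'WA')
digit_to_state = {1: 'OR', 2: 'CA', 3: 'ID', 4: 'NY', 5: 'MA',
                  6: 'MN', 7: 'VA', 8: 'NC', 9: 'MI'}

genders = Counter('F' if i % 2 == 1 else 'M' for i in range(1, 250001))
states = Counter(digit_to_state.get(i % 10, 'WA') for i in range(1, 250001))

print(genders['F'], genders['M'])  # 125000 125000 -- only two distinct values
print(states['WA'], states['OR'])  # 25000 25000 -- ten states, evenly spread
```

The end result: gender has two values split 50/50, while state has ten values split evenly, which is what makes the table useful for contrasting bitmap index candidates.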

I use this table in demos of bitmap indexes, in part to demonstrate why gender is not the ideal use case. Some people may want to use it to construct test data of their own.

-- 
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damorgan_at_x.washington.edu (replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org
Received on Fri Jul 25 2008 - 15:37:24 CDT
