RE: FW: Bitmap index not used when joining tables

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Mon, 2 Jun 2014 19:47:10 +0000
Message-ID: <CE70217733273F49A8A162EE074F64D901DF8548_at_exmbx05.thus.corp>


Gopal,

I should have looked in my book rather than trying to remember where I'd got the 80/20 from - here's how it appears on p.188:

"According to K. Gopalakrishnan et al. (Oracle Wait Interface: A Practical Guide to Oracle Performance Diagnostics and Tuning, Osborne McGraw-Hill, 2004), the optimizer assumes that 80% of the target data is tightly packed, and 20% of the target data is widely scattered."

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
_at_jloracle



From: oracle-l-bounce_at_freelists.org [oracle-l-bounce_at_freelists.org] on behalf of K Gopalakrishnan [kaygopal_at_gmail.com] Sent: 02 June 2014 20:25
To: chris.saxon_at_gmail.com
Cc: Oracle-l List
Subject: Re: FW: Bitmap index not used when joining tables

Chris,

(From my Oracle Wait Interface Book: Page 278...Bitmap index costing, changing the costing methods,etc)

http://www.freelists.org/post/oracle-l/BITMAP-index-cost-10053-trace,4

-Gopal

On Sun, Jun 1, 2014 at 4:32 AM, Chris Saxon <chris.saxon_at_gmail.com<mailto:chris.saxon_at_gmail.com>> wrote: Thanks for the explanation Jonathan.

Could you explain where the 20% value comes from please? Is this a default for bitmap indexes, or something you've inferred from the info posted?

On 31 May 2014 20:24, Mark W. Farnham <mwf_at_rsiz.com<mailto:mwf_at_rsiz.com>> wrote: hotel email bounce; trying again

From: Mark W. Farnham [mailto:mwf_at_rsiz.com<mailto:mwf_at_rsiz.com>] Sent: Saturday, May 31, 2014 3:14 PM
To: 'jonathan_at_jlcomp.demon.co.uk<mailto:jonathan_at_jlcomp.demon.co.uk>'; 'oracle-l_at_freelists.org<mailto:oracle-l_at_freelists.org>' Subject: RE: Bitmap index not used when joining tables

I do wonder about what your results might be for these two queries if you modified your query slightly as:

select
--+ gather_plan_statistics
 s.* from sales_fact s

       where s.rowid in (
       select
       --+ no_merge
          s2.rowid from sales_fact s2
          where s2.date_id in (
          select
          --+ no_merge
             d.date_id from date_d d
             where calendar_date between date’2013-12-31’ and date’2014-01-01’
          )
       );

The idea being that if the only column it needs in the unmerged subquery (rowid) can be sourced from the index, this should be less total work. Possibly this is also optimized if the subquery is ordered by rowid, but that’s not the curiosity at this point. Of course even if this works you would only want to form it up this way when you knew it was to your benefit (knowledge the CBO can only guess by way of the thumbrule quoted by JL, and some other thumbrule for a literal range.)

I just typed this in and didn’t test it, and working is release dependent, so you and I got lucky if this works via cut and paste. Once debugged I would sort of expect this to use the index in both cases to fetch back the rowid list.

mwf
From: oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org> [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Jonathan Lewis Sent: Saturday, May 31, 2014 1:11 PM
To: oracle-l_at_freelists.org<mailto:oracle-l_at_freelists.org> Subject: RE: Bitmap index not used when joining tables

It's not surprising for the example you've given.

The basic principle is that the cost of a single table access via a bitmap index is likely to be different from the cost of access via the equivalent btree index because the bitmap index has no information about data clustering (i.e no clustering_factor) so it uses a guess. Sometimes this means the cost of the bitmap will be higher, sometimes lower, sometimes the same as for a btree. In your case the btree clustering factor will be very low (because of the order by in the CTAS) while the guess basically assumes that 20% of the data will be very widely scattered - in your case that probably means 400 blocks (20% of 2000 rows => 400 blocks).

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
_at_jloracle



From: oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org> [oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org>] on behalf of Chris Saxon [chris.saxon_at_gmail.com<mailto:chris.saxon_at_gmail.com>] Sent: 31 May 2014 16:21
To: oracle-l_at_freelists.org<mailto:oracle-l_at_freelists.org> Subject: Bitmap index not used when joining tables Hi,

I've been testing using bitmap indexes on 11.2.0.2 EE. When joining two tables on a column with a BTree index, the index is used in the execution plan. If this index is changed to a bitmap index, Oracle no longer uses the index when executing the query! It assigns a higher cost to using the bitmap index when joining, despite this being a cheaper approach (in terms of consistent gets).

To see this, I created a date dimension table with 515 days (rows) and a sales fact table with 1,000 rows for each day:

create table date_d (
  date_id integer not null primary key,
  calendar_date date not null unique
);

create table sales_fact (
  date_id integer not null
    references date_d (date_id),
  quantity number not null,
  total_value number not null
);

insert into date_d
  select rownum, date'2013-01-01'-1+rownum   from dual
  connect by level <= sysdate - date'2013-01-01';

insert into sales_fact
with rws as (select * from dual connect by level <= 1000)   select d.date_id, round(dbms_random.value(1, 20)), round(dbms_random.value(10, 100), 2)   from date_d d
  cross join rws
  order by d.date_id;

begin
  dbms_stats.gather_table_stats(user, 'sales_fact');   dbms_stats.gather_table_stats(user, 'date_d'); end;
/

If I create a BTree index on SALES_FACT.DATE_ID, then join from the date dim to the fact table, restricting to two days, Oracle uses the index on the fact table as I would expect (as we're fetching 2,000 of 515,000 rows):

create index safa_date_id on sales_fact (date_id);

set autotrace trace
select s.* from sales_fact s join date_d d on d.date_id = s.date_id
where calendar_date between date'2013-12-31' and date'2014-01-01';

set autotrace off

Execution Plan



Plan hash value: 2189554905<tel:2189554905>
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |  3002 | 69046 |    21   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |  3002 | 69046 |    21   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| DATE_D       |     3 |    36 |     3   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0037151 |     3 |       |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN           | SAFA_DATE_ID |  1000 |       |     3   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | SALES_FACT   |  1000 | 11000 |     6   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


   4 - access("D"."CALENDAR_DATE">=TO_DATE(' 2013-12-31 00:00:00', 'syyyy-mm-dd

              hh24:mi:ss') AND "D"."CALENDAR_DATE"<=TO_DATE(' 2014-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

   5 - access("D"."DATE_ID"="S"."DATE_ID")

Statistics


          1  recursive calls
          0  db block gets
        290  consistent gets
          6  physical reads
          0  redo size
      36195  bytes sent via SQL*Net to client
       1839  bytes received via SQL*Net from client
        135  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2000  rows processed

However, if I drop the normal index and re-create it as a bitmap index the query above changes to a FTS on the SALES_FACT table:

set autotrace off
drop index safa_date_id;
create bitmap index safa_date_id on sales_fact (date_id);

set autotrace trace
select s.* from sales_fact s join date_d d on d.date_id = s.date_id
where calendar_date between date'2013-12-31' and date'2014-01-01';

Execution Plan



Plan hash value: 525754326
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |  3002 | 69046 |   315   (2)| 00:00:04 |
|*  1 |  HASH JOIN                   |              |  3002 | 69046 |   315   (2)| 00:00:04 |
|   2 |   TABLE ACCESS BY INDEX ROWID| DATE_D       |     3 |    36 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0037151 |     3 |       |     2   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL          | SALES_FACT   |   515K|  5532K|   310   (1)| 00:00:04 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


   1 - access("D"."DATE_ID"="S"."DATE_ID")    3 - access("D"."CALENDAR_DATE">=TO_DATE(' 2013-12-31 00:00:00', 'syyyy-mm-dd

              hh24:mi:ss') AND "D"."CALENDAR_DATE"<=TO_DATE(' 2014-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))


Statistics


          1  recursive calls
          0  db block gets
       1267  consistent gets
          0  physical reads
          0  redo size
      36195  bytes sent via SQL*Net to client
       1839  bytes received via SQL*Net from client
        135  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2000  rows processed

If we hint the query to use the bitmap index, we can see it has a higher cost. The autotrace stats report significantly fewer consistent gets though:

Execution Plan



Plan hash value: 1520624055
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |  3002 | 69046 |   416   (0)| 00:00:05 |
|   1 |  NESTED LOOPS                 |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |  3002 | 69046 |   416   (0)| 00:00:05 |
|   3 |    TABLE ACCESS BY INDEX ROWID| DATE_D       |     3 |    36 |     3   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS_C0037151 |     3 |       |     2   (0)| 00:00:01 |
|   5 |    BITMAP CONVERSION TO ROWIDS|              |       |       |            |          |
|*  6 |     BITMAP INDEX SINGLE VALUE | SAFA_DATE_ID |       |       |            |          |
| 7 | TABLE ACCESS BY INDEX ROWID | SALES_FACT | 1000 | 11000 | 416 (0)| 00:00:05 |

Predicate Information (identified by operation id):


   4 - access("D"."CALENDAR_DATE">=TO_DATE(' 2013-12-31 00:00:00', 'syyyy-mm-dd

              hh24:mi:ss') AND "D"."CALENDAR_DATE"<=TO_DATE(' 2014-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

   6 - access("D"."DATE_ID"="S"."DATE_ID")

Statistics


          1  recursive calls
          0  db block gets
        152  consistent gets
          1  physical reads
          0  redo size
      36195  bytes sent via SQL*Net to client
       1839  bytes received via SQL*Net from client
        135  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2000  rows processed

If you remove the join to the date dim and just use the date ids, Oracle uses the index as expected:

select * from sales_fact
where date_id between 365 and 366;

Execution Plan



Plan hash value: 2749560877
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |  3002 | 33022 |   221   (0)| 00:00:03 |
|   1 |  TABLE ACCESS BY INDEX ROWID | SALES_FACT   |  3002 | 33022 |   221   (0)| 00:00:03 |
|   2 |   BITMAP CONVERSION TO ROWIDS|              |       |       |            |          |
|*  3 |    BITMAP INDEX RANGE SCAN   | SAFA_DATE_ID |       |       |            |          |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


   3 - access("DATE_ID">=365 AND "DATE_ID"<=366)

Statistics


          1  recursive calls
          0  db block gets
        144  consistent gets
          0  physical reads
          0  redo size
      36195  bytes sent via SQL*Net to client
       1839  bytes received via SQL*Net from client
        135  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2000  rows processed

Looking at the 10053 trace, I can see this is because the bm index join cost is calculated as higher than the FTS of SALES_FACT:

NL Join
  Outer table: Card: 3.00 Cost: 3.00 Resp: 3.00 Degree: 1 Bytes: 12 Access path analysis for SALES_FACT
  Inner table: SALES_FACT Alias: S
  Access Path: TableScan
    NL Join: Cost: 929.82 Resp: 929.82 Degree: 1

      Cost_io: 920.00  Cost_cpu: 317629069
      Resp_io: 920.00  Resp_cpu: 317629069
  • trying bitmap/domain indexes ****** Access Path: index (AllEqJoinGuess) Index: SAFA_DATE_ID resc_io: 1.00 resc_cpu: 8171 ix_sel: 0.001942 ix_sel_with_filters: 0.001942 NL Join : Cost: 6.00 Resp: 6.00 Degree: 1 Cost_io: 6.00 Cost_cpu: 47359 Resp_io: 6.00 Resp_cpu: 47359 Bitmap nodes: Used SAFA_DATE_ID Cost = 6.001464, sel = 0.001942 Access path: Bitmap index - accepted Cost: 416.121488 Cost_io: 415.999594 Cost_cpu: 3943626.145192 Sel: 0.001942 Not Believed to be index-only
  • finished trying bitmap/domain indexes ******
  Best NL cost: 416.12
          resc: 416.12  resc_io: 416.00  resc_cpu: 3943626
          resp: 416.12  resp_io: 416.00  resc_cpu: 3943626

However, same section of the 10053 trace shows the following when using a BTree index on the SALES_FACT.DATE_ID column:

NL Join
  Outer table: Card: 3.00 Cost: 3.00 Resp: 3.00 Degree: 1 Bytes: 12 Access path analysis for SALES_FACT
  Inner table: SALES_FACT Alias: S
  Access Path: TableScan
    NL Join: Cost: 929.82 Resp: 929.82 Degree: 1

      Cost_io: 920.00  Cost_cpu: 317629069
      Resp_io: 920.00  Resp_cpu: 317629069
  Access Path: index (AllEqJoinGuess)

    Index: SAFA_DATE_ID
    resc_io: 6.00 resc_cpu: 433579
    ix_sel: 0.001942 ix_sel_with_filters: 0.001942     NL Join : Cost: 21.04 Resp: 21.04 Degree: 1
      Cost_io: 21.00  Cost_cpu: 1323580
      Resp_io: 21.00  Resp_cpu: 1323580

  Best NL cost: 21.04
          resc: 21.04  resc_io: 21.00  resc_cpu: 1323580
          resp: 21.04  resp_io: 21.00  resc_cpu: 1323580

Why does this happen? Is this a bug or expected behaviour?

Thanks,
Chris
www.sqlfail.com<http://www.sqlfail.com>

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jun 02 2014 - 21:47:10 CEST

Original text of this message