RE: Bitmap index costing - how to influence

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Sat, 28 Mar 2009 03:34:33 -0400
Message-ID: <12404A84B738441798A128E9EF6E1A22_at_rsiz.com>



Well you're both right. The question was how to do it, but please let me underscore WHY what Greg wrote is so important. Using a single bit map index to control returning results from a single table is almost never even a win, let alone a big win. When it is a big win, it could probably have been done more effecitively with either partitioning or histograms. (Before you send a flood of counter examples, please notice the words almost and probably, and read on.)

The whole design idea of bit map indexes is to facilitate the dramatic subsetting of returned rows by taking the intersection of several of them. This facilitates dramatic pruning when there are several non-correlated attributes in play as predicates so that no "skew" relationship can be exploited. Then you scan against the several bit map indexes, which should be very dense (ie. many row references per block) compared to full rows in the tables, so when you get around to yanking back the referenced rows you have many fewer to read. (Has anyone checked whether Oracle sorts the bit map results by row block address before it started retrieving the select rows? That was part of the design discussion at Oracle VLDB in the early 1990's. If they don't, they probably should, to minimize the problem Greg talks about. (It can be worse than he mentions if it is a multi-block row and you have column references to be retrieved from more than one block, but for short rows it can be much, much better. The presumption, before it had been implemented, when we were talking about it in the early 1990s was that after merging the bit maps (we didn't even think the single bit map case was worth talking about as to optimization), the number of blocks to read and the effort to make read consistent cache block copies would be minimized by sorting by rba before starting to yank back the blocks. Since it was a long time later that is was implemented, I never remembered to test whether they did that. Of course even if they did not, they may have measured which was better in the "average" case along the route to implementation and the results may have indicated sorting was a waste. Speculate as smart as you can when you can't test something, but when you can measure (and logically validate that the test is testing what you think it is testing), the measurements should prevail. Of course the Oracle VLDB folks wanted to optimize exploitation of bit map indexes for really big, relatively static stuff. SO that means a bigger chance that in a huge set of rows (but a dramatic subset of the total so it was worthwhile to use the merging of bit maps to get there) there is a bigger chance that the read consistent cached block containing at least two of the rows you need will age out if the references are not sorted before you start retrieval.

Now I have not added value to the thread on how to smack the optimizer up side the head to get it to choose bit maps, so I'm a bit (pun intended) off track for the thread, but I still hope this is helpful.

Regards,

mwf

-----Original Message-----
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Christo Kutrovsky
Sent: Saturday, March 28, 2009 2:18 AM
To: Greg Rahn
Cc: oracle-l
Subject: Re: Bitmap index costing - how to influence

Hi Greg,

I appreciate the feedback, but let's not deviate from the subject.

The question is not whether to use bitmap indexes, but how to influence the CBO calculations to favor the use of bitmap indexes.

There was no way for you to know this, but the query in question is using a range predicate that can vary greatly:

SELECT * from some_table_709MB
WHERE julian_date >= 2454892 AND julian_date <= 2454922

So again, the question is how to favor the use of bitmap indexes by the CBO without affecting too much b-tree or other access paths.

On Sat, Mar 28, 2009 at 1:16 AM, Greg Rahn <greg_at_structureddata.org> wrote:
> In this case I would suggest looking to see if partitioning can be
> used. I think it may be a better fit. Here is why:
> On an absolute scale, 800,000 rows returned could mean 800K single
> block reads which is 800K I/Os (granted, some rows may be in the same
> block). Compare that to 800 1MB multi-block reads, which is more than
> the size of the table (709MB). This is why the costing model favors
> the FTS by quite a bit - because the entire table can be read with
> fewer number of I/Os.
>
> What if you try this:
> Make a copy of the 10M row table that contains only the 800K rows you
> want returned and compare doing a FTS on 800K to getting the 800K out
> of 10M using the bitmap index. Clear your buffer cache before each.
> Make sure your multi-block I/O is 1MB. I suspect the FTS will be
> faster as it will have, at most, 8000 blocks and assuming a MBRC of
> 128 and block size of 8k, 8000/128=62.5 so now you are talking about
> reading, at most, 63 or so 1MB I/Os vs the number of single block I/Os
> it takes to fetch the 8000 blocks via the bitmap index, 1 block at a
> time.
>
> Understandably, if you have a perfect scenario, the bitmap index plan
> could be faster. Those conditions being that the index and table
> blocks are in memory. On the extreme end, if the table is sorted by
> the bitmap indexed column, then you would have extreme row locality,
> meaning the first row in the block would could cause a physical I/O,
> but each subsequent row in the block would have a logical I/O.
>
> This might be a useful resource:
> http://structureddata.org/2008/05/29/using-bitmap-indexes-effectively/
>
> On Fri, Mar 27, 2009 at 1:40 PM, Christo Kutrovsky
> <kutrovsky.oracle_at_gmail.com> wrote:
>> I have a query on a table with 10M rows (709MB), and the predicates
>> are correctly estimated to return 800 000 rows. As a result with the
>> famous 80/20 split, my bitmap index access path cost is ~170 000. The
>> full table scan cost is ~20 000. Obviously Oracle choose FTS.
>>
>> When executed with a hint, the query touches about 8000 block from the
>> table, and needless to say, is significantly faster.
>
> --
> Regards,
> Greg Rahn
> http://structureddata.org
>

-- 
Christo Kutrovsky
Senior DBA
The Pythian Group - www.pythian.com
I blog at http://www.pythian.com/blogs/
--
http://www.freelists.org/webpage/oracle-l




--
http://www.freelists.org/webpage/oracle-l
Received on Sat Mar 28 2009 - 02:34:33 CDT

Original text of this message