Re: indexing a column with only 2 or 3 values

From: Richard Foote <richard.foote_at_bigpond.nospam.com>
Date: Fri, 03 Aug 2007 14:05:09 GMT
Message-ID: <pCGsi.15616$4A1.14750@news-server.bigpond.net.au>

"Thomas Kellerer" <JUAXQOSZFGQQ_at_spammotel.com> wrote in message news:5hgnqqF3l1n69U1_at_mid.individual.net...
> Richard Foote wrote:

>> "Thomas Kellerer" <JUAXQOSZFGQQ_at_spammotel.com> wrote in message 
>> news:5hg5aiF3kia4nU1_at_mid.individual.net...
>>> ciapecki wrote:
>>>> Hi,
>>>>
>>>> Does indexing a very big table (about 5Mio records) on the columnA
>>>> which can hold only values Y,N,<NULL> make sense?
>>>>
>>> Yes, that's what bitmap indexes were made for.
>>>
>>
>> Hi Thomas
>>
>> A Bitmap index is of no use if both Y and N are roughly evenly 
>> distributed and you have no other predicate in the query.
>>
>> Returning approximately 2.5 millions rows through a bitmap index would be 
>> dramatically slower than a full table scan.
>>
>> A single bitmap index only would be useless in this scenario, even more 
>> so if the table is subjected to any transactional based DML load.
>>
>

> Hi Richard,

> thanks for the pointing this out.
> I wasn't aware of that, but it does sound reasonable.

> But after all the Concepts manual says:

> "If the number of distinct values of a column is less than 1% of the
> number of rows in the table, or if the values in a column are repeated
> more than 100 times, then the column is a candidate for a bitmap index."

> Actually a bit further down in the Concepts manual there is an example
> very similar to the OP's situation:

> "There are only three possible values for marital status and region, two
> possible values for gender, and four for income level. Therefore, it is
> appropriate to create bitmap indexes on these columns"
>

Hi Thomas

There are a number of classic myths associated with bitmap indexes and yes, Oracle is as guilty as anyone in propagating them. One is that bitmap indexes are only useful for low cardinality columns. The above definition is better than many I've read but it's still one of those rules of thumbs that is not entirely accurate as column values outside of the definition could possibly be candidates for a bitmap index.

The other classic myth is that a *single* bitmap index on a very low cardinality column (as in the OPs example) can be very efficiently utilised to retrieve the required number of rows. But if this means retrieving 50% or 33% or 25% etc of all rows in a huge table, then it not going to be very efficient at all when compared to the poor old full table scan.

Note in the above example, it mentions 4 different columns, not one column. These four columns when *combined* could possibly reduce the final result set to a small enough subset of required rows that would make retrieving them one at a time through the rowids a possibly attractive option. For example, there may not be that many single males that live in Canberra that have a really low income ...

At the end of the day, it comes back to the overall selectivity of predicates and can combinations of bitmaps when and/or/not together produce a small enough set of rowids to make it all worthwhile to read the bitmap blocks, perform the set logic and retrieve the resultant rowids one at a time when compared with other alternatives (such as the full table scan).

The answer is almost certainly a big no for a single bitmap index on a low cardinality column.

Cheers

Richard Received on Fri Aug 03 2007 - 09:05:09 CDT