After more than 20 years of working with Oracle databases, I have recently found myself using SQL Server for the very first time. Until now, I have been a passive observer in the My-Database-Is-Better-Than-Yours wars, so it’s a pleasant change to be able to finally contribute.
I’m pleased to report that – as a software developer – the skills map pretty well between the databases.
(with apologies to Robert Ludlum and Eric Van Lustbader)
Oracle performance tuning is an excellent source of myths. The very best ones have a group of adherents who continue to support the myth even when presented with counter-examples. Who’s heard of these?
- Joins are faster than sub-queries
- Sub-queries are faster than joins
- Full Table Scans are bad
Those ones have been around as long as I can remember. Probably the single greatest concentration of Oracle performance tuning myths centres on Bitmap Indexes. Are these familiar?
- Bitmap indexes are good for low-cardinality columns, whereas B-Tree indexes are good for high-cardinality columns.
- Bitmap indexes are slow to update.
- Bitmap indexes don't support concurrent updates.
This is first post of the four-part epic - The Bitmap Conspiracy - detailing the structure and behaviour of Bitmap Indexes. Later in the series we will cover the internal structure of Bitmap Indexes, how Oracle uses them, and finally we will expose some of the myths surrounding them. But before we get there let’s just get a clear understanding of what a Bitmap Index actually is.
I’ve been tuning Oracle database applications for a long time now. I started out recognising some simple patterns and applying template fixes (Got a full table scan? Use an index!) but such a collection of “Do this; don’t do that” anecdotes will only take you so far. If you are curious (I was), you can uncover the reasons why one method is faster than another; i.e. what is the computer doing to make slow code so slow. I found that a good understanding of the internals meant that you didn’t always need to know how to tune a specific example because you could work it out for yourself.
In a database application, these investigations frequently lead to data structures; how does the database store its information and how does it retrieve it? Good information on the internals of Bitmap Indexes is hard to piece together, so in Part 2 of this Bitmap Indexing epic we will look more closely at the internals of Bitmap indexes.
This is Part 3 of The Bitmap Conspiracy, a four part epic on Bitmap Indexes.
In Part 1 we touched briefly on how Oracle can use Bitmap Indexes to resolve queries by translating equality and range predicates into bitmap retrievals. Now that we know more about how they are stored (see Part 2), let’s look closer at some of the operations that Oracle uses to access Bitmap Indexes and manipulate bitmaps.
This is one of those myths that is probably best demonstrated by example. First, let’s create some test data: 10 million rows with the following columns:
- COL1 – Alternating values of 1 and 0 – not indexed
- COL2 – Identical to COL1 – bitmap indexed
- PAYLOAD – 100 bytes of X’s to make the rows a realistic width – not indexed
COL1 and COL2 are identical low cardinality columns: there are only two valid values.
What I love about writing SQL Tuning articles is that I very rarely end up publishing the findings I set out to achieve. With this one, I set out to demonstrate the advantages of PARALLEL DML, didn't find what I thought I would, and ended up testing 8 different techniques to find out how they differed. And guess what? I still didn't get the results I expected. Hey, at least I learned something.
As an ETL designer, I hate updates. They are just plain nasty. I spend an inordinate proportion of design time of an ETL system worrying about the relative proportion of rows inserted vs updated. I worry about how ETL tools apply updates (did you know DataStage applys updates singly, but batches inserts in arrays?), how I might cluster rows together that are subject to updates, and what I might do if I just get too many updates to handle.
It would be fair to say I obsess about them. A little bit.
... then Goldilocks went into the bears' Data Centre and there were 3 Oracle databases. The first was a Data Warehouse. Goldilocks checked the AWR, but all the SQLs were to-o-o-o-o-o-o-o big; they all used full scans, hash joins, bitmap-index combining, partition pruning and parallelism and couldn't be tuned any more. So Goldilocks went to the second Oracle database. It was an OLTP system with hundreds of concurrent users. Goldilocks fired up SQL Tuning Advisor, but all the SQLs were to-o-o-o-o-o-o-o small; they used unique index scans and cluster-joins and couldn't be tuned any more. So Goldilocks went to the third Oracle database. It was an Operational Data Store with a rolling 3 month retention. Goldilocks found SQLs that were joining a million rows with Nested Loops joins, buffer cache hit ratios of 50%, and under-utilised disk. She smiled, opened up Tom Kyte's Expert Oracle eBook on her second monitor and got to work. This database was ju-u-u-u-u-u-u-u-st right ...
I was at the supermarket the other day waiting my turn at the checkout behind another guy. The checkout-chick (I'm sure there is a more PC term, I just don't know what it is) just finished scanning his groceries and he asked her to wait until his wife returned with a few last-minute items. I've done it before, so steam didn't start coming out of my ears - yet. Fifteen seconds later she ran up - sorry, sorry - and checked out the last few items, so I wasn't really inconvenienced.
As it turned out, their technique was efficient.
Aside from a nine month excursion to Sybase IQ, I've spent my entire career working with Oracle, so I don't profess too much expertise - indeed any! - about other RDBMS technologies. So in a weak attempt at self-education, I recently accepted an invitation to listen to a Teradata presentation directed at application developers.