Part 1 - The Bitmap Identity
This is first post of the four-part epic - The Bitmap Conspiracy - detailing the structure and behaviour of Bitmap Indexes. Later in the series we will cover the internal structure of Bitmap Indexes, how Oracle uses them, and finally we will expose some of the myths surrounding them. But before we get there let’s just get a clear understanding of what a Bitmap Index actually is.
- What is a Bitmap Index?
- What is a Bitmap?
- Combining Bitmaps
- What is a Bitmap Index
- Any Questions?
- Appendix A – AND and OR Binary Logic
We all understand that a bit is the smallest unit of information and is typically expressed as 1 or 0. We often attach meaning to these values such as TRUE and FALSE, or YES and NO because this is the only way we can make sense of a single bit without context. Once we add context to a bit: we can make it mean just about anything: Right or Wrong, Positive or Negative, Alive or Dead, Play-this-song-repeatedly or Do-not-play-this-song-repeatedly.
A bitmap consist of two things:
- A series of one or more bits
- A context that defines what those bits are telling us
For example, consider the following bitmap and context:
- Bitmap: 111001001
- Context: People in the Brady Bunch aged 13 or greater
There are 9 bits corresponding to the 9 Bradys (Mike, Carol, Marcia, Jan, Cindy, Greg, Peter, Bobby, Alice). We can see that bits 1, 2, 3, 6 and 9 are “set”; this tells us that Mike, Carol, Marcia, Greg and Alice are all aged 13 or greater. Importantly, it also tells us that Jan, Cindy, Peter and Bobby are not aged 13 or greater.
See what we did there? Because it was pretty cool. We took two lists of names totalling 376 bits (1 letter = 1 byte = 8 bits) and stored the two lists it in a single bitmap totalling just 9 bits.
But hang on there (I hear you say), those 9 bits just store the Bitmap, by the time you store the Context (those 9 names), you haven’t saved any space at all! Well (I say back to you), that’s very observant of you. You are dead right: one bitmap on its own like this is actually an inefficient way to store this sort of information. In fact, we are quite close to dispelling the first of those Bitmap Index Myths: that a (single) Bitmap Index is good for low-cardinality columns … but more on that later.
Although our single 9-bit bitmap is made inefficient by the 376 bits of name data we store to make sense of it, we can make it more efficient by reusing the list of names when we capture more bitmaps that describe the Bradys.
- 100001110 – Males
- 110000001 – Adults
- 100000000 – Architects
- 011110000 – Blondes
- 000100000 – Tattletales
We now start to see that if each of these bitmaps is stored as a list of names (names of males and non-males, names of adults and non-adults, architects and non-architects, etc) then it is going to take up a lot more space than the bitmaps.
So is this why we use Bitmap Indexes: to take up less space? Hardly! We haven’t got to the good bit yet.
The really cool thing about bitmaps is that they can be combined easily using Binary Logic. Binary Logic uses the operators “AND” and “OR”. We sometimes use the symbols & (and) and | (or), although these are not supported by SQL.
See Appendix A for a detailed explanation of AND and OR binary logic operations.
The best thing about Binary Logic is that computers can do it really easily. In fact - although it may be hard to believe – every single operation performed by a computer is performed using binary logic. Addition, multiplication, playing Minesweeper, watching cat videos: they are all done entirely with binary logic.
Computers perform binary logic really fast, so that means bitmaps can be combined to answer complex questions really fast.
Before we describe a Bitmap Index, let’s set up an example (in Oracle) based on the bitmaps we used above.
# NAME GENDER HAIR AGEGRP DESCRIPTION BIRTHORDER - ------ ------ ------ ------ ----------- ---------- 1 MIKE MALE BROWN ADULT ARCHITECT 2 CAROL FEMALE BLONDE ADULT MOM 3 MARCIA FEMALE BLONDE TEEN POPULAR ELDEST 4 JAN FEMALE BLONDE CHILD WHINER MIDDLE 5 CINDY FEMALE BLONDE CHILD TATTLETALE YOUNGEST 6 GREG MALE BROWN TEEN JOCK ELDEST 7 PETER MALE BROWN CHILD WANNABE MIDDLE 8 BOBBY MALE BROWN CHILD DREAMER YOUNGEST 9 ALICE FEMALE BROWN ADULT HOUSEKEEPER
If we create a Bitmap Index on a column (say AGEGRP), then Oracle generates a bitmap for each distinct value of that column, where the position of bits in the bitmap represent rows in the table.
M C A C P B A M A R I G E O L I R C J N R T B I K O I A D E E B C E L A N Y G R Y E - - - - - - - - - ADULT 1 1 0 0 0 0 0 0 1 TEEN 0 0 1 0 0 1 0 0 0 CHILD 0 0 0 1 1 0 1 1 0
We can see that the bitmap for ADULT is 110000001, meaning that the first, second and ninth rows in the table contain AGEGRP = ‘ADULT’. There are 3 bitmaps for the AGEGRP column because there are 3 different possible values. If we created an index on DESCRIPTION then we would get 9 bitmaps because there are 9 different values.
When we run a SQL that includes predicates on the bitmap indexed columns, Oracle retrieves the bitmaps, performs any Binary Logic required to satisfy AND / OR conditions in the SQL and then translates the final bitmap into Oracle ROWIDs so that the rows can be looked up in the table.
SELECT NAME FROM BRADYS WHERE AGEGRP = ‘CHILD’ AND HAIR = ‘BLONDE’
Step 1: Retrieve the ‘CHILD’ and ‘BLONDE’ bitmaps
CHILD 000110110 BLONDE 011110000
Step 2: Combine these bitmaps using the AND operator
CHILD 000110110 & BLONDE 011110000 ------ --------- STEP2 000110000
Step 3: Convert the result of Step 2 into ROWIDs and lookup the rows in the table.
The bitmap in Step 2 has bits 4 and 5 set. In the BRADYS table, these rows correspond to JAN and CINDY.
SELECT NAME FROM BRADYS WHERE AGEGRP = ‘CHILD’ AND HAIR = ‘BLONDE’ NAME ----- JAN CINDY 2 rows selected.
If you are anything like me, you now have heaps of questions. When I first saw this type of description of Bitmap Indexes, I was worried about heaps of stuff. Let’s compare notes; here’s my list:
- Is there a bitmap for the NULL value? Ie. Can you scan for IS NULL predicates?
- Two or three distinct values mean two or three bitmaps? That sounds efficient. What is the upper limit?
- If you insert a new distinct value into a big table, won’t it take ages to build the new bitmap?
- How does Oracle convert the set bits of a bitmap into ROWIDs?
- I understand how regular indexes are physically stored in a B-Tree. How are Bitmap Indexes stored?
The Oracle manuals contain a very detailed description of the structure of B-Tree indexes, but prior to version 11g they have not been as forthcoming on the structure of Bitmap Indexes. The Oracle 11g Database Concepts manual contains a short but revealing section on the structure on Bitmap Indexes. Over the years I have pieced together a lot of information, much of which is confirmed by the 11g Database Concepts manual. Some of it is still just my unproven opinion, so in the next edition I will try to answer these questions with as much supporting evidence as I can find.
The binary AND (&) operator combines two bitmaps (say, A & B) and returns a third bitmap (C) that contains only those bits set in both A and B. eg. Say we wanted to find the Bradys that are both Males AND Adults.
M C A C P B A M A R I G E O L I R C J N R T B I K O I A D E E B C E L A N Y G R Y E - - - - - - - - - 1 0 0 0 0 1 1 1 0 – A (MALE) 1 1 0 0 0 0 0 0 1 – B (ADULT) ----------------- 1 0 0 0 0 0 0 0 0 – C = A & B
The first bit of C is set (1) because it is set in both A and B. The second bit of C is not set (0) because it is set in only one of A and B. The third bit of C is not set (0) because it is not set in either A or B, and so on.
Since only the first bit of C is set, this tells us that Mike is the only Adult Male.
The binary OR (|) operator combines two bitmaps (say, A | B) and returns a third bitmap (C) that contains bits set in either A or B. eg. Say we wanted to find the Bradys that are either Architects or Tattletales
M C A C P B A M A R I G E O L I R C J N R T B I K O I A D E E B C E L A N Y G R Y E - - - - - - - - - 1 0 0 0 0 0 0 0 0 – A (ARCHITECTS) 0 0 0 0 1 0 0 0 0 – B (TATTLETALES) ----------------- 1 0 0 0 1 0 0 0 0 – C = A | B
The first bit of C is set (1) because it is set in A, even though it is not set in B. The second bit of C is not set (0) because it is not set in either A or B, and so on.
Since bits 1 and 4 of C are set, this tells us that Mike and Cindy are in the group of Architects and Tattletales.