Two of my Favorite Index Tuning Techniques

Kevin Meade's picture
articles: 

Recently my friends at work saw me doing things with indexes they didn't understand. They saw what looked like unneeded columns in some indexes, and funny expressions they didn't know could be done in an index. So I explained to them the benefits of INDEX ONLY ACCESS, and FUNCTION BASED INDEXES. They had not seen these two techniques before so I figured maybe other people would be interested too. Here is it, a short discussion on what are fast becoming my two favorite tuning techniques.

First let me say, it is easy to get carried away with these mechanisms. Once we see how something works, and we like it, we use it everywhere. But you must keep in mind two thoughts:

1) when you add, drop, or change an index; presumably to the benefit of a piece of code, you can sometimes negatively affect other pieces of code, so some testing is in order even when all you do is a simple index change to your system. This problem is less and less as Oracle's optimizer gets better and better, but it still bites me now and then and it will you too. If it does, don't be surprised, just deal with it.

2) It is easy to start creating lots and lots of indexes because you see the obviousness of how they will be used. But more indexes is not necessarily better indexes. For us Americans I realize this thought is a difficult one to grasp as we all know that in America, "MORE IS ALWAYS BETTER"; just keep an open mind. It is easy to end up with fifteen to twenty indexes on a table without much effort. If you find this happening, it may pay to back away from the keyboard for a minute and consider that not all accesses to a table need be 100% optimized. You can often combine several indexes together that share some number of leading columns and even drop indexes in situations where they really don't contribute. Yes, in the former (the first) situation you will have some indexed accesses looking at more rows than normal and thus be doing a little extra filtering, and in the later no indexed access at all. But an index should do one of two things:
a) it should support a key structure, or
b) it should significantly enhance an otherwise expensive operation. Expensive is certainly relative, just consider that shaving 2 seconds off a 2.1 second step is neat but meaningless if you only do the step a few hundred times a day and you don't need immediate response. Besides, having less indexes means there is a much better chance that the indexes you are using are already cached. Just don't go overboard eh. So... lets get on with the article.

INDEX ONLY ACCESS is the process of getting all the data you need to satisfy a query (or more likely a specific portion of a query) without having to actually visit the table. This is possible because Oracle is smart. One of its optimizations is to understand what columns are needed by a query, and where it is getting them at runtime. If it sees that an index it is accessing as part of a query plan contains all needed columns for a specific part of a query, it will not need a table access step and thus not include it in the plan. You can exploit this optimization trick by adding "extra" columns to the end of an index to increase the occurrence of this feat. Table accesses of this nature are seen in query plans as TABLE ACCESS BY ROWID. You know you have achieved your goal when the table access drops out of the plan. Here is an example that at least logically, conveys what this trick is about:

select dname,dept_id from dept where dname = :1;

We have all seen this query. We are accessing a reference table to decode a name to a surrogate key value. Very common. So, what indexing would be best for this query? A good DBA will have defined on the dept table the following indexes and their associated constraints:

create unique index dept_pk on dept (dept_id);
create unique index dept_uk1 on dept (dname);

alter table add constraint dept_pk primary key (dept_id);
alter table add constraint dept_uk1 unique (dname);

I might suggest changing indexes to the following instead:

create unique index dept_pk on dept (dept_id);
create index dept_i1 on dept (dname,dept_id);

alter table add constraint dept_pk primary key (dept_id);
alter table add constraint dept_uk1 unique (dname);

Heck, while we are here, why not recognize the opposite operation is just as common, that of translating a surrogate key back into its original values, and suggest these indexes: (you will notice that our indexes no longer match their respective constraints).

select dname from dept where dept_id = :1;

create index dept_i1 on dept (dept_id,dname);
create index dept_i2 on dept (dname,dept_id);

alter table add constraint dept_pk primary key (dept_id);
alter table add constraint dept_uk1 unique (dname);

At this point some people might be a little offset, so let me discuss two things here.

First, in today's Oracle, constraint enforcement is smarter than when it was originally introduced years ago. There was a time when constraints were supported by indexes that exactly mimicked their columns. Even today people are most comfortable with constraints being covered by indexes with the same columns because it is a simple and natural way of looking at the index/constraint co-operative association. Many people even think of them as one in the same. But they are not. Constraints are for today's Oracle different animals from indexes and a constraint can and will use any index that offers up the necessary leading columns. Thus it is no longer necessary to have indexes that exactly match constraints column for column. We can add additional columns to the end of our indexes and know that Oracle will pick an existing index to support the constraint when the constraint is created. You can test this yourself by creating these indexes and then creating the constraints and seeing that Oracle did not create new indexes just for the constraints. You can check this by looking at various dictionary objects for indexes and constraints (look for the INDEX_OWNER,INDEX_NAME columns in the user_constraints dictionary view and ask yourself why are these here?).

Second and more to the point of our discussion, once we add the second column to each of these indexes, our query if it uses the index to find its rows, will have accessed all columns necessary to answer the query when it accesses the index. Oracle knows this. Thus it will not include a TABLE ACCESS BY ROWID step in its plan because it has seen all it needs to see in order to give you an answer. This works on a table by table basis in the query so you don't have to have all data found in indexes for all tables in order to get some benefit.

So how much can you get form this. It really depends on how wide the table is and how short the index is. Its mostly a matter of data caching and disk i/o. In a good situation you will see queries run 2 times faster. I have seen 10 times faster in cases where tables were really wide and occassionaly hit home runs with this tuning trick as we all do from time to time.

FUNCTION BASED INDEXES are just what the name says, indexes built using functions. A newer enhancement. Consider the old problem:

You want to search on a name column, but the data in the column has a combination of upper and lower case. What do you do? In the old days you would:

1) create a second column called UPPER_NAME -- alter table emp add upper_name varchar2(30);
2) load it using upper(name) -- set upper_name = upper(name);
3) create an index on the new column -- create index emp_i9 on emp (upper_name);
4) change your where clauses everywhere to reference the new column -- where upper_name like 'SMITH%'

Today however you can do the following instead:

create index emp_i9 on emp (upper(name));
... where upper(name) like 'SMITH%';

Cool, we get to skip adding extra columns to our table that would otherwise have made it larger, and we avoid having to update the table possibly causing row splits or migrations in the process. Otherwise, the effect is the same. We have a way to get at the function modified data using an index. Its this benefit that makes this technique really really cool, that you can index expressions on the data. The trick being that your where clause pretty much matches the expression in the index. This means you can enable indexed access to data that previously you could not. Consider questions like these:

1) how do I find rows where values are null
2) how do I find rows based on calculations
3) how do I convert a range scan to an exact match query

1) select * from emp where release_date is null ==>

create index emp_i9 on emp (decode(release_date,null,1,null));
select * from emp where decode(release_date,null,1,null) = 1;

2) select * from line_item where nvl(total,0) - nvl(discount,0) - nvl(mgr_reduction,0) = 0 ==>

create index line_item_i9 on line_item (nvl(total,0) - nvl(discount,0) - nvl(mgr_reduction,0));

3) select * from line_item where nvl(total,0) - nvl(discount,0) - nvl(mgr_reduction,0) > 0 ==>

create index line_item_i9 on line_item (decode(sign(nvl(total,0) - nvl(discount,0) - nvl(mgr_reduction,0)),1,1,null)));
select * from line_item where decode(sign(nvl(total,0) - nvl(discount,0) - nvl(mgr_reduction,0)),1,1,null)) = 1;

One caution I would offer is that function based indexes tend to be special purpose in nature. Though they don't require anything different than we were doing in the olds days (indeed they require less work), they should be used rarely, just like in the old days, for they look great on paper, but in reality, few people would know to write the necessary coded expressions to match the index during normal code construction. Thus for the most part, you should use function based indexes to support specific pieces of code that need them and not try to use them as generic performance enhancers. Also, you must document the snot out of it, because who would know to look for one of these indexes unless you tell them in the code you just wrote or enhanced, that you expect the code to use the index. Still, in numerous situations, function based indexes yield stellar enhancements in performance.

Best yet, nothing says you can't use the two techniques together. Check this index out:

create   index "KEVIN"."KEVIN_I3" on "CLAIM_DATA"
(
   decode(nvl("RESERVE_AMOUNT_1",0)+nvl("RESERVE_AMOUNT_2",0)+nvl("RESERVE_AMOUNT_3",0),0,null,1)
 , "ROW_EFFCTV_DATE"
 , "ROW_EXPRTN_DATE"
 , "CLM_ID"
)
  compute statistics
/

I use this index to find out if a non-zero reserve was created during a specific time period, and have included the CLM_ID column as a tag along data column. The result is, an index that only contains rows that could satisfy the query, that when used goes straight to the rows we want, and that does not bother accessing the table. Originally the query that prompted this index was doing a full table scan on a very large table. Now it does a quick index lookup. This is one of those great stories of "hey it runs 40 times faster". Well, you get the idea.

Have fun with it. Kevin

Comments

I saw a demonstration of this in an Oracle class, and the effects are dramatic when you include functions in indexing.

In some cases you can use Index Organized Tables. Especially when the table has only a few columns.