Range Keyed Tables

A more insidious Range Scan problem occurs when a table contains two columns that represent lower and upper bounds of a range. Consider a table of employee grades defined by lower and upper salary bounds.

CREATE TABLE position (
    min_salary NUMBER
,   max_salary NUMBER
,   position_description VARCHAR2(60)
,   PRIMARY KEY(min_salary, max_salary)
);


MIN_SALARY MAX_SALARY POSITION_DESCRIPTION
---------- ---------- --------------------
         0      20000 Clerk
     20001      28000 Team Leader
     28001      33000 Supervisor
     33001      45000 Area Manager
     45001      75000 Department Manager
     75000     999999 General Manager


SELECT position_description
FROM   position
WHERE  min_salary <= 34500
AND    max_salary >= 34500


SELECT STATEMENT
    TABLE ACCESS BY ROWID POSITION
        UNIQUE INDEX RANGE SCAN SYS_1234

The SQL above selects the position description for employees earning $34,500. Even though we expect only one row returned, the primary key index on min_salary and max_salary will not perform a unique scan because we are not using the key with equals only joins. In fact, since the first column of the index - min_salary - is used with a range predicate - the rest of the index (max_salary) will not take part in the scan at all. This means that the index scan will return all four rows with min_salary <= 34500 before using max_salary to filter out the three non-matching rows. Whilst this example is trivial, it becomes very non-trivial when the table contains thousands of ranges.

For a single table select such as this, the solution is simple. Create an index with the two columns swapped around. ie. max_salary, min_salary. This will ensure that the first matching row in the index is the row that we want. Add the predicate AND ROWNUM = 1 to the query. Oracle will range scan the index, find that the first row encounted is a match for min_salary as well, and then abort the SQL because of the ROWNUM = 1 predicate. The range scan is not given the opportunity to complete.

Note that this technique is useless if the query is a table join where we want to use the range keyed table as a loopup for rows in another table.

Consider the SQL:

SELECT employee_name, sal, position_description
FROM   employee e, position p
WHERE  e.sal BETWEEN p.min_salary AND p.max_salary

If the lookup of the range table is part of a table join, then there is simply no good way to perform the join using an index without a horrendous range scan. Say position is keyed on (max_salary, min_salary). For each row in employee, the index on position is range scanned on max_salary and (on average) half the rows are read from that index before being filtered down to one.

If the range keyed table contains thousands of rows, the cost can be catastrophic.

If the range-keyed table has been designed so that the ranges never overlap, then we can alter the SQL slightly to short-circuit the range scan. In the above example, if we assume that the ranges never overlap then the following SQL is functionally equivalent:

SELECT employee_name, sal, position_description
FROM   employee e, position p
WHERE  p.min_salary = (
       SELECT max(min_salary)
       FROM   position
       WHERE  min_salary <= e.sal
AND    e.sal BETWEEN p.min_salary AND p.max_salary

This SQL exploits a feature of the Cost Based Optimizer, whereby the MIN or MAX of an Index Range Scan can be returned without scanning the entire range - Oracle just goes to either the start or end of the range and returns the first or last row. Use Explain Plan to determine whether Oracle is using the FIRST ROW feature:

SELECT STATEMENT
    TABLE ACCESS BY ROWID POSITION
        SORT (AGGREGATE)
            FIRST ROW
                UNIQUE INDEX RANGE SCAN (MIN/MAX) SYS_1234

Since this is a feature of the CBO, it is not available under the Rule Based Optimizer (RBO). If the Optimizer Mode/Goal is set to RULE, or the SQL contains the /*+ RULE*/ hint, or the Index has not been analyzed, the FIRST ROW identifier will not appear in the Explain Plan and there will be no performance improvement. Analyze the index and ensure that CBO is being used, then try again. It may be necessary to add a PUSH_SUBQ hint to promote the eveluation of the sub-query ahead of the BETWEEN join.

If you cannot guarantee non-overlapping ranges, the best option is to perform a Sort-Merge join between the tables, and ignore the index on the Range Keyed Table - you may find that Oracle chooses this method anyway if the table is analyzed. Unlike a Hash Join, a Sort-Merge join is able to join on range predicates (>[=], <[=], BETWEEN). To find out if Oracle is using a Merge join, run the SQL through Explain Plan and look for a MERGE step covering the two tables. If Oracle is performing a Nested Loops join, add a USE_MERGE hint to the SQL.

Note that this technique involves a Full Table Scan on the range keyed table. This makes it efficient if we are joining the range keyed table to many rows in the other table (say >50). If the SQL is retrieving only a few rows in the main table, then the full table scan on the range keyed table might be too expensive. In this situation, break the join into two separate SQLs: create a cursor selecting from the main table, and a nested lookup querying the range key table using the ROWNUM=1 technique desribed earlier.