Using reverse key indexes to solve buffer busy wait problems

articles: 

Buffer busy wait and related events can cripple performance of concurrent inserts. Bad in a single instance database, far worse in a RAC (think "gc buffer busy"). Often the problem is because of a primary key populated from a sequence. Reversing the index can fix this problem.

Contention for index blocks when inserting grows can cause an application to hang up completely. This is because with a b-tree index on a monotonically increasing key, even though there will never be row lock all the inserted keys are going onto the same block at the edge of the index. A reverse key index will fix this. If you index, for example, 19, 20 and 21 as themselves, all three keys will probably be in the same block of the index. Instead, you would index them as 91, 02, and 12. So consecutive values will not be adjacent in the index: they will be distributed across the whole width of the index. You could do this programmatically, but Oracle provides the reverse key index for exactly this purpose. Here's an example:

connect / as sysdba
drop user jw cascade;
grant dba to jw identified by jw;
conn jw/jw

create table t1 (c1 number);
create index normal_i on t1(c1);
create table t2 (c1 number);
create index reverse_i on t2(c1) reverse;

create sequence s1 cache 100000 noorder;

create or replace procedure ins_normal(n number) as begin
for i in 1..n loop
insert into t1 values(s1.nextval);
end loop;
commit;
end;
/
create or replace procedure ins_reverse(n number) as begin
for i in 1..n loop
insert into t2 values(s1.nextval);
end loop;
commit;
end;
/

The schema JW now has two tables, one indexed with a normal b-tree and the other with a reverse key b-tree, and a sequence to generate the keys. And two procedures to insert some rows.
Now set up a concurrency test, using Windows shell scripts:
copy con i.sql
exec &1
exit
^Z

copy con concurrent_inserts.bat
sqlplus jw/jw @i.sql %1
^Z

for /l %i in (1,1,100) do start /b concurrent_inserts.bat ins_normal(10000)
for /l %i in (1,1,100) do start /b concurrent_inserts.bat ins_reverse(10000)

The SQL*Plus script I.SQL does nothing more than execute a procedure. The batch file CONCURRENT_INSERTS.BAT will launch SQL*Plus, calling the script I.SQL with a command line argument that will pass the name of the procedure.
To run the test, the FOR loops call the batch file concurrently in a hundred background sessions, performing ten thousand inserts each.
What is the result for buffer busy wait? After running that test? Here it is:
orclz>
orclz> select object_name,value from v$segment_statistics
  2  where owner='JW' and object_type='INDEX' and statistic_name='buffer busy waits';

OBJECT_NAME                         VALUE
------------------------------ ----------
NORMAL_I                           161347
REVERSE_I                            8983

orclz>
The reverse key index has reduced the buffer busy waits by around 95%. Impressed? I hope you are. This is even more significant in a RAC environment, where buffer busy wait is globalized.
I am not saying that all indexes should be reversed. You do need to understand your data and how it is being accessed. For example, a non-equality predicate on the key cannot use the index. But when would you use a non-equality predicate on a primary key? Probably, never. It is hard to find a reason for not reversing all your monotonically increasing keys.

----
Addition - by popular demand: some timings

I enabled trace for the test, these are the results (insert into the normal index first, then the reversed index):

********************************************************************************

SQL ID: 1bmscqsn9h63b Plan Hash: 3884345238

INSERT INTO T1 
VALUES
(S1.NEXTVAL)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse      100      0.00       0.00          0          0          0           0
Execute 1000000    134.68    5421.65       1208   16471886    5103248     1000000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   1000100    134.68    5421.66       1208   16471886    5103248     1000000

Misses in library cache during parse: 0
Optimizer mode: FIRST_ROWS
Parsing user id: 128     (recursive depth: 1)
Number of plan statistics captured: 100

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  LOAD TABLE CONVENTIONAL  (cr=62 pr=0 pw=0 time=14931 us)
         1          1          1   SEQUENCE  S1 (cr=0 pr=0 pw=0 time=38 us)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  Disk file operations I/O                      316        0.31          4.67
  db file sequential read                      1264        0.37          3.22
  buffer busy waits                          513840        3.20       2198.53
  latch free                                   2582        0.03          1.63
  library cache: mutex X                       2035        0.09         25.87
  enq: TX - index contention                 182675        0.13        900.54
  enq: SQ - contention                          128        0.00          0.43
  buffer deadlock                            105187        0.11       1163.93
  latch: cache buffers chains                  3039        0.01          0.79
  enq: TX - allocate ITL entry                  101        0.04          0.42
  latch: enqueue hash chains                 133585        0.09        922.55
  resmgr:cpu quantum                          57296        0.00          7.89
  cursor: pin S                                 136        0.06          1.20
  read by other session                         270        0.37          4.45
  log file switch completion                     65        0.11          2.96
  latch: undo global data                        53        0.00          0.00
  enq: TX - contention                           11        0.00          0.02
  log file switch (checkpoint incomplete)        49        3.18         35.50
  reliable message                              100        0.01          0.09
  control file sequential read                  560        0.01          0.27
  Data file init write                           28        0.00          0.02
  db file single write                           28        0.00          0.00
  control file parallel write                    84        0.00          0.01
  rdbms ipc reply                                28        0.08          0.29
  undo segment extension                          1        0.02          0.02
  enq: US - contention                           14        0.00          0.02
  enq: HW - contention                           11        0.00          0.00
  enq: FB - contention                            3        0.00          0.00
  latch: redo allocation                          1        0.00          0.00
********************************************************************************
********************************************************************************

SQL ID: b00stmqya8601 Plan Hash: 3884345238

INSERT INTO T2 
VALUES
(S1.NEXTVAL)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse      100      0.00       0.00          0          0          0           0
Execute 1000000     61.89     948.89       2572     302421    4145886     1000000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   1000100     61.89     948.89       2572     302421    4145886     1000000

Misses in library cache during parse: 0
Optimizer mode: FIRST_ROWS
Parsing user id: 128     (recursive depth: 1)
Number of plan statistics captured: 100

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  LOAD TABLE CONVENTIONAL  (cr=1 pr=1 pw=0 time=312356 us)
         1          1          1   SEQUENCE  S1 (cr=0 pr=0 pw=0 time=1120 us)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  Disk file operations I/O                      304        0.25         15.28
  db file sequential read                      2638        0.43         12.93
  read by other session                         318        0.46          8.26
  latch free                                    504        0.12          7.84
  resmgr:cpu quantum                          16738        0.21        196.22
  enq: SQ - contention                          112        0.07          3.03
  library cache: mutex X                       2892        0.16         73.54
  buffer busy waits                           13740        4.97        278.00
  latch: redo allocation                         48        0.07          0.48
  reliable message                              206        0.06          0.40
  cursor: pin S                                 289        0.07          2.90
  log file switch completion                    154        0.25         12.49
  latch: cache buffers chains                    21        0.07          0.24
  enq: US - contention                          556        4.51         95.12
  control file sequential read                  660        0.06          1.63
  Data file init write                           36        0.12          0.22
  db file single write                           33        0.00          0.01
  control file parallel write                    99        0.00          0.03
  rdbms ipc reply                                33        0.71          1.56
  enq: HW - contention                          324        4.44         40.96
  enq: CF - contention                            6        0.08          0.21
  enq: TX - index contention                    974        0.66         46.81
  buffer deadlock                               538        0.11         13.31
  latch: object queue header operation            4        0.02          0.08
  log file switch (checkpoint incomplete)        50        1.43         44.59
  enq: TX - contention                           10        0.01          0.05
  latch: enqueue hash chains                    451        0.07          1.18
  enq: TX - allocate ITL entry                   60        0.20          2.29
  undo segment extension                        210        1.00         14.99
  log buffer space                              115        0.24          6.51
  latch: redo copy                                7        0.06          0.29
  enq: FB - contention                            3        0.01          0.01
  latch: row cache objects                        4        0.00          0.00
  latch: undo global data                         2        0.00          0.00
  latch: checkpoint queue latch                   1        0.00          0.00
********************************************************************************
You can't argue with that, can you? I make it a 571% improved elapsed time. Mostly on buffer cache wait events.

(All tests done on DB release 12.1.0.1)
--
John Watson
Oracle Certified Master DBA
http://skillbuiders.com

Comments

Well done John. I would love to see some timings attached.

Is it possible to also mitigate the problem by manipulating INITRANS?

Thanks for the feedback, Ross, and thank you for kicking me to finish the demo properly. I've added trace information for the tests, pretty spectacular.
I haven't added the initrans test yet (anyone care to volunteer? No, I didn't think so).

Hope someone finds it helpful.

I don't know what's more alarming, the results or your prowess with DOS scripts!

Dear JW!

Is there any different progression between Normal B-Tree Index and Reverse Key Index?

In my mind, I think, the main one is:
- NX (Normal Index) using concentrated physical block instead continously during DML.
- RX (Reverse key Index) using same block instead of concentrated.

Correct me if I'm wrong!

Thank you!

Sorry, I do not understand what you mean by these terms. But reverse key indexes is a pretty straightforward topic, I'm sure if you run a few experiments you can work out anything that isn't clear. That is what I always do.