Hypercharging Oracle Data Loading
Is there any limit to the speed of Oracle? With Oracle announcing a new record of one million transactions per minute, many believe there is nothing Oracle cannot do.
However, what if we have a requirement for a system that must accept high-volume data loads into a single table:
- 500,000 rows per second
- 50 megabytes per second
Is this possible? With the right tricks you can make Oracle load data at remarkable speed, but special knowledge and techniques are required. Oracle provides us with many choices for online data loading:
- SQL insert and merge statements
- PL/SQL bulk loads using the forall operator
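To illustrate the forall approach, here is a minimal PL/SQL sketch; the sales_stage and sales table names and the 1,000-row batch size are assumptions for illustration, not part of the original article:

```sql
-- Hypothetical bulk load from a staging table into a target table.
declare
  type rows_t is table of sales_stage%rowtype;
  l_rows  rows_t;
  cursor c is select * from sales_stage;
begin
  open c;
  loop
    fetch c bulk collect into l_rows limit 1000;  -- fetch in 1,000-row batches
    forall i in 1 .. l_rows.count                 -- one context switch per batch
      insert into sales values l_rows(i);
    commit;                                       -- commit each batch
    exit when c%notfound;
  end loop;
  close c;
end;
/
```

The forall statement sends the whole batch to the SQL engine in a single context switch, which is where the speedup over row-by-row inserts comes from.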
If we can load in batch mode, we also have more options:
- Oracle Data Pump
- Oracle import utility
However, there are vast differences in load speed (Figure 1).
Figure 1: Sample data load speeds
Batch Data Loading
If you are loading your data from flat files there are many products and Oracle tools to improve your load speed:
Oracle Data load tools:
- Oracle10g Data Pump (available January 2004) - With Data Pump Import, a single stream of data load is about 15-45 times faster than original Import. This is because original Import uses only conventional mode inserts, whereas Data Pump Import uses the direct path method of loading.
Third-party Data load tools:
- BMC Fast Import for Oracle - Claims to be 2x to 5x faster than the Oracle import utility.
- CoSORT FAst extraCT (FACT) for Oracle - Claims bulk loads up to 90% faster when CoSORT pre-sorts the load file on the table's index key. Pre-sorting also improves the clustering_factor, which reduces logical I/O and speeds run-time SQL access.
Online Data Loading
Don't use standard SQL inserts, as they are far slower than the other approaches. If you must use SQL inserts, make sure to use the APPEND hint to bypass the freelists and load above the table's high-water mark. You are far better off using PL/SQL with the bulk-insert features (up to 100x faster).
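A direct-path insert with the APPEND hint looks like this; the sales and sales_stage table names are illustrative assumptions:

```sql
-- APPEND requests a direct-path load: rows are written above the
-- high-water mark, bypassing the freelist search for partly-full blocks.
insert /*+ APPEND */ into sales
  select * from sales_stage;
commit;   -- direct-path-loaded rows are not visible until commit
```

Note that a direct-path insert locks the table for other DML and does not reuse space below the high-water mark, so it suits bulk loads rather than routine OLTP inserts.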
Other things to ensure:
- Use parallel DML - Parallelize the data loads according to the number of processors and disk layout. Try to saturate your processors with parallel processes.
- Disable constraints and indexes - Disable RI during the load and re-enable (in parallel) following the load.
- Tune object parameters - Use multiple freelists or freelist groups for target tables. Avoid bitmap-freelist ASSM (Automatic Segment Space Management) for super high-volume loads.
- Pre-sort the data in index-key order - This will make subsequent SQL run far faster for index range scans.
- Use RAM disk - Place the undo tablespace and online redo logs on solid-state disk (RAM-SAN).
- Use SAME RAID - Avoid RAID 5 and use Oracle's Stripe And Mirror Everywhere approach (RAID 0+1, RAID 10). However, this does not mean one large array smeared with everything; you will see performance gains from separating temp, data and index, redo, and undo segments onto separate RAID areas.
- Use a small db_cache_size - This will minimize DBWR work. In Oracle9i you can use the alter system set db_cache_size command to temporarily reduce the data buffer cache size.
- Watch your commit frequency - Commits that are too frequent can cause excessive checkpoint activity, while too few commits can cause undo segment problems.
- Use a large blocksize - Data loads into 32k blocksizes will run far faster because Oracle can insert more rows into an empty block before a write.
Here is a small benchmark showing the performance of loads into a larger blocksize:
alter system set db_2k_cache_size=64m scope=spfile;
alter system set db_16k_cache_size=64m scope=spfile;
startup force
create tablespace twok blocksize 2k;        -- using ASM, defaults to 100m
create tablespace sixteenk blocksize 16k;
create table load2k tablespace twok as select * from dba_objects;   -- creates 8k rows
drop table load2k;                          -- first create was to preload the buffers
set timing on;
create table load2k tablespace twok as select * from dba_objects;
create table load16k tablespace sixteenk as select * from dba_objects;
For a larger sample, I re-issued the create processes with:
select * from dba_source; -- (80k rows)
Even with this super-tiny sample on Linux using Oracle10g with ASM, the results were impressive:
                  2k blocksize    16k blocksize
8k row table      4.33 secs       4.16 secs
80k row table     8.74 secs       8.31 secs
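Several of the checklist items above can be combined in a single load. The sketch below is illustrative only; the table name, constraint name, and degree of parallelism 8 are assumptions you would adapt to your own schema and CPU count:

```sql
-- Drop RI enforcement for the duration of the load.
alter table sales disable constraint fk_sales_cust;

-- Parallel DML must be enabled at the session level before the insert.
alter session enable parallel dml;

-- Direct-path, parallel insert; pick a degree that saturates your CPUs.
insert /*+ APPEND PARALLEL(sales, 8) */ into sales
  select /*+ PARALLEL(sales_stage, 8) */ * from sales_stage;
commit;

-- Re-enable the constraint once the load is complete.
alter table sales enable constraint fk_sales_cust;
```

Re-enabling the constraint forces a validation scan of the loaded rows, so plan for that time (or use enable novalidate if the source data is already known to be clean).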
Driving your server
It is critical to take STATSPACK reports during data loading, paying special attention to the top-5 wait events.
- I/O-bound - Move to faster media (SSD), especially for the undo and redo files.
- CPU-bound - Add more processors and increase the degree of parallel DML.
- Network-bound - Move the data loading onto the same server as the Oracle instance.
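Between STATSPACK snapshots you can also eyeball the cumulative wait picture directly; this query against the v$system_event view is a rough sketch (the row limit of 10 is arbitrary, and idle events such as the SQL*Net waits should be mentally filtered out):

```sql
-- Highest-time wait events since instance startup; similar data
-- feeds the STATSPACK top-5 wait events section.
select *
  from (select event, total_waits, time_waited
          from v$system_event
         order by time_waited desc)
 where rownum <= 10;
```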
Using these techniques you can achieve blistering data load speeds for multi-terabyte Oracle databases.