PL/SQL Tuning for Batch Systems
Where were you in 1990? Nelson Mandela was being freed from Victor Verster Prison after 27 years behind bars, Saddam Hussein was starting the Gulf War by invading Kuwait, and Tim Berners-Lee was inventing the World Wide Web at CERN in Geneva. Me? In 1990, I was writing an insurance system in Oracle SQL*Forms v2.3.
Forms 2.3 - the last version before the introduction of PL/SQL - was frustrating and irritating in equal measure. It had only one control structure: GOTO. Clearly something had to give. Oracle provided the ability to link C modules (Pro*C, actually) into the runform.exe executable; they were called User Exits and could be called directly from that abominable Forms trigger language. Although their intended use was just to "go do something outside Oracle", programmers were writing entire languages - procedural languages - in Pro*C and linking them into Forms. Urban legend has it that PL/SQL began as one of these languages, written by a frustrated techo in the 1980s, and that it was so successful that Oracle bought it.
Forms v3.0 did not improve the situation - it just legitimised it by making PL/SQL the native trigger language of Forms. But something far more important was happening at about the same time: Oracle v6.0 was being shipped with PL/SQL on the server. For the first time, it was possible to integrate SQL and procedural logic without the overheads of a client-server architecture. This was seen as a great leap forward; the big problem with client-server was all that nasty context switching. PL/SQL - ProceduraL SQL - would combine procedural control structures with the SQL language and eliminate context switching from batch jobs. Right? Well... umm... not exactly. PL/SQL inherited more from its Forms roots than just syntax - it used the same two-engine architecture. Seventeen years later, looking at the diagram in Chapter 1 of the Oracle 10g PL/SQL User's Guide and Reference: it still does!
So what is context switching anyway? I Googled it and found the following concise definition:
The process of saving an executing thread or process and transferring control to another thread or process.
But how much does it cost in performance? Run the following test script in SQL*Plus to see for yourself:
CREATE TABLE context_switch_test AS
SELECT * FROM dual WHERE 1=0;

SET TIMING ON

/* Process 100,000 rows, switching context every 1000 rows */
DECLARE
    CURSOR rec_cur IS
    SELECT level
    FROM   dual
    CONNECT BY level <= 100000;

    TYPE rec_tab_type IS TABLE OF NUMBER(10);
    rec_tab REC_TAB_TYPE;
BEGIN
    OPEN rec_cur;
    LOOP
        FETCH rec_cur BULK COLLECT INTO rec_tab LIMIT 1000;
        EXIT WHEN rec_tab.COUNT() = 0;
        FORALL i IN rec_tab.FIRST .. rec_tab.LAST
            DELETE FROM context_switch_test
            WHERE  rec_tab(i) IS NOT NULL;
    END LOOP;
    CLOSE rec_cur;
END;
/

/* Process 100,000 rows, switching context every row */
BEGIN
    FOR rec IN (
        SELECT level
        FROM   dual
        CONNECT BY level <= 100000
    ) LOOP
        DELETE FROM context_switch_test
        WHERE  rec.level IS NOT NULL;
    END LOOP;
END;
/
On my database, the first example took 1.43 seconds, and the second one - which performed 1000 times more context switching - took 7.62 seconds.
"So," I hear you say, "6 more seconds to perform 99,000 context switches: I can live with that". But consider the ETL of a large modern data warehouse: it may process 1 million or more rows (1 million context switches), open half-a-dozen more cursors nested inside the main loop, each of which fetches 5 or more rows (30 million), and perform a million inserts (1 million more). Then consider there might be 50 such jobs running every night; at 6 seconds per 100,000 switches, that's 26 hours of unnecessary processing time. And the most common solution? Buy bigger boxes! Yeah!
The absolute best way to tune PL/SQL batch jobs is to stop using PL/SQL (well, almost). When we're applying a data-patch, or just cutting code free-hand in SQL*Plus (or TOAD, or Navigator, or whatever's your pleasure), we're only too happy to perform multi-row UPDATE, DELETE, and INSERT statements that push all of the iteration logic (and - with WHERE clauses - the conditional logic as well) onto the SQL engine. So what is with the pathological, lemming-like urge to write cursor-loops in Production application code? There are a number of reasons, but most of them crumble under close scrutiny:
- "But for each row returned by the cursor, we need to lookup values from several other tables."
Join those lookup tables into the main cursor.
- "But we perform Upserts. Try to INSERT, and then UPDATE the existing row if the INSERT fails with key-violation."
Right on! Double your context switches! Use the MERGE statement instead to process the entire data set in a single statement. Still using v8i? Load all of the rows you need into a Global Temporary Table, and then perform separate bulk UPDATE and INSERT statements from the GTT.
Intuition tells us this should be slower because the data is being written twice; it's not. Inserts into GTTs generate less I/O than regular tables because they don't need to be recoverable in the event of a system crash. The additional time it takes to write the GTT is still much less than the cost of the context switches and the single-row lookups performed by the cursor-loop.
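A MERGE upsert replaces the try-INSERT-then-UPDATE cursor-loop with a single statement. The tables and columns below are invented purely for illustration:

```sql
-- Hypothetical upsert: one statement, one context switch from PL/SQL.
-- Rows that match on cust_num are updated; the rest are inserted.
MERGE INTO customer_balance tgt
USING ( SELECT cust_num, txn_amount
        FROM   daily_transaction ) src
ON    ( tgt.cust_num = src.cust_num )
WHEN MATCHED THEN
    UPDATE SET tgt.balance = tgt.balance + src.txn_amount
WHEN NOT MATCHED THEN
    INSERT (cust_num, balance)
    VALUES (src.cust_num, src.txn_amount);
```

The whole data set is processed set-wise by the SQL engine, with no key-violation exceptions to catch.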
- "But we have to write to a log file every N rows. Prod Support can't tell where it’s up to otherwise."
Rubbish! You wouldn't need to check where it was up to if it was already finished. In any case, long-running SQL statements with Full Table Scans, sorts, and hash joins can be monitored with the V$SESSION_LONGOPS view.
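For example, a query along these lines shows how far each long-running operation has progressed:

```sql
-- Progress of long-running operations (Full Table Scans, sorts, hash joins)
SELECT sid, opname, target, sofar, totalwork,
       ROUND(sofar / totalwork * 100, 1) AS pct_done
FROM   v$session_longops
WHERE  totalwork > 0
AND    sofar <> totalwork;   -- still running
```

No instrumentation code in the batch job, and no commit-every-N-rows logging required.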
- "But we do business rule and integrity checking. Invalid rows need to go to another table."
Use the multi-table INSERT FIRST statement (errors to one table, the rest to the main table). Or in 8i, insert everything into a Global Temporary Table, then INSERT valid rows into the main table and errors into the error table with separate queries on the GTT.
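A multi-table INSERT FIRST routes each source row to exactly one target in a single pass. The staging and error tables below are illustrative:

```sql
-- Single pass over the staging data: invalid rows go to the error
-- table, everything else goes to the main table.
INSERT FIRST
    WHEN currency_code IS NULL THEN
        INTO txn_error (txn_id, err_msg)
        VALUES (txn_id, 'Missing currency code')
    ELSE
        INTO txn_clean (txn_id, currency_code, amount)
        VALUES (txn_id, currency_code, amount)
SELECT txn_id, currency_code, amount
FROM   txn_staging;
```

Additional WHEN clauses can be stacked for each business rule; FIRST guarantees a row lands in only the first branch it satisfies.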
- "But we have to commit every N rows, otherwise we'll blow the rollback segments."
In an OLTP system, perhaps; they are configured for many concurrent small transactions. The needs of the overnight batch are often overlooked. One work-around is to process the data in manageable chunks. One clever technique is Rowid Range; search AskTom.oracle.com for more details.
A Data Warehouse is a different story; large transactions are its bread-and-butter and it needs to be able to handle them. The DBA should set up rollback segments (UNDO) to handle transactions of several Gigabytes - more if necessary.
The power of in-line views, user-defined functions and analytic functions allows us to resolve queries in SQL that were previously impossible; however, this should not be taken as a challenge to code every query in SQL at the expense of concise, intuitive and maintainable code. There are situations where Set Processing is inappropriate, including complex iterative logic, complex conditional logic and branching, N:M cardinality joins, and mixed-cardinality multi-table INSERTs.
When common sense dictates that procedural logic is necessary, it can still be achieved with a minimal drop in performance by minimising context switching. Context switching occurs whenever a SQL operation is performed. This includes:
- SELECT .. INTO statements.
- INSERT, DELETE, and UPDATE statements.
- OPEN, FETCH, and CLOSE a cursor.
- FOR rec IN (cursor) loops.
- Any DDL statements executed via Dynamic SQL.
BULK COLLECT reduces FETCH operations by fetching many rows at a time into a collection. It should always be used in conjunction with the LIMIT clause wherever there is a risk of fetching too many rows; otherwise the resultant collection could exceed memory limits and start paging to disk. Anywhere from 100 to 1000 is a practical LIMIT; there is no need to find the "perfect" number, as a law of diminishing returns applies.
BULK COLLECT with the LIMIT clause is only available with explicit FETCH statements, which is annoying because it precludes the tidy and intuitive Implicit Cursor FOR loops.
Implicit Cursor FOR Loop: tidy, intuitive, and very, very, slow
FOR empl IN ( SELECT ename FROM emp ) LOOP
    <... some statements ...>
END LOOP;
Implicit Cursor BULK COLLECT: efficient, but not scalable
SELECT ename
BULK COLLECT INTO empl_tab
FROM   emp;

FOR i IN empl_tab.FIRST .. empl_tab.LAST LOOP
    <... some statements ...>
END LOOP;
Explicit Cursor BULK COLLECT with LIMIT: fast and scalable, but wordy and complex
DECLARE
    CURSOR empl_cur IS
    SELECT ename FROM emp;

    TYPE empl_tab_type IS TABLE OF EMPL_CUR%ROWTYPE;
    empl_tab EMPL_TAB_TYPE;
BEGIN
    OPEN empl_cur;
    LOOP
        FETCH empl_cur BULK COLLECT INTO empl_tab LIMIT 1000;
        EXIT WHEN empl_tab.COUNT = 0;
        FOR i IN empl_tab.FIRST .. empl_tab.LAST LOOP
            <... some statements ...>
        END LOOP;
        EXIT WHEN empl_cur%NOTFOUND;   -- saves one empty fetch
    END LOOP;
    CLOSE empl_cur;
END;
Unlike the FOR loop, FORALL is more truly a statement than a loop; it can only repeat a single DML statement. Every INSERT, UPDATE, and DELETE statement in a PL/SQL program performs a context switch from the PL/SQL engine to the SQL engine. But by loading all of the rows to be processed into a collection and performing the DML inside a FORALL statement, the DML and the collection are passed to the SQL engine and run with a single context switch. Exceptions can be captured using the SAVE EXCEPTIONS clause and are returned in the SQL%BULK_EXCEPTIONS collection.
DECLARE
    CURSOR emp_cur IS
    SELECT * FROM emp;

    TYPE empl_tab_type  IS TABLE OF EMP_CUR%ROWTYPE;  -- collection of records
    TYPE empno_tab_type IS TABLE OF emp.empno%TYPE;   -- collection of scalars
    empl_tab  EMPL_TAB_TYPE;
    empno_tab EMPNO_TAB_TYPE;

    -- create an exception handler for ORA-24381
    dml_errors EXCEPTION;
    PRAGMA EXCEPTION_INIT(dml_errors, -24381);
BEGIN
    : :
    FORALL i IN empl_tab.FIRST .. empl_tab.LAST SAVE EXCEPTIONS
        UPDATE emp
        SET    ROW   = empl_tab(i)
        WHERE  empno = empno_tab(i);
    : :
EXCEPTION
    WHEN dml_errors THEN
        FOR i IN 1 .. SQL%BULK_EXCEPTIONS.COUNT LOOP
            <... handle the errors ...>
        END LOOP;
END;
As demonstrated in this example, it is currently a limitation of PL/SQL that SQL statements can reference only entire rows of a collection, not individual data items within a record. Entire rows are useful when updating with SET ROW (as above) or record-wise inserting, but you must maintain separate scalar collections to reference individual values in the WHERE clause. Many find it easier to use only scalar collections and ignore the attractions of record-wise inserts and updates.
Single row lookups in PL/SQL should never be performed as SQL statements (or cursors) inside a cursor loop. Instead, they should be merged into the main cursor using a table join; use an outer-join if there is a chance the row may not exist in the lookup table. This is still an efficient method even if the conditional PL/SQL logic does not need the joined data for every row returned.
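For example, a currency-description lookup that might otherwise be a SELECT ... INTO inside the loop can be folded into the driving cursor with an outer-join (table and column names are illustrative):

```sql
-- One cursor, lookup joined in; the outer-join (+) keeps the
-- transaction row even when the currency code has no match.
CURSOR txn_cur IS
SELECT t.txn_id,
       t.amount,
       c.currency_desc
FROM   txn      t,
       currency c
WHERE  c.currency_code (+) = t.currency_code;
```

The SQL engine performs the lookup once per fetch batch instead of a separate indexed single-row query per iteration.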
Some expert systems may have complex or iterative branching logic that makes the number of lookup permutations impractical. In these situations, the lookup tables can be cached in PL/SQL using an associative array (Index-By Table) so that lookups do not need a context switch.
DECLARE
    TYPE currency_lkp_type IS TABLE OF currency%ROWTYPE
        INDEX BY CURRENCY.CURRENCY_CODE%TYPE;
    currency_lkp CURRENCY_LKP_TYPE;
BEGIN
    FOR curr_rec IN (SELECT * FROM currency) LOOP
        currency_lkp(curr_rec.currency_code) := curr_rec;
    END LOOP;
    : :
    FOR i IN txn_tab.FIRST .. txn_tab.LAST LOOP
        : :
        -- Lookup the currency table without using SQL
        txn_tab(i).currency_desc := currency_lkp( txn_tab(i).currency_code );
        : :
    END LOOP;
END;
Note that this example breaks a cardinal rule by loading the associative array using an implicit-cursor FOR loop. It would be possible to BULK COLLECT the rows into a nested table, and then transfer the nested table row-by-row into the associative array, but it seems like overkill for downloading a small table just once for the entire job.
Almost every data retrieval requirement for a batch process can be satisfied either by merging a lookup into the main cursor, or by using associative arrays. In a very small number of cases, there is a need to process rows from two or more very large tables which have N:M join-cardinality.
For example: consider a billing run where an invoice must be generated for each Customer, but in order to generate that invoice the program needs to process all of the Customer's Services, plus all of the Customer's Billing Instructions. If there were only one Billing Instruction per customer (say, the latest one), then the main cursor could join the Customer table to Customer Service (returning a row per Service per Customer) and join the latest Billing Instruction to each row. However, when multiple Billing Instructions are required, such a join would destroy the cardinality of the cursor. One Customer joined to 5 Services and 4 Billing Instructions would explode to 5*4=20 rows.
The most efficient solution to such a problem is to use Concurrent Cursors. For this example, use 3 explicit cursors: one that retrieves all of the customers ordered by the customer number, a second that retrieves all of the Customer Services ordered by customer number, and a third that retrieves all of the Customer Billing Instructions ordered by customer number. Open all of the cursors once only at the beginning of the program, and then fetch the rows customer by customer.
To demonstrate the concept more clearly, the sample code below does not use BULK COLLECT, although it could be adapted to do so.
DECLARE
    -- Declare 3 cursors
    CURSOR cust_cur IS
    SELECT * FROM customer ORDER BY cust_num;

    CURSOR custserv_cur IS
    SELECT * FROM customer_service ORDER BY cust_num;

    CURSOR custinst_cur IS
    SELECT * FROM customer_billing_instruction ORDER BY cust_num;

    -- Declare a record to FETCH each cursor, and collections to load
    -- the services and billing instructions for each customer as they
    -- are processed.
    curr_cust_rec     CUST_CUR%ROWTYPE;
    curr_custserv_rec CUSTSERV_CUR%ROWTYPE;
    curr_custinst_rec CUSTINST_CUR%ROWTYPE;

    TYPE custserv_tab_type IS TABLE OF CUSTSERV_CUR%ROWTYPE;
    TYPE custinst_tab_type IS TABLE OF CUSTINST_CUR%ROWTYPE;
    custserv_tab CUSTSERV_TAB_TYPE := CUSTSERV_TAB_TYPE();
    custinst_tab CUSTINST_TAB_TYPE := CUSTINST_TAB_TYPE();
BEGIN
    -- Open all the cursors and pre-fetch the first row.
    OPEN cust_cur;
    OPEN custserv_cur;
    OPEN custinst_cur;
    FETCH cust_cur     INTO curr_cust_rec;
    FETCH custserv_cur INTO curr_custserv_rec;
    FETCH custinst_cur INTO curr_custinst_rec;

    WHILE cust_cur%FOUND LOOP

        -- Load up the services for the customer
        custserv_tab.DELETE;
        WHILE custserv_cur%FOUND
        AND   curr_custserv_rec.cust_num <= curr_cust_rec.cust_num LOOP
            IF curr_custserv_rec.cust_num = curr_cust_rec.cust_num THEN
                custserv_tab.EXTEND;
                custserv_tab(custserv_tab.LAST) := curr_custserv_rec;
            END IF;
            FETCH custserv_cur INTO curr_custserv_rec;
        END LOOP;

        -- Load up the billing instructions for the customer
        custinst_tab.DELETE;
        WHILE custinst_cur%FOUND
        AND   curr_custinst_rec.cust_num <= curr_cust_rec.cust_num LOOP
            IF curr_custinst_rec.cust_num = curr_cust_rec.cust_num THEN
                custinst_tab.EXTEND;
                custinst_tab(custinst_tab.LAST) := curr_custinst_rec;
            END IF;
            FETCH custinst_cur INTO curr_custinst_rec;
        END LOOP;

        <... Process the invoice using services in the custserv_tab
             collection and billing instructions in the custinst_tab
             collection ...>

        FETCH cust_cur INTO curr_cust_rec;
    END LOOP;
END;
Even though the above example - without BULK COLLECT - does not significantly reduce context switching, it still delivers a profound improvement in performance: usually several orders of magnitude. The reason is that the Customer Service and Customer Billing Instruction tables are each processed in a single fast Full Table Scan, rather than the death-of-a-thousand-cuts method of reading via an index in a separate cursor per customer.
PL/SQL performance tuning is easy: you need to remember only 2 things:
- Do it in pure SQL if you can, and
- If you can't, never code SQL inside a cursor loop.
In response to a query on the forum, I did some more investigating. In the demo script above, the bulk of the performance gain is from the FORALL, not from the BULK COLLECT. Tracing this script, I found that the DML within the FORALL statement both parses and executes once only, whereas the DML inside the cursor loop executes once for each iteration.
In fact, if you compare BULK COLLECT to a cursor loop without any SQL inside the loop, performance improvements are difficult to find.
Conclusion: whilst the above methods are still all effective tuning methods, it is primarily because they reduce cursor parse and execute activity, and secondarily because they reduce context switching.