Jonathan Lewis
Lock Bug
Here’s an oddity that I ran into a little while ago while trying to prepare a sample trace file showing a particular locking pattern; it was something that I’d done before, but trace files can change with different versions of Oracle so I decided to use a copy of 11.2.0.2 that happened to be handy at the time to check if anything had changed since the previous (11gR1) release. I never managed to finish the test; here are the steps I got through:
-- created a table t1 to work with, no data needed, column definitions don't matter. Session 1: lock table t1 in row exclusive mode; -- mode 3, lock acquired Session 2: lock table t1 in row exclusive mode; -- mode 3, lock acquired Session 3: lock table t1 in exclusive mode; -- mode 6, goes into wait for session 1 Session 1 (again): lock table t1 in share row exclusive mode; -- mode 5, should move to converter queue, wait for session 2
The mode 3 to mode 5 conversion is what happens if you have a foreign key referential integrity constraint without a covering index and run code that does something like “delete child rows for parent X, delete parent X” (which also happens under the covers if your foreign key is declared as “on delete cascade”).
Notice the “should” on the lock conversion line – this is exactly what happens in 10.2.0.3 and 11.2.0.3; but on the system I was using session 1 got an immediate deadlock (ORA-00060) error – in the absence of a deadlock scenario !
My problem is this – the instance that gives the deadlock error is 64-bit 11.2.0.2 running RAC on Linux (OEL 5); the instances that don’t give the error are 32-bit, non-RAC, running on Windows XP Pro. SO is the anomaly due to some difference in:
- Windows vs. Linux
- 32 bit vs. 64 bit
- RAC vs. non-RAC
- Specific version of Oracle
I’m inclined to think it’s a version dependent bug, but it’s possible that it’s a necessary side effect of RAC. So if you’ve got 11.2.0.3 RAC, or 11.2.0.2 non-RAC, or the terminal releases of 10g and 11.1 whether RAC or non-RAC, I’d like to hear from you which versions – if any – produce the same deadlock and which don’t. (And the scientific method being what it is, anyone with 11.2.0.2 on RAC on Linux might like to confirm – or contradict – my result.)
Update 8:30 p.m. GMTThe results in so far tend to support the idea that this may be a RAC-related issue. Justin Mitchell’s result is particularly interesting, and I confirmed on my 11.2.0.2 RAC system that if the two starting sessions are on different nodes then I don’t get the deadlock behaviour.
Users of RAC will probably be aware of the fact that v$lock.block commonly sets itself to 2 (potential blocker) on RAC the moment you acquire the lock; and this is what happens to both the “row exclusive” locks. When we request the exclusive lock, both TM locks change from block=2 to block=1 if they are on the same node; if they are on different nodes then only one of them changes.
Unfortunately Oleksandr didn’t see a deadlock when he tried RAC 11.2.0.3 – and since he didn’t make any comment about using multiple nodes we will have to wait to see if he can repeat the test and tell us whether his results are consistent with Justin’s, or whether they suggest that there are further considerations in effect.
Virtual DB
I’m heading off to Heathrow airport later on today to fly out to San Francisco for my week of experimenting with Delphix. I’ve done a little preparation work, of course, including browsing around the Internet to read about related technologies. Some of the material I found was very interesting, so I thought I’d go publish a few of the links that might be useful to other people.
It’s quite surprising to see how long the necessary core technology has been around; and yet there seems to have been minimal follow-up on the possibilities the technology makes available – perhaps because of the specific hardware, or special skills needed to put put together a working solution.
Here’s a short series from “Oracle Storage Guy” comparing Netapp and EMC (with a bit of a bias towards EMC) - it’s a few years old, and the companies mentioned have probably moved on, reviewed strategies, and refined what they do, but I suspect it’s still good background information for the non-specialist.
- Oracle backup – which storage is best – pt.1
- Oracle backup – which storage is best – pt.2
- Oracle backup – which storage is best – pt.3
- Oracle backup – which storage is best – pt.4
And an introduction to some OpenWorld presentations on Oracle’s dbClone from the same person.
A couple of posts by Kevin Closson on NFS and CloneDB
And a posting about Oracle’s clonedb from Tim Hall
A couple of (fairly long) Oracle white papers on cloning
- Cloning on ZFS with Rman – by Oracle
- Cloning on ZFS with Data Guard – by Oracle
- See also My Oracle Support article 1210656.1
And, of course, a couple of items from Kyle Hailey (Delphix) that talk about why Delphix is the right solution at the right time. Both items come with a useful list of links to articles on other technologoes
- Database Virtualization
- Database Virtualization and Instant provisioning (including video presentation)
Finally, a couple of posts from end-users:
Bottom line – it looks as if Delphix has created a product that is more flexible, more powerful and easier to use than the competition that has been around for several years, and they’ve done it on generic hardware. I’ll be trying to find out how reality lines up with expectations over the next week.
Duplicate indexes ?
I don’t think this is likely to happen on a production system (until 12c) – but look what you can do if you try hard enough:
1 select
2 index_name, column_name from user_ind_columns
3 where
4 table_name = 'T1'
5 order by
6* index_name , column_position
SQL> /
INDEX_NAME COLUMN_NAME
-------------------- --------------------
T1_I1 N1
V1
T1_I2 N1
V1
4 rows selected.
That’s a straight cut-n-paste from an Oracle 11.1.0.7 SQL*Plus session. (You can tell I typed it in real time because I missed the return before the FROM, and couldn’t be bothered to go back and do it again ;) )
Source Control
You may recall that I spent some time with the developers at the redgate offices in Cambridge (UK) a little while ago, looking at their Source Control for Oracle package. The product is about to go live, with a launch date of 12th March.
Because of the help I’ve given them they’ve offered my readers the chance of winning one of two 5-user licences for the product – provided I devise a strategy for picking the recipients.
So, to make it easy, all you have to do is persuade me that you really need or really deserve a copy. Write up a short description in the comments of the worst problem you’ve had to face because you didn’t have a decent source control system, or the best use you think you could make of a source control system for Oracle.
I’ll get the redgate developers to read what you wrote, and they will be the final judges of the two most interesting, or possibly most horrifying, or maybe even the most entertaining, cases.
Entries to be in by 23:59 GMT on 11th March.
UpdateThe product has launched early !
You can find out more and see screenshots on this web page.
And there’s an online demo of the tool on March 14 at 16:00 GMT (17:00 CET / 11:00 EST / 10:00 CST / 08:00 PST), where you will be able to ask any questions. If you’re interesting in seeing it you’ll have to register at this “go to meeting” URL.
The closing date for entries is still 11th March, 23:59 GMT.
Free Space
Question – How can you have a single file in a single tablespace showing multiple free extents when there are no objects using any space in that file ? For example, from an 11.1.0.7 database:
SQL> select 2 * 3 from user_free_space 4 where 5 tablespace_name = 'TEST_8K' 6 order by 7 file_id, block_id 8 ; TABLESPACE_N FILE_ID BLOCK_ID BYTES BLOCKS RELATIVE_FNO ------------ ---------- ---------- ---------- ---------- ------------ TEST_8K 3 128 1048576 128 3 TEST_8K 3 256 1048576 128 3 TEST_8K 3 384 1048576 128 3 TEST_8K 3 512 130023424 15872 3 4 rows selected.
The answer in this case is simple – here’s what I did just before running my query:
SQL> create table t1(n1 number); Table created. SQL> create table t2(n1 number); Table created. SQL> create table t3(n1 number); Table created. SQL> drop table t1; Table dropped. SQL> drop table t2; Table dropped. SQL> drop table t3; Table dropped. SQL>
This test is on 11.1.0.7; test_8k is my default tablespace, uses an 8KB block size (did you spot the clue) with 1MB uniform extents. The segments created immediately and I haven’t purged my recyclebin. Because I’ve dropped the tables Oracle includes their space in the “free space” views, but because I need to be able to flash them back into existence the segments can’t be coalesced into the adjacent free space, and they will also be reported in dba_segments.
Here’s a harder one – there are NO objects in this tablespace, and nothing hiding in the recyclebin:
SQL> select 2 * 3 from user_free_space 4 where 5 tablespace_name = 'TEST_2K' 6 order by 7 file_id, block_id 8 ; TABLESPACE_N FILE_ID BLOCK_ID BYTES BLOCKS RELATIVE_FNO ------------ ---------- ---------- ---------- ---------- ------------ TEST_2K 7 512 58720256 28672 7 TEST_2K 7 29184 58720256 28672 7 TEST_2K 7 57856 58720256 28672 7 TEST_2K 7 86528 32505856 15872 7
And here’s the statement I executed (as SYS) just before I ran the query – so no chance that there’s anything hidden in the file:
create
tablespace test_2k
datafile 'C:\ORACLE\ORADATA\d11g\d11g\test_2k.dbf'
SIZE 200M reuse
blocksize 2k
extent management local
uniform size 4k
segment space management manual
;
Update: 26th Feb
I’m sitting in Munich airport and boarding starts in 10 minutes, so just enough time to give an answer. I’ll start with a block dump of block 3 of the data file.
Block dump from disk:
buffer tsn: 22 rdba: 0x01c00003 (7/3)
scn: 0x0b86.06d63ae4 seq: 0x01 flg: 0x04 tail: 0x3ae41e01
frmt: 0x02 chkval: 0x4e8a type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x0C112200 to 0x0C112A00
C112200 0000621E 01C00003 06D63AE4 04010B86 [.b.......:......]
C112210 00004E8A 00000007 00000200 00000000 [.N..............]
C112220 00000000 00003800 00000000 00000000 [.....8..........]
C112230 00000000 00000000 00000000 00000000 [................]
Repeat 123 times
C1129F0 00000000 00000000 00000000 3AE41E01 [...............:]
File Space Bitmap Block:
BitMap Control:
RelFno: 7, BeginBlock: 512, Flag: 0, First: 0, Free: 14336
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
...
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
This is a locally managed tablespace, so the third block of the file is the first bitmap space management block for the tablespace. The tablespace uses a 2KB block size (so the amount of space in the block for the bit map is slight under 2KB) and a 4KB uniform extent size (so each bit in the map represents 2 blocks/4KB in the tablespace).
As you can see from line 15, the first bit represents the extent starting at block 512, and there are 14,336 bits available (1,792 bytes), so the bitmap in the next block would start at block 512 + 2 * 14,336 = 512 * 28,672 = 29,184. This lines up exactly with the first chunk of free space reported in dba_free_space above.
As one of the comments indicated – the code that populates x$ktfbfe is probably called once for each bitmap space management block, and it doesn’t seem to bother trying to “coalesce” two free space fragments identified by adjacent bitmap blocks.
Deadlock Detection
By some strange coincidence, the “London Bus” effect perhaps, there have been three posts on the OTN database forum in the last couple of days relating to deadlocks; and they have prompted me to indulge in a little rant about the myth of Oracle and deadlock detection; it’s the one that goes:
“Oracle detects and resolves deadlocks automatically.”
Oracle may detect deadlocks automatically, but it doesn’t resolve them, it simply reports them (by raising error ORA-00060 and rolling back one statement) then leaves the deadlocked sessions stuck until the session that received the report resolves the problem or an external agent resolves the problem.
Consider the following example (which, I have to admit, I wrote without access to a live instance):
Session 1
delete from t1 where id between 1 and 1000000;
1000000 rows deleted
Session 2
update jobs set status = 'STARTING' where id = 99;
1 row updated
Session 1
update jobs set status = 'FINISHED' where id = 99;
-- session 1 is now waiting on session 2
Session 2
delete from t1 where id between 1 and 1000000
-- session 2 is now waiting on session 1
Session 1 (some time within the next 3 seconds)
ORA-00060: deadlock detected
At this point most of the applications that I’ve seen would crash session 1, resulting in an implicit rollback of the million row update (worst case I’ve seen: session 2 ran for 10 seconds and resulted in session 1 crashing and rolling back for 3.5 hours); some applications would log “Unexpected Oracle error” to the front-end and retry the most recent action (at which point Session 2 would receive an ORA-00060 error and the two sessions would see-saw back and fore every 3 seconds until someone noticed what was going on).
In what way has the deadlock been “resolved” ?
In this specific case I think I’d probably want a supervisor (person or program) to log and kill session 2 and allow session 1 to retry its second update – and then I’d want to find out why this sequence of events had happened at all.
Deadlocks are NOT resolved automatically by Oracle, they are merely reported so that the client code can decide how to resolve them.
Travelogue 3
I’m sure I wrote something a little while ago about cutting down on long-haul flights – but I’ve just been subjected to a positive barrage of invitations that I’ve found hard to resist. So my international travel itinerary for the future is starting to look like it might be a little busy. Probable highlights:
I’m aiming to take a short city-break in Moscow in May, and I’ll be stopping off with the people at Innova to spend a day talking about Oracle.
I’ve volunteered to take part in the southern leg of the “LA OTN” tour in August – aiming to visit Chile 01/08, Peru 03/08, Uruguay 06/08, Argentina 08/08, and Brazil 10/08 – flying home just in time for my son’s 21st birthday.
Then, in late October, I have an invitation to spend a couple of days in Beijing presenting at the All China Oracle User Group conference.
And if I manage to complete all that lot I think that will take my list of countries visited up to 60. Who knows, I may even find a couple more photos to publish in my “Travelogue” series.
Transactions 2
Here’s a little follow-on from Friday’s posting. I’ll start it off as a quiz, and follow up tomorrow with an explanation of the results (though someone will probably have given the correct solution by then anyway).
I have a simple heap table t1(id number(6,0), n1 number, v1 varchar2(10), padding varchar2(100)). The primary key is the id column, and the table holds 3,000 rows where id takes the values from 1 to 3,000. There are no other indexes. (I’d show you the code, but I don’t want to make it too easy to run the code, I want you to try to work it out in your heads).
I run the following pl/sql block.
begin -- -- Going to fail on primary key constraint -- for i in 1..200 loop begin insert into t1 values(50,i,'x','x'); commit; exception when others then null; end; end loop; -- -- Explicit rollback -- for i in 1..40 loop begin insert into t1 values(4000 + i,i,'x','x'); rollback; end; end loop; -- -- Explicit commit -- for i in 1..50 loop begin insert into t1 values(4500 + i,i,'x','x'); commit; end; end loop; -- -- Rolling back to savepoint -- for i in 1..70 loop begin savepoint A; insert into t1 values(4600 + i,i,'x','x'); rollback to savepoint A; end; end loop; end; /
If (from another session) you take an AWR (or Statspack) snapshot before and after running the block, what will you see for the statistics
user commits
user rollbacks
transaction rollbacks
rollback changed – undo records applied
and, for a bonus point, what will the value of “Rollback per transaction %” be ?
Note – since the snapshots are global, your results will be affected by other work on the system; in particular I noticed that simply taking two AWR snapshots in a row on the instance I used resulted in 8 user commits.
Update
And the answers – cut from an AWR report – are:
Statistic Total per Second per Trans -------------------------------- ------------------ -------------- ------------- rollback changes - undo records 420 25.2 4.3 transaction rollbacks 240 4.7 2.5 user commits 58 1.1 0.6 user rollbacks 40 0.8 0.4 Rollback per transaction %: 40.82 Rows per Sort: 122.70
The 40 “user rollbacks” come from the loop with the explicit rollback. At the same time the 40 rollbacks introduced 80 “rollback changes – undo records applied” – one for the table index and one for the table for each call to rollback. Our “user rollbacks” have been “real” rollbacks, so they’ve introduced 40 transaction rollbacks at the same time.
The 50 “user commits” come from the loop with the explicit commit – except that the picture is slightly blurred by the fact that simply running the AWR snapshot introduced a few extra commits.
The 40.82% is 40 / (40 + 58) expressed as a percentage.
The “rollback to savepoint” calls haven’t contributed to the user rollbacks – even though, in this case, they have rolled back 40 transaction starts, and a check of various other statistics would show a TX enqueue being requested and released, and transactions rotating through the undo segments. Despite this clearing of the TX locks, these rollbacks to savepoint don’t count towards transaction rollbacks; however, the 70 “rollback to savepoint” calls have introduced a further 140 “rollback changes – undo records applied” (again, one for the table and one for the index for each call).
Finally the failed attempts to insert duplicate keys have, in these circumstances, introduced 200 transaction rollbacks (though not user rollbacks – we didn’t ask for them explicitly). They are also responsible for the final 200 “rollback changes – undo records applied” that we need to account for. The changes, very specifically, are the changes to the table – Oracle does actually have to insert the row into the table before trying to insert the index entry (after all, the index entry needs to know the rowid), and so it has to reverse that insert when it discovers that the relevant entry already exists in the primary key index. The requirement to insert and then rollback is one of the reasons why it is often better for “batch merge by pl/sql loop” to try an update then insert is sql%rowcount is zero rather than trying to insert, capture the exception, and then update; it’s also a good reason for creating the unique indexes on a table before creating the non-unique indexes – Oracle maintains indexes in the order they were created so you want it to find the duplicate key error as soon as possible to minimise the amount of change and rollback.
If you want to experiment further, there are a couple of slightly different variations of the code that exhibit interesting variations in results. There are also other events that can cause transaction rollbacks to appear without user rollbacks being invoked.
Transactions
It’s very easy to get a lot of information from an AWR (or Statspack) report – provided you remember what all the numbers represent. From time to time I find that someone asks me a question about some statistic and my mind goes completely blank about the exact interpretation; but fortunately it’s always possible to cross check because so many of the statistics are cross-linked. Here’s an example of a brief mental block I ran into a few days ago – I thought I knew the answer, but realised that I wasn’t 100% sure that my memory was correct:
In this Load Profile (for an AWR report of 60.25 minutes), what does that Transactions figure actually represent ?
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 57,567.09 12,028.39
Logical reads: 48,043.83 10,038.54
Block changes: 314.07 65.62
Physical reads: 667.70 139.51
Physical writes: 46.25 9.66
User calls: 619.33 129.41
Parses: 505.67 105.66
Hard parses: 36.94 7.72
Sorts: 313.05 65.41
Logons: 0.56 0.12
Executes: 1,165.42 243.51
Transactions: 4.79
% Blocks changed per Read: 0.65 Recursive Call %: 95.18
Rollback per transaction %: 24.95 Rows per Sort: 25.09
And, while we’re at it, what does the “Rollback per transaction %” actually mean – and what, if anything, can we infer from the value ?
Since we’re looking at transactions, it’s a fairly safe bet that the figure is something to do with commits – but is it counting only committed transactions, or does it included rolled back transactions [philosophical question - is a change that's never committed really a transaction, since it "never happened" as far as the rest of the world is concerned?]. Fortunately we can look at the Intance Activity Statistics to check.
First, though, lets convert the “Per Second” figures into an absolute value by multiplying by the duration in seconds (3,615) of the report. Allowing for rounding errors we’re looking for a value between ceiling(4.785 *3615) and floor(4.795 * 3615), i.e. between 17,298 and 17,338. Let’s also keep in mind that the “Rollback per transaction %” is very close to 25%.
Here, then, are a couple of useful figures from the Instance Activity:
Statistic Total per Second per Trans -------------------------------- ------------------ -------------- ------------- user commits 12,986 3.6 0.8 user rollbacks 4,316 1.2 0.3
Almost immediately you can see that user rollbacks are roughly one third of user commits (I promise I didn’t massage these number), which means the rollback percentage is just (user rollbacks / (user commits + user rollbacks)). As a further sanity check, 4,316 + 12,986 = 17,302 which falls nicely into our required range. Ta-da, job done: “Transactions” is the sum of user commits and user rollbacks (as a first approximation – but is it the whole answer).
Of course, I picked a fairly extreme example from my AWR library to make a point – which means you might now be asking whether all those rollbacks pose some sort of threat. How serious are they ? Luckily there are a couple more statistics that tell us:
Statistic Total per Second per Trans -------------------------------- ------------------ -------------- ------------- rollback changes - undo records 222 0.1 0.0 transaction rollbacks 59 0.0 0.0
The name of the first statistic has been trimmed a little in this text report, the full name is: “rolllback changes – undo records applied”. As you can see, we’ve only applied 222 undo records in our 4,316 rollback calls, so we’re not really rolling anything back (most of the time). Moreoever, the “transaction rollbacks” corroborates this observation – we have only attempted to rollback 59 “real” (data-changing) transactions. Most of those rollbacks are probably the default “rollback after every call” that some web application servers make.
As a closing thought – if you’re responsible for several different systems, it’s convenient to keep a couple of ”reference” AWR or Statspack reports from busy, normal, and quiet periods for each system; that way, if you use the figures from one system to work out the meaning of some derived value you can use the figures from another system to check if your hypothesis is correct.
Footnote: there are two other statistics that include the text “undo records applied” (not that you’ll see those words in the textual AWR report), but they both relate to creating read-consistent copies of blocks.
Footnote 2: Is anyone getting worried by my “first approximation” comment – and has anyone started wondering if there may be more to transaction rollbacks that user rollbacks, and if so where they fit into the arithmetic ? To be continued.
STS, OFE and SPM
That’s SQL Tuning Sets, optimizer_features_enable, and SQL Plan Management.
There’s a recent post on OTN describing an issue when using SQL Tuning Sets to enforce plan stability when upgrading from 10.2.0.3 to 11.2.0.3 – it doesn’t always work. Here’s a very simple model to demonstrate the type of thing that can happen (the tables are cloned from a completely different demo, so don’t ask why I picked the data they hold):
create table t1 as select trunc((rownum-1)/15) n1, trunc((rownum-1)/15) n2, rpad(rownum,180) v1 from all_objects where rownum <= 3000 ; create table t2 as select mod(rownum,200) n1, mod(rownum,200) n2, rpad(rownum,180) v1 from all_objects where rownum <= 3000 ; create index t1_i1 on t1(n1); create index t2_i1 on t2(n1); execute dbms_stats.gather_table_stats(user,'t1'); execute dbms_stats.gather_table_stats(user,'t2'); alter session set optimizer_features_enable = '10.2.0.3'; explain plan for select /*+ ordered use_nl(t1) index(t1) */ t2.n1, t1.n2 from t2,t1 where t2.n2 = 45 and t2.n1 = t1.n1 ; set pagesize 60 set linesize 132 set trimspool on select * from table(dbms_xplan.display(null,null,'outline'));
You’ll notice I’ve included a directive to set the optimizer_features_enable back to 10.2.0.3. If I run this test on both 10.2.0.3 (real) and 11.2.0.3 (with ofe set) I get the same plans but slightly different outline data.
From 10.2.0.3
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 225 | 3600 | 35 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 15 | 120 | 2 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 225 | 3600 | 35 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | T2 | 15 | 120 | 5 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | T1_I1 | 15 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
USE_NL(@"SEL$1" "T1"@"SEL$1")
LEADING(@"SEL$1" "T2"@"SEL$1" "T1"@"SEL$1")
INDEX(@"SEL$1" "T1"@"SEL$1" ("T1"."N1"))
FULL(@"SEL$1" "T2"@"SEL$1")
OUTLINE_LEAF(@"SEL$1")
ALL_ROWS
OPTIMIZER_FEATURES_ENABLE('10.2.0.3')
IGNORE_OPTIM_EMBEDDED_HINTS
END_OUTLINE_DATA
*/
From 11.2.0.3
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 225 | 3600 | 54 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 15 | 120 | 2 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 225 | 3600 | 54 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | T2 | 15 | 120 | 24 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | T1_I1 | 15 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
NLJ_PREFETCH(@"SEL$1" "T1"@"SEL$1")
USE_NL(@"SEL$1" "T1"@"SEL$1")
LEADING(@"SEL$1" "T2"@"SEL$1" "T1"@"SEL$1")
INDEX(@"SEL$1" "T1"@"SEL$1" ("T1"."N1"))
FULL(@"SEL$1" "T2"@"SEL$1")
OUTLINE_LEAF(@"SEL$1")
ALL_ROWS
DB_VERSION('11.2.0.3')
OPTIMIZER_FEATURES_ENABLE('10.2.0.3')
IGNORE_OPTIM_EMBEDDED_HINTS
END_OUTLINE_DATA
*/
Apart from the appearance of the db_version() hint in 11.2.0.3 the most important difference in the outline data is the hint nlj_prefetch() hint. 11g introduced a new mechanism for nested loop joins called NLJ Batching, at the same time introducing two new hints to allow the optimizer to specify which mechanism a plan should use, prefetch (nlj_prefetch()) or batching (nlj_batching()). Since 10g only does prefetching it doesn’t have (or need) a hint to specify the mechanism.
The outline section data from a plan is basically what Oracle stores as an SQL Plan Baseline – so if I use the approved method to turn the 10g outline data above into the 11g SQL Plan Baseline what’s going to happen to the execution plan when I run the query in the default 11g environment ? It’s easy to demonstrate (at least in this case) by simply cutting and pasting the entire 10g outline into the original SQL statement and generating its plan under 11g; here’s the result:
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 225 | 3600 | 54 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 225 | 3600 | 54 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | T2 | 15 | 120 | 24 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| T1 | 15 | 120 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | T1_I1 | 15 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
USE_NL(@"SEL$1" "T1"@"SEL$1")
LEADING(@"SEL$1" "T2"@"SEL$1" "T1"@"SEL$1")
INDEX_RS_ASC(@"SEL$1" "T1"@"SEL$1" ("T1"."N1"))
FULL(@"SEL$1" "T2"@"SEL$1")
OUTLINE_LEAF(@"SEL$1")
ALL_ROWS
DB_VERSION('11.2.0.3')
OPTIMIZER_FEATURES_ENABLE('10.2.0.3')
IGNORE_OPTIM_EMBEDDED_HINTS
END_OUTLINE_DATA
*/
The plan has changed. I was slightly surprised to see in this case that the optimizer used neithor NLJ prefetching, nor NLJ batching, it went all the way back to the traditional nested loop mechanism.
If I had actually captured the original outline hints as an SQL Baseline the optimizer would have found the baseline in the data dictionary, generated this new plan from it, discovered that the plan hash value for this plan didn’t match the plan hash value for the stored plan, and re-optimized the query from scratch - potentially producing a totally different execution plan.
In my example 10g and 11g both wanted to use the nlj_prefetch mechanism when pushed into the nested loop join. 10g, of course, doesn’t have a relevant hint, so I got lucky that 11g wanted to do what 10g had done. In the case of the OP on the OTN forum 11g had decided that it preferred nlj_batching over nlj_prefetch when attempting to apply the baseline, so the plan hash values didn’t match and the optimizer became free to choose a completely different plan.
Without looking very carefully through all the hints available to 11g I can’t decide whether there are other similar cases to worry about – but if you see 11g ignoring SQL Plan Baselines that have been generated from 10g, then look for operations where 11g has two options (and perhaps two hints) for performing a given operation where 10g has only one option, and perhaps that’s where the problem lies.
Footnotes:In the case of the OP’s example, a possible workaround could start by setting the hidden parameter “_nlj_batching_enabled”=0; obviously this shouldn’t be done on a production system without approval of Oracle Support, and it’s never a desirable strategy to change a global parameter to fix a local problem – so I’d prefer to set the parameter in a session and generate a new SQL Plan Baseline that would then (probably) include either the nlj_prefetch() hint, or maybe it would turn out to be the no_nlj_batching() hint.
To see what plan (and hints) the SQL Plan Baseline would have generated, the OP enabled SPM tracing using the new event mechanism, in this case:
alter session set events 'trace[RDBMS.SQL_Plan_Management.*]'; -- run, or explain the query here alter session set events 'trace[RDBMS.SQL_Plan_Management.*] off';
Snapper
If you’ve used Tanel Poder’s snapper script then you probably want to know about the latest release, and webinar coming up this Wednesday.
Optimisation ?
I was at a client site recently where one of the end-users seemed to have discovered a cunning strategy for optimising a critical SQL statement. His problem was that his query screen times out after 2 minutes, so any query he runs has to complete in less than two minutes or he doesn’t see the results. Unfortunately he had a particular query which took nearly 32 minutes from cold to complete – partly because it’s a seven-table join using ANSI OUTER joins, against tables ranging through the 10s of millions of rows and gigabytes of data – the (necessary) tablescan of the table that had to be first in the join order took 70 seconds alone.
But our intrepid user seems to have made an important discovery and engineered a solution to his performance problem. I think he’s noticed that when you run a query twice in a row the second execution is often faster than the first. I can’t think of any other reason why the same person would run the same query roughly every four minutes between 8:00 and 9:00 am every morning (and then do the same again around 5:00 in the afternoon).
Looking at the SQL Monitoring screen around 10:00 the first day I was on-site I noticed this query with a very pretty graphic effect of gradually shrinking blue bars as 32 minutes of I/O turned into 2 minutes of CPU over the course of 8 consecutive executions which reported run times something like: 32 minutes, 25 minutes, 18 minutes, 12 minutes, 6 minutes, 4 minutes, 2.1 minutes, 2 minutes.
It’s lucky (for that user) that the db_cache_size is 60GB. On the other hand this machine is one of those Solaris boxes that likes to pretend that it’s got 128 CPUs when really it’s only 16 cores with 8 lightweight threads per core – you don’t want anyone running a query that uses 2 solid CPU minute on one of those boxes because it’s taking out 1/16th of your CPU availability, while reporting a load of 1/128 of your CPUs.
Footnote: the query can be optimised (properly) – it accessed roughly 100M rows of data to return roughly 300 rows (with no aggregation), so we just need to do a little bit of work on precise access paths.
Delphix
If you’re a regular follower or my blog you may recall Kyle Hailey and the joint webinar we did nearly two years ago on “Visual SQL Tuning” covering an approach I’ve written about in the past and a product that he developed at Embarcadero to automate the work that I’d been doing by hand and eye.
Kyle has now moved on to Delphix, and has become involved with another really interesting piece of technology – database virtualization. How do you supply a terabyte sized database to five different development teams without using up 5TB of disc space ? Create an operating environment that keeps one master copy of the database while maintaining a set of (small) private files for each team that hold private copies of the blocks that have been changed by that team – and that’s just one feature of the product.
The product is sufficiently interesting (plus I have a healthy regard for Kyle’s opinions) that I’ve accepted an invitation to go over to California for a few days next month to experiment with it, see what it can do, try to stress it a bit and so on. The people at Delphix are so confident that I’ll be impressed that they’re going to let me do this and then write up a blog telling you how things went.
Have a browse around their documentation and if you’re interested add a suggestion to the comment telling me what you’d like me to test, and how, and I’ll see if I can fit it into my timetable (no promises – but if you come up with interesting ideas I’ll see what I can do).



