Tanel Poder
Finding the reasons for excessive logical IOs
There’s another interesting thread going on in Oracle-L, about understanding logical IOs and drilling down into their reasons. Of course sometimes (or rather usually) the excessive logical IOs come from a bad execution plan (when a nested loop loops over lots of datablocks again and again or a wrong index is used for driving a query etc), but sometimes the excessive LIOs are caused by some internal issues, like space management etc.
A convenient tool I use for reporting logical IO reasons is (again) my Snapper! It has an option “b” for reporting Buffer get reasons or as I use below – option “a” shows All information Snapper can show.
There are couple of gotchas though which make this approach imperfect:
- The X$ tables Snapper uses for LIO reason reporting contain instance-wide counters, not specific to a single testing session. Thus you either need to be the single user in your database when experimenting and even then the background activity may increment some counters while you are testing too. I have sometimes suspended all other processes (kill -STOP and kill -CONT to resume) or used Flash Freeze (oradebug ffbegin and ffresumeinst) to hang the whole instance that there would be no other activity going on.
- These buffer get reason counters are not maintained properly in Oracle 11g, probably due an optimization effort and some changes for faster pinning of buffer cache buffers (there’s a parameter called _fastpin_enable which is set to 1 in 11g and it enables so called fastpath buffer gets. If you see v$sesstat statistics such “consistent gets from cache (fastpath) or “db block gets from cache (fastpath)” being inremented, then fastpath buffer gets/pins are used. Note that I do have a script which works also on 11g but I’ll write about that one some time in the future :)
Anyway, if you are testing in an environment exclusively used by you, on Oracle 10.2 or lower, then you can run snapper with the gather=a option to report a bunch instance-level statistics in addition to the standard session-level stats:
- BUFG – Buffer get reasons (both consistent and current mode gets)
- LATG – Latch gets (both willing to wait and immediate gets)
- ENQG – Enqueue gets
Here’s an example, prepare for long output:
SQL> @snapper gather=a 5 1 128 -- Session Snapper v2.02 by Tanel Poder ( http://www.tanelpoder.com ) -- ---------------------------------------------------------------------------------------------------------------------- SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH ---------------------------------------------------------------------------------------------------------------------- -1, , BUFG, kcbwh2: kcbchg1 , 1, .2, -1, , BUFG, kttewh00: ktte_get_file_info , 20, 4, -1, , BUFG, ktswh28: ktsgsp , 7, 1.4, -1, , BUFG, ktswh39: ktsrsp , 1, .2, -1, , BUFG, ktswh72: ktsbget , 7, 1.4, -1, , BUFG, ktswh76: ktsxadd , 2, .4, -1, , BUFG, ktswh77: ktsxadd , 1, .2, -1, , BUFG, ktswh201: ktsxbmdelext , 1, .2, -1, , BUFG, ktspfwh10: ktspscan_bmb , 9, 1.8, -1, , BUFG, ktspswh12: ktspffc , 9, 1.8, -1, , BUFG, ktsphwh39: ktspisc , 9, 1.8, -1, , BUFG, ktspbwh1: ktspfsrch , 9, 1.8, -1, , BUFG, ktuwh01: ktugus , 26, 5.2, -1, , BUFG, ktuwh02: ktugus , 20, 4, -1, , BUFG, ktuwh03: ktugnb , 7, 1.4, -1, , BUFG, ktuwh05: ktugct , 5, 1, -1, , BUFG, ktuwh09: ktugfb , 6, 1.2, -1, , BUFG, ktuwh27: kturbk , 7, 1.4, -1, , BUFG, ktewh25: kteinicnt , 30, 6, -1, , BUFG, ktewh26: kteinpscan , 23, 4.6, -1, , BUFG, ktewh27: kteinmap , 14, 2.8, -1, , BUFG, ktewh43: ktelbhw , 2, .4, -1, , BUFG, ktewh51: ktelwbl , 3, .6, -1, , BUFG, ktewh78: kteopdelete , 20, 4, -1, , BUFG, ktewh80: kteopdelete , 20, 4, -1, , BUFG, ktewh88: kteopshrink , 1, .2, -1, , BUFG, ktewh89: kteopshrink , 1, .2, -1, , BUFG, ktfbwh06: ktfbffpre , 21, 4.2, -1, , BUFG, ktfbwh0d: ktfbsearch , 1, .2, -1, , BUFG, ktfbwh36: ktfbbsearch , 1, .2, -1, , BUFG, ktfbwh37: ktfbbset , 22, 4.4, -1, , BUFG, kdswh01: kdstgr , 284, 56.8, -1, , BUFG, kdswh02: kdsgrp , 3237, 647.4, -1, , BUFG, kdswh05: kdsgrp , 1201, 240.2, -1, , BUFG, kdswh06: kdscgr , 42, 8.4, -1, , BUFG, kdtwh01: kdtgrs , 4, .8, -1, , BUFG, kduwh01: kdusru , 13, 2.6, -1, , BUFG, kduwh02: kdusru , 1, .2, -1, , BUFG, kdiwh06: kdifbk , 415, 83, -1, , BUFG, kdiwh07: kdifbk , 924, 184.8, -1, , BUFG, kdiwh08: kdiixs , 82, 16.4, -1, , BUFG, kdiwh09: kdiixs , 3438, 687.6, -1, , BUFG, kdiwh15: kdifxs , 48, 9.6, -1, , BUFG, kdiwh17: kdifind , 187, 37.4, -1, , BUFG, kdiwh22: kdifind , 208, 41.6, -1, , BUFG, kdiwh23: kdiins , 2, .4, -1, , BUFG, kdiwh41: kdifbk , 5, 1, -1, , BUFG, kdiwh42: kdiixs , 3397, 679.4, -1, , BUFG, kdiwh126: kdisparent , 2, .4, -1, , BUFG, kdiwh127: kdislink , 1, .2, -1, , BUFG, kdiwh130: kdisle , 2, .4, -1, , BUFG, kdiwh161: kdifind , 1, .2, -1, , BUFG, kdiwh169: skipscan , 1, .2, -1, , BUFG, kdcwh06: kdccss , 22, 4.4, -1, , BUFG, kdcwh07: kdcifk , 2, .4, -1, , BUFG, kddwh01: kdddel , 31, 6.2, -1, , BUFG, kddwh03: kddlkr , 5, 1, -1, , ENQG, CF - Controlfile Transaction , 2, .4, -1, , ENQG, CI - Cross-Instance Call Invocation , 2, .4, -1, , ENQG, CU - Cursor , 4, .8, -1, , ENQG, HW - Segment High Water Mark , 1, .2, -1, , ENQG, IS - Instance State , 1, .2, -1, , ENQG, JD - Job Queue Date , 1, .2, -1, , ENQG, JS - Job Scheduler , 68, 13.6, -1, , ENQG, MD - Materialized View Log DDL , 1, .2, -1, , ENQG, RO - Multiple Object Reuse , 9, 1.8, -1, , ENQG, TM - DML , 73, 14.6, -1, , ENQG, TS - Temporary Segment , 1, .2, -1, , ENQG, TT - Tablespace , 2, .4, -1, , ENQG, TX - Transaction , 15, 3, -1, , LATG, post/wait queue , 20, 4, -1, , LATG, session allocation , 319, 63.8, -1, , LATG, session idle bit , 744, 148.8, -1, , LATG, client/application info , 136, 27.2, -1, , LATG, object stats modification , 3, .6, -1, , LATG, messages , 114, 22.8, -1, , LATG, enqueues , 243, 48.6, -1, , LATG, enqueue hash chains , 361, 72.2, -1, , LATG, resmgr group change latch , 40, 8, -1, , LATG, channel handle pool latch , 2, .4, -1, , LATG, channel operations parent latch , 73, 14.6, -1, , LATG, message pool operations parent latch , 5, 1, -1, , LATG, active service list , 17, 3.4, -1, , LATG, OS process allocation , 2, .4, -1, , LATG, KMG MMAN ready and startup request latch, 2, .4, -1, , LATG, Memory Management Latch , 2, .4, -1, , LATG, mostly latch-free SCN , 14, 2.8, -1, , LATG, lgwr LWN SCN , 14, 2.8, -1, , LATG, Consistent RBA , 14, 2.8, -1, , LATG, cache buffers lru chain , 1534, 306.8, -1, , LATG, active checkpoint queue latch , 17, 3.4, -1, , LATG, checkpoint queue latch , 266, 53.2, -1, , LATG, cache buffers chains , 28630, 5.73k, -1, , LATG, cache buffer handles , 1, .2, -1, , LATG, multiblock read objects , 24, 4.8, -1, , LATG, cache table scan latch , 16, 3.2, -1, , LATG, simulator lru latch , 613, 122.6, -1, , LATG, simulator hash latch , 617, 123.4, -1, , LATG, object queue header operation , 373, 74.6, -1, , LATG, object queue header heap , 50, 10, -1, , LATG, redo writing , 56, 11.2, -1, , LATG, redo copy , 360, 72, -1, , LATG, redo allocation , 419, 83.8, -1, , LATG, loader state object freelist , 1, .2, -1, , LATG, begin backup scn array , 5, 1, -1, , LATG, dml lock allocation , 146, 29.2, -1, , LATG, commit callback allocation , 4, .8, -1, , LATG, sort extent pool , 1, .2, -1, , LATG, file cache latch , 22, 4.4, -1, , LATG, undo global data , 85, 17, -1, , LATG, ktm global data , 1, .2, -1, , LATG, In memory undo latch , 91, 18.2, -1, , LATG, KTF sga latch , 1225, 245, -1, , LATG, sequence cache , 6, 1.2, -1, , LATG, row cache objects , 9066, 1.81k, -1, , LATG, kks stats , 2, .4, -1, , LATG, shared pool , 2806, 561.2, -1, , LATG, library cache , 17455, 3.49k, -1, , LATG, library cache lock , 1351, 270.2, -1, , LATG, library cache pin , 1733, 346.6, -1, , LATG, library cache pin allocation , 7, 1.4, -1, , LATG, library cache lock allocation , 5, 1, -1, , LATG, library cache load lock , 2, .4, -1, , LATG, shared pool simulator , 725, 145, -1, , LATG, session timer , 2, .4, -1, , LATG, job_queue_processes parameter latch , 1, .2, -1, , LATG, hash table modification latch , 2, .4, -1, , LATG, hash table column usage latch , 4, .8, -1, , LATG, SQL memory manager latch , 2, .4, -1, , LATG, SQL memory manager workarea list latch , 180, 36, -1, , LATG, compile environment latch , 66, 13.2, -1, , LATG, ASM db client latch , 10, 2, -1, , LATG, JS queue state obj latch , 66, 13.2, -1, , LATG, PL/SQL warning settings , 14, 2.8, 128, SYS , STAT, opened cursors cumulative , 565, 113, 128, SYS , STAT, opened cursors current , -4, -.8, 128, SYS , STAT, user commits , 1, .2, 128, SYS , STAT, user calls , 2, .4, 128, SYS , STAT, recursive calls , 2991, 598.2, 128, SYS , STAT, recursive cpu usage , 18, 3.6, 128, SYS , STAT, session logical reads , 17075, 3.42k, 128, SYS , STAT, CPU used when call started , 269, 53.8, 128, SYS , STAT, CPU used by this session , 29, 5.8, 128, SYS , STAT, DB time , 627, 125.4, 128, SYS , STAT, user I/O wait time , 12, 2.4, 128, SYS , STAT, session uga memory , -1315136, -263.03k, 128, SYS , STAT, messages sent , 6, 1.2, 128, SYS , STAT, session pga memory , -7471104, -1.49M, 128, SYS , STAT, enqueue waits , 1, .2, 128, SYS , STAT, enqueue requests , 62, 12.4, 128, SYS , STAT, enqueue conversions , 3, .6, 128, SYS , STAT, enqueue releases , 66, 13.2, 128, SYS , STAT, physical read total IO requests , 44, 8.8, 128, SYS , STAT, physical read total multi block requests, 8, 1.6, 128, SYS , STAT, physical read total bytes , 942080, 188.42k, 128, SYS , STAT, physical write total IO requests , 4, .8, 128, SYS , STAT, physical write total multi block request, 4, .8, 128, SYS , STAT, physical write total bytes , 950272, 190.05k, 128, SYS , STAT, db block gets , 644, 128.8, 128, SYS , STAT, db block gets from cache , 540, 108, 128, SYS , STAT, db block gets direct , 104, 20.8, 128, SYS , STAT, consistent gets , 16409, 3.28k, 128, SYS , STAT, consistent gets from cache , 16409, 3.28k, 128, SYS , STAT, consistent gets - examination , 6316, 1.26k, 128, SYS , STAT, physical reads , 115, 23, 128, SYS , STAT, physical reads cache , 115, 23, 128, SYS , STAT, physical read IO requests , 44, 8.8, 128, SYS , STAT, physical read bytes , 942080, 188.42k, 128, SYS , STAT, db block changes , 618, 123.6, 128, SYS , STAT, consistent changes , 7, 1.4, 128, SYS , STAT, physical writes , 116, 23.2, 128, SYS , STAT, physical writes direct , 116, 23.2, 128, SYS , STAT, physical write IO requests , 4, .8, 128, SYS , STAT, physical write bytes , 950272, 190.05k, 128, SYS , STAT, physical writes non checkpoint , 116, 23.2, 128, SYS , STAT, change write time , 3, .6, 128, SYS , STAT, redo synch writes , 3, .6, 128, SYS , STAT, redo synch time , 3, .6, 128, SYS , STAT, free buffer requested , 132, 26.4, 128, SYS , STAT, dirty buffers inspected , 2, .4, 128, SYS , STAT, hot buffers moved to head of LRU , 420, 84, 128, SYS , STAT, free buffer inspected , 187, 37.4, 128, SYS , STAT, commit cleanout failures: callback failu, 1, .2, 128, SYS , STAT, commit cleanouts , 48, 9.6, 128, SYS , STAT, commit cleanouts successfully completed , 47, 9.4, 128, SYS , STAT, CR blocks created , 7, 1.4, 128, SYS , STAT, switch current to new buffer , 3, .6, 128, SYS , STAT, physical reads cache prefetch , 71, 14.2, 128, SYS , STAT, shared hash latch upgrades - no wait , 29, 5.8, 128, SYS , STAT, calls to kcmgcs , 21, 4.2, 128, SYS , STAT, calls to kcmgas , 29, 5.8, 128, SYS , STAT, calls to get snapshot scn: kcmgss , 4114, 822.8, 128, SYS , STAT, redo entries , 348, 69.6, 128, SYS , STAT, redo size , 105284, 21.06k, 128, SYS , STAT, redo ordering marks , 7, 1.4, 128, SYS , STAT, redo subscn max counts , 13, 2.6, 128, SYS , STAT, undo change vector size , 40376, 8.08k, 128, SYS , STAT, data blocks consistent reads - undo reco, 7, 1.4, 128, SYS , STAT, no work - consistent read gets , 9936, 1.99k, 128, SYS , STAT, cleanouts only - consistent read gets , 1, .2, 128, SYS , STAT, rollbacks only - consistent read gets , 7, 1.4, 128, SYS , STAT, immediate (CURRENT) block cleanout appli, 22, 4.4, 128, SYS , STAT, immediate (CR) block cleanout applicatio, 1, .2, 128, SYS , STAT, deferred (CURRENT) block cleanout applic, 20, 4, 128, SYS , STAT, commit txn count during cleanout , 3, .6, 128, SYS , STAT, active txn count during cleanout , 4, .8, 128, SYS , STAT, cleanout - number of ktugct calls , 5, 1, 128, SYS , STAT, table scans (short tables) , 22, 4.4, 128, SYS , STAT, table scan rows gotten , 16208, 3.24k, 128, SYS , STAT, table scan blocks gotten , 344, 68.8, 128, SYS , STAT, table fetch by rowid , 5476, 1.1k, 128, SYS , STAT, table fetch continued row , 5, 1, 128, SYS , STAT, cluster key scans , 32, 6.4, 128, SYS , STAT, cluster key scan block gets , 45, 9, 128, SYS , STAT, rows fetched via callback , 4083, 816.6, 128, SYS , STAT, index crx upgrade (positioned) , 23, 4.6, 128, SYS , STAT, leaf node splits , 2, .4, 128, SYS , STAT, leaf node 90-10 splits , 1, .2, 128, SYS , STAT, index fetch by key , 1457, 291.4, 128, SYS , STAT, index scans kdiixs1 , 4028, 805.6, 128, SYS , STAT, session cursor cache hits , 430, 86, 128, SYS , STAT, cursor authentications , 45, 9, 128, SYS , STAT, buffer is pinned count , 9217, 1.84k, 128, SYS , STAT, buffer is not pinned count , 14010, 2.8k, 128, SYS , STAT, workarea memory allocated , -1776, -355.2, 128, SYS , STAT, workarea executions - optimal , 2, .4, 128, SYS , STAT, parse time cpu , 1, .2, 128, SYS , STAT, parse count (total) , 432, 86.4, 128, SYS , STAT, parse count (hard) , 5, 1, 128, SYS , STAT, execute count , 824, 164.8, 128, SYS , STAT, bytes sent via SQL*Net to client , 202, 40.4, 128, SYS , STAT, bytes received via SQL*Net from client , 139, 27.8, 128, SYS , STAT, SQL*Net roundtrips to/from client , 2, .4, 128, SYS , STAT, sorts (memory) , 33, 6.6, 128, SYS , TIME, hard parse (bind mismatch) elapsed time , 1951, 390.2us, .0%, | | 128, SYS , TIME, hard parse elapsed time , 2787, 557.4us, .1%, | | 128, SYS , TIME, repeated bind elapsed time , 394, 78.8us, .0%, | | 128, SYS , TIME, parse time elapsed , 8932, 1.79ms, .2%, | | 128, SYS , TIME, PL/SQL execution elapsed time , 57864, 11.57ms, 1.2%, |@ | 128, SYS , TIME, DB CPU , 600031, 120.01ms, 12.0%, |@@ | 128, SYS , TIME, sql execute elapsed time , 1132905, 226.58ms, 22.7%, |@@@ | 128, SYS , TIME, hard parse (sharing criteria) elapsed ti, 1950, 390us, .0%, | | 128, SYS , TIME, DB time , 1169539, 233.91ms, 23.4%, |@@@ | 128, SYS , WAIT, enq: RO - fast object reuse , 924, 184.8us, .0%, | | 128, SYS , WAIT, log file sync , 30373, 6.07ms, .6%, |@ | 128, SYS , WAIT, db file sequential read , 84665, 16.93ms, 1.7%, |@ | 128, SYS , WAIT, db file scattered read , 28306, 5.66ms, .6%, |@ | 128, SYS , WAIT, direct path write , 538, 107.6us, .0%, | | 128, SYS , WAIT, SQL*Net message to client , 6, 1.2us, .0%, | | 128, SYS , WAIT, SQL*Net message from client , 4005179, 801.04ms, 80.1%, |@@@@@@@@ | 128, SYS , WAIT, events in waitclass Other , 1358, 271.6us, .0%, | | -- End of snap 1, end=2009-11-20 00:51:43, seconds=5 PL/SQL procedure successfully completed.
So, the function names (kernel code locations) reported by Snapper do give away how many LIOs were done for accessing user data (kd* functions – which is the Kernel Data layer), space management (ktsI* – which is Kernel Transaction Space management module), kte* – which is the Kernel Transaction Extent management layer etc.
Seeing the function names has allowed me to diagnose some nasty ASSM bugs causing huge amounts of logical IOs in past, like ktspscan_bmb which is an ASSM bitmap block space search function. More details follow some time in the future ;-)
Explain Plan For command may show you the wrong execution plan – Part 1
In Oracle-L mailing list a question was asked about under which conditions can the explain plan report a wrong execution plan (not the one which was actually used when a problem happened).
I replied this, but thought to show an example test case of this problem too:
1) The optimizer statistics the EXPLAIN PLAN ends up using are different
from the statistics the other session ended up using
2) Explain plan does not use bind variable peeking thus will not optimize
for current bind variable values
3) Explain plan treats all bind variables as VARCHAR2, thus you ma have
implicit datatype conversion happening during the plan execution, (meaning
to_char,to_number functions are added around variables/columns) and this for
example may make optimizer to ignore some indexes if you get unlucky.
…Of course explain plan doesn’t really
execute the plan so the implicit datatype conversion you see is in the
explained plan only, but if you actually execute the statement (with correct
bind datatypes) then there’s no implicit datatype conversion. And that’s
where the difference comes from…
And here comes an example of condition number 3 above. Lets use a little bit of bad design out there and put numeric values into varchar2 columns:
SQL> create table t (id varchar2(10), name varchar2(100)); Table created. SQL> insert into t select to_char(object_id), object_name from dba_objects; 51449 rows created.
Now we add a little index for lookup performance and gather stats:
SQL> create index i on t(id); Index created. SQL> exec dbms_stats.gather_table_stats(user,'T',cascade=>true); PL/SQL procedure successfully completed.
Now lets define a bind variable of NUMBER type and set a value for it:
SQL> var x number SQL> SQL> exec :x:=99999 PL/SQL procedure successfully completed.
Now lets use “explain plan for” to estimate the execution plan:
SQL> explain plan for
2 select sum(length(name)) from t where id > :x;
Explained.
SQL> select * from table(dbms_xplan.display) ;
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
Plan hash value: 3694077449
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 29 | 56 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 29 | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T | 2572 | 74588 | 56 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | I | 463 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("ID">:X)
15 rows selected.
Explain plan command nicely reports that we’d be using an index range scan, which would be a good thing to do given my test data and search condition.
Now lets actually run the statement and see the REAL execution plan actually used for the execution. I’ll use dbms_xplan.display_CURSOR for this. If you don’t pass SQL_ID/child into that function it will just report the last SQL statement executed in your current session. But the key difference between the dbms_xplan.DISPLAY and DISPLAY_CURSOR is that the latter goes to library cache and fetches the actual SQL plan used from there. The explain plan command just reparses the statement and estimates a plan, ignoring any bind variable values and assuming that all bind variables are of type varchar2:
SQL> select sum(length(name)) from t where id > :x;
SUM(LENGTH(NAME))
-----------------
SQL> select * from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
SQL_ID 7zm570j6kj597, child number 0
-------------------------------------
select sum(length(name)) from t where id > :x
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 60 (100)| |
| 1 | SORT AGGREGATE | | 1 | 29 | | |
|* 2 | TABLE ACCESS FULL| T | 2572 | 74588 | 60 (5)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(TO_NUMBER("ID")>:X)
19 rows selected.
Whatta? We actually used a full table scan!
2nd part will follow soon :-)
Detect chained and migrated rows in Oracle – Part 1
I received a question about migrated rows recently.
It was about how to detect migrated rows in a 200TB data warehouse, with huge tables – as the ANALYZE TABLE xyz LIST CHAINED ROWS INTO command can not be automatically parallelized at table level (as DBMS_STATS can be, but oh, DBMS_STATS doesn’t gather the migrated/chained row info). Therefore the analyze command would pretty much run forever before returning (and committing) the chained row info in the output table. Also as there are regular maintenance jobs running on these tables (I suspect partition maintentance for example), then it wouldn’t be nice to keep running ANALYZE on the whole table constantly.
So, is there any faster or better way for finding the amount of migrated rows?
Ihave two answers to this.
Answer 1:
As we are dealing with a huge 200+ TB data warehouse its tables/indexes are most likely partitioned. Thus you could use the ANALYZE TABLE xyz PARTITION (abc) LIST CHAINED ROWS command to analyze individual partitions, even in parallel (sqlplus sessions) if you like. This would allow you to focus only on the partitions of interest (the latest ones, with the heaviest activity perhaps).
SQL> create table CHAINED_ROWS ( 2 owner_name varchar2(30), 3 table_name varchar2(30), 4 cluster_name varchar2(30), 5 partition_name varchar2(30), 6 subpartition_name varchar2(30), 7 head_rowid rowid, -- actual chained row's head piece address in the segment 8 analyze_timestamp date 9 );
Table created.
SQL> SQL> analyze table tmp partition (sys_p501) list chained rows; -- the default table name used for output is "CHAINED_ROWS"
Table analyzed.
SQL> analyze table tmp partition (sys_p502) list chained rows;
Table analyzed.
SQL> select partition_name, count(*) from chained_rows group by partition_name;
PARTITION_NAME COUNT(*) ------------------------------ ---------- SYS_P502 252 SYS_P501 5602 SQL>
So, from above you see its possible to find out partition (or even sub-partition level row chaining).
However this above command lists you both CHAINED rows and MIGRATED rows (even though Oracle calls them all chained rows internally, as the chaining mechanism is the same for both cases).
Chained row is a row which is too large to fit into a block, so will always have to be split between multiple different blocks – with an exception of intra-block chaining which is used for rows with more than 255 columns. Migrated row on the other hand is a row which has been updated larger than it initially was – and if as a result it doesn’t fit into its original block, the row itself is moved to a new block, but the header (kind of a stub pointer) of the row remains in original location. This is needed so that any indexes on the table would still be able to find that row using original ROWIDs stored in them). If Oracle didn’t leave the row head piece in place then it would always go and update all indexes which have the ROWID of the migrating row in them.
Why should we care whether a row is a real chained row or just a migrated row?
It’s because if the row is chained, then any reorgs would not help you – if a row is too big to fit into a block, its too big to fit into a block no matter how many times you move around the table. (Note that if you have large tables full of rows longer than 8KB there’s likely something wrong with your design).
But migrated rows on the other hand are “chained” into another block due some update which made them not fit into existing block anymore. This happens when PCTFREE is set too low compared to real row growth factor and sometimes you may want to fix it by reorganizing the table/partition with ALTER TABLE/PARTITION MOVE or by backing the rows up, deleting them and reinserting them back to the table (that one makes sense when only a small amount of rows in a table are migrated).
If you are completely sure that you don’t have any rows longer than the free space in an empty block (thus all individual rows would fit into a block and would need to be split among multiple blocks) then you can conclude that all the rows reported were migrated due their growth.
Another option would be to query out all or a sample of these chained/migrated rows and actually measure how long they are if all columns are put together. This could be done using vsize() function (or also dump() and lengthb() in some cases). Of course the column and row header overhead would need to be accounted in as well.
So, this already gets pretty complex and there are more tiny details which we should take into account… thus I will introduce another way to look into the row migration/chaining thing:
Answer 2: (Alternatively called “should we care?”)
This answer looks into the problem from another angle (a more systematic angle by the way). And due my time constraints I will write this in 2 parts, here I will write the concepts and will follow up with a “packaged” use case some time later.
Instead of trying to figure out how many (or what percentage of) rows are migrated in the table, we rather try to measure how much extra work these migrated rows cause to our queries and workload. This is important because we don’t want to start reorganizing anything just because there are migrated rows in a table – maybe these rows don’t cause any significant trouble at all.
Migrated rows are not too bad when noone is accessing them. All that happens is that you use some extra disk space due extra row head pieces left behind in original blocks.
But the effect can be visible when migrated rows are accessed a lot via index access paths. What happens is that (let say you’re doing an index range scan) you find the ROWID of a row matching your search (whichs costs you some logical and maybe physical IOs on the index blocks), then you go to the table block corresponding to the ROWID (which means one more LIO and perhaps PIO) and oops the row is not there anymore! All you find is a row head piece which has another ROWID init, pointing to the actual location of the row. Here you go, you need one more LIO (and maybe PIO) to access that row.
There’s a statistic called “table fetch by rowid” which says how many times Oracle took a ROWID (for example from an index) and went to a table to lookup the actual row. This is a normal statistic which you’ll always see when there’s indexed table access going on.
Luckily Oracle has another statistic called “table fetch continued row” which tells you when we didn’t find all that we wanted from the original row piece and had to follow a pointer to the new location of the migrated row (or next row piece of a chained row).
And no, I’m not going to propose a “good” ratio between these two statistics! :)
I will run a query against the table of interest, measure it with Snapper and do some rough calculation instead.
First I will force the query to do a full segment/partition scan on the dataset (or I could use SAMPLE scan accessing part of the data only). The actual query isn’t even important as long as its doing a full segment scan on the table/partition.
The snapper output for the session doing the full table scan is following:
SQL> @sn 5 133 -- Session Snapper v2.02 by Tanel Poder ( http://www.tanelpoder.com )
-- ---------------------------------------------------------------------------------------------------------------------- SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH ---------------------------------------------------------------------------------------------------------------------- 133, TANEL , STAT, opened cursors cumulative , 1, .2, 133, TANEL , STAT, user calls , 3, .6, 133, TANEL , STAT, session logical reads , 158, 31.6, 133, TANEL , STAT, CPU used when call started , 1, .2, 133, TANEL , STAT, CPU used by this session , 1, .2, 133, TANEL , STAT, DB time , 2, .4, 133, TANEL , STAT, consistent gets , 158, 31.6, 133, TANEL , STAT, consistent gets from cache , 158, 31.6, 133, TANEL , STAT, calls to get snapshot scn: kcmgss , 9, 1.8, 133, TANEL , STAT, no work - consistent read gets , 150, 30, 133, TANEL , STAT, table scans (short tables) , 1, .2, 133, TANEL , STAT, table scan rows gotten , 19539, 3.91k, 133, TANEL , STAT, table scan blocks gotten , 150, 30, 133, TANEL , STAT, session cursor cache hits , 1, .2, 133, TANEL , STAT, session cursor cache count , 1, .2, 133, TANEL , STAT, parse count (total) , 1, .2, 133, TANEL , STAT, execute count , 1, .2, 133, TANEL , STAT, bytes sent via SQL*Net to client , 342, 68.4, 133, TANEL , STAT, bytes received via SQL*Net from client , 175, 35, 133, TANEL , STAT, SQL*Net roundtrips to/from client , 2, .4, 133, TANEL , TIME, parse time elapsed , 33, 6.6us, .0%, | | 133, TANEL , TIME, DB CPU , 10052, 2.01ms, .2%, | | 133, TANEL , TIME, sql execute elapsed time , 13354, 2.67ms, .3%, | | 133, TANEL , TIME, DB time , 13503, 2.7ms, .3%, | | 133, TANEL , WAIT, SQL*Net message to client , 5, 1us, .0%, | | 133, TANEL , WAIT, SQL*Net message from client , 8448078, 1.69s, 169.0%, |@@@@@@@@@@| -- End of snap 1, end=2009-11-04 22:17:06, seconds=5 PL/SQL procedure successfully completed.
From above you see my table scan operation processed 19539 rows (the table scan rows gotten statistic shows how many rows were fetched from table segment during the scan) and we don’t see any table fetch continued row statistic reported (thus it didn’t increase!)
And this is something important to note. When you do a full table/partition scan then Oracle will silently skip the head (pointer) row pieces of migrated rows as it “knows” that the actual row is going to be somewhere towards the end of the segment (or has already been processed earlier in the segment). Thus, when you have only migrated rows, then full table scan won’t increment table fetch continued row statistic when it hits them. Therefore, if you see some value for table fetch continued row when full scanning through a table/partition, this count is the number of chained row pieces Oracle had to go through in order to find the individual pieces of the rows. If the statistic shows 100, it may mean that there were 100 rows chained into 2 pieces (head and one continuing piece). Or it could mean that there were let say 50 rows which were so large that they needed 3 pieces in total (head and 2 continuation pieces per row – so 50 rows x 2 pieces = 100 table fetch continued row operations). Again it doesn’t really matter how many rows exactly were chained, what matters more is how much extra work this chaining causes us – which is up 100 extra data block accesses (LIOs + potentially PIOs too).
So, when running full table/partition scan (or full scan with SAMPLE parameter to get only a sample of data) you can easily monitor the effect of chained rows by looking into table fetch continued row statistic. As (huge) chained rows are rather a data model design/physical design issue, I stop discussing these here.
So, how to detect the impact of migrated rows then?
As I mentioned above, migrated rows do not cause huge overhead when full scanning through segments as Oracle can ignore the head rowpieces in the row’s old location.
Lets run the query with an index range scan instead (again I just modified my query against the table to use an index using a hint + query predicate). Again I measured what the query was doing using Snapper:
SQL> @snapper out 5 1 133 -- Session Snapper v2.02 by Tanel Poder ( http://www.tanelpoder.com ) -- ---------------------------------------------------------------------------------------------------------------------- SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH ---------------------------------------------------------------------------------------------------------------------- 133, TANEL , STAT, opened cursors cumulative , 1, .2, 133, TANEL , STAT, user calls , 3, .6, 133, TANEL , STAT, session logical reads , 121, 24.2, 133, TANEL , STAT, CPU used when call started , 1, .2, 133, TANEL , STAT, CPU used by this session , 1, .2, 133, TANEL , STAT, consistent gets , 121, 24.2, 133, TANEL , STAT, consistent gets from cache , 121, 24.2, 133, TANEL , STAT, consistent gets - examination , 2, .4, 133, TANEL , STAT, calls to get snapshot scn: kcmgss , 1, .2, 133, TANEL , STAT, no work - consistent read gets , 119, 23.8, 133, TANEL , STAT, table fetch by rowid , 999, 199.8, 133, TANEL , STAT, table fetch continued row , 57, 11.4, 133, TANEL , STAT, index scans kdiixs1 , 1, .2, 133, TANEL , STAT, cursor authentications , 1, .2, 133, TANEL , STAT, buffer is pinned count , 1938, 387.6, 133, TANEL , STAT, buffer is not pinned count , 117, 23.4, 133, TANEL , STAT, parse count (total) , 1, .2, 133, TANEL , STAT, execute count , 1, .2, 133, TANEL , STAT, bytes sent via SQL*Net to client , 342, 68.4, 133, TANEL , STAT, bytes received via SQL*Net from client , 175, 35, 133, TANEL , STAT, SQL*Net roundtrips to/from client , 2, .4, 133, TANEL , TIME, parse time elapsed , 93, 18.6us, .0%, | | 133, TANEL , TIME, DB CPU , 1918, 383.6us, .0%, | | 133, TANEL , TIME, sql execute elapsed time , 1696, 339.2us, .0%, | | 133, TANEL , TIME, DB time , 1918, 383.6us, .0%, | | 133, TANEL , WAIT, SQL*Net message to client , 5, 1us, .0%, | | 133, TANEL , WAIT, SQL*Net message from client , 4213259, 842.65ms, 84.3%, |@@@@@@@@@ | -- End of snap 1, end=2009-11-04 22:20:09, seconds=5
Check the bold stats above:
- index scans kdiixs1 – This counter shows how many times we did an index range scan during the snapper run. I ran only one query once (which didn’t have any nested loop joins nor other loops in execution plan which may invoke range scans many times during a single query)
- table fetch by rowid – During this index range scan we see that we have found a ROWID from an index and gone to look up the corresponding table row 999 times
- table fetch continued row – We didnt find the row we wanted from its original location 57 times out of 999 attempts! This is the important part. We have had to do extra work for 5.7% of the rows accessed. This extra work may mean extra logical IOs and extra logical IOs may mean extra physical IOs (and extra physical IOs mean extra wait time!).You see where I’m going here? By knowing for how many table rows we had to go and look for next rowpiece and by knowing how many LIOs, PIOs and PIO time we spent during the test run, we can (very) roughly calculate how much PIO time and how many LIOs we could save if these rows weren’t migrated, if they were accessible right where the index ROWIDs point to.
I have to admit – there are many factors which influence how many LIOs and PIOs we would actually save. Caching effect is the first one, obviously when a block is in buffer cache, you avoid the associated physical IO when accessing the block again. Also, Oracle can cache buffer handles (and keep cache buffers pinned during the database call), so you could even avoid the cost of a logical IO if you revisit a buffer and get lucky (this is what the buffer is pinned count statistic shows).
So, its not possible to calculate exactly how many LIOs or PIOs you would save, but nevertheless, this table fetch continued row statistic (when compared to table fetch by rowid statistic) does give a good indication whether extra work due row migration is done or not and roughly how much.
So, before even analyzing your tables to see how many migrated/chained rows the tables have in them, I would look into the table fetch continued row/table fetch by rowid statistics in v$sesstat to estimate whether row migration is causing any overhead. You could even start by looking into these stats system wide, in v$sysstat, Statspack/AWR reports but I don’t like this approach as it loses the context of what kind of work did generate which statistic profile if you have diverse workloads.
Phew… I thought this article would be shorter… Here I showed the concept and idea, I do have some scripts which I’ve used in past plus there are some gotchas (I’m sure they’ll come up in questions) which I will discuss in a future post (which may come in few weeks time as I’m veeeery busy right now).
By the way, I hope this article also shows the value of Oracle’s V$SESSTAT performance counters, not everything can be measured using Wait interface, ASH or SQL Trace! In fact this article is a prelude to a series of articles I’ll be writing with James Morle so check out his blog soon!
What’s a good way to learn some Oracle internals every day?
Sometimes when an attendee describes me some totally weird problem during a seminar, I am immediately able to answer something like “Hey this looks like a bug related to this Oracle configuration and can be influenced by xyz”.
And then people ask me “How the hell do you know all this stuff?”
Well, I haven’t been bitten by all of these bugs myself, but I have been doing something for many years, almost every day… reading my email!
Oh, and additionally I have configured Metalink to send me daily updates about new/updated notes, forum articles and… bug descriptions!
The last part is very important. Bug descriptions tell you something about new bugs found (and old bugs rediscovered) and sometimes their details tell you an interesting piece or two about Oracle internals related to them.
And if these descriptions come to your mailbox every day, you catch a detail or two every day. Of course this assumes you are interested enough learning this stuff and take the time to actually open and read the bug descriptions you see interesting enough.
And if you do this for months or years, you will slowly start putting all these details together in your brain (without even noticing it yourself). Things will start to make sense to you over time, thanks to learning a little detail here and there every day. That way you also learn from other people’s experiences and when this bug (or something similar) happens to you, you can recognize it more easily.
I’m not saying here that all you need to do is read Metalink, but keeping an eye of the current bugs + their internal explanations by support people provides icing on the cake. Note that the bug descriptions often contain interesting information which you don’t see documented elsewhere, such some function names and their meanings, undocumented x$ table information and of course new undocumented parameters (which I don’t set anywhere, but do further research on).
I’ve had the habit of reading the bug descriptions for so much time so I haven’t even remembered to mention this to anyone if people ask where to learn internals. But Dominic Brooks and Coskan Gundogar have recently written about this so I thought I should share this with my blog audience too!
You can configure this by going to Settings page in Metalink and configuring the “Headlines/Hot topics via E-Mail” setting.
SystemTap is production supported in Redhat EL5.4
If you don’t know what SystemTap is – it’s the Linux world’s attempt to build Solaris DTrace style safe dynamic instrumentation into Linux kernel.
I’m not going into religious discussions which one is better here, I have used both SystemTap and DTrace successfully for diagnosing low level issues inside OS kernel, so both are good enough for me :)
The problem with SystemTap though has been that it’s not production quality, it’s rather been a technology preview.
But with RHEL 5.4 part of it has changed, Redhat says following in this article:
SystemTap is no longer a technology preview, and now has production support. Red Hat recommendeds that users run scripts on development machines before deployment in production environments. Since SystemTap is an optional diagnostic tool, users can easily stop using it in the event of a problem. Options such as -g for Guru mode, and -D* allow users to disable several security checks. Scripts using these options may not be supported.
Red Hat plans to fix problems in SystemTap, or the Linux kernel, as they arise in connection with new scripts. In some cases, a fix may include extending the blacklist for known areas of the Linux kernel that are unsafe to probe. All scripts that use probes targeting blacklisted areas will need to be revised.
SystemTap users are advised to upgrade to this version.
So, at least officially, SystemTap is now supported by Redhat. Well at least as long as you’re not doing crazy stuff with the -g option, the geek mode ;-)
I would still be very careful before using any SystemTap in production, in fact, in critial environments I wouldn’t run it at all, unless there is no other option (and you’re prepared to get a crash + kernel panic).
However the significance of this note is that SystemTap is accepted for production by Redhat in principle. So now it’s just matter of time until it gets stable and widely used enough to be as practical and useful as DTrace is on Solaris.
I’m tweeting now…
Although I don’t think I have much useful to say in 160 characters, but we’ll see
You can follow me here:
Or just check out the right hand twitter widget in my blog…
James Morle is blogging!
James Morle is blogging! His stuff is definitely (very much) worth reading, in fact I have learnt a lot from his book Scaling Oracle 8i, which I still recommend as first reading for people who want to understand Oracle, OS and hardware touchpoint. And his book is freely downloadable from his website too :)
Check out his blog: http://jamesmorle.wordpress.com/
Download his book: http://www.scaleabilities.co.uk/
Enjoy! :)
KGL simulator, shared pool simulator and buffer cache simulator – what are these?
Note: I accidentially published this note while I was writing it, so you may have seen a partial version of it. It’s complete now.
If you have queried v$sgastat on recent Oracle versions (by which I mean 9i and above) you probably have seen allocations for some sort of simulators in Oracle instance. Here’s an example:
SQL> select * from v$sgastat where lower(name) like '%sim%' order by name; POOL NAME BYTES ------------ -------------------------- ---------- shared pool kglsim alloc latch area 1700 shared pool kglsim alloc latches 68 shared pool kglsim count of pinned he 9248 shared pool kglsim free heap list 204 shared pool kglsim free obj list 204 shared pool kglsim hash table 4104 shared pool kglsim hash table bkts 2097152 shared pool kglsim heap 635536 shared pool kglsim latch area 1700 shared pool kglsim latches 68 shared pool kglsim main lru count 87040 shared pool kglsim main lru size 174080 shared pool kglsim object batch 909440 shared pool kglsim pin list arr 816 shared pool kglsim recovery area 2112 shared pool kglsim sga 22092 shared pool kglsim size of pinned mem 18496 shared pool ksim client list 84 shared pool log_simultaneous_copies 480 shared pool sim cache nbufs 640 shared pool sim cache sizes 640 shared pool sim kghx free lists 4 shared pool sim lru segments 640 shared pool sim segment hits 1280 shared pool sim segment num bufs 640 shared pool sim state object 48 shared pool sim trace buf 5140 shared pool sim trace buf context 120 shared pool sim_knlasg 1200 shared pool simulator hash buckets 16512 shared pool simulator latch/bucket st 4608 31 rows selected.
See, a bunch of “kgl sim” and then just “sim” allocations.
… or sometimes you can see latch contention on following latches:
SQL> select name from v$latch where name like '%sim%'; NAME ------------------------------------------------------- ksim membership request latch simulator lru latch simulator hash latch sim partition latch shared pool simulator shared pool sim alloc 6 rows selected.
Again, there seems to be some “simulation” work going on in Oracle instance.
So what are these simulators about?
These simulators help Oracle to determine how much time and effort would have been saved if buffer cache or shared pool size was bigger than current. And this info is summarized and externalized in various advisor views such V$SHARED_POOL_ADVICE, V$MTTR_TARGET_ADVICE, V$DB_CACHE_ADVICE and so on. These simulation engines were introduced in Oracle 9i along the mentioned V$ views.
Also this stuff is used by MMAN for making SGA resizing decisions if SGA_TARGET (10g+) or MEMORY_TARGET (11g+) is enabled.
I decided to write an article about simulators as quite a few people have been asking me about it lately. So, I first googled around to see if someone has already written about it already and found one here. It turned out it was written by me, I just didn’t know about it anymore :)
So, I will write an edited version of my original explanation here. Note that I don’t know the exact algorithm of the simulator, just explaining the fundamentals here.
KGL simulator is the shared pool memory allocation tracking and simulation engine.
KGL=Kernel General Library cache manager, as the name says it deals with library objects such cursors, cached stored object definitions (PL/SQL stored procs, table definitions etc).
KGL simulator is used for estimating the benefit of caching if the cache was larger than currently. The general idea is that when flushing out a library cache object, it’s hash value (and few other bits of info) are still kept in the KGL simulator hash table. This stores a history of objects which were in memory in past but got flushed out. That’s the key part: we keep a history of objects flushed out (this is done by selective sampling though, not at every object flush as it would have too much overhead).
When loading a library cache object into library cache (which means that no existing such object is in there already), Oracle goes and checks the KGL simulator hash table to see whether an object with matching hash value is in there. If a matching object is found, that means that the required object had been in cache in past, but flushed out due space pressure. That’s the second important part: we know when we would have been able to (probably) just reuse an object if the shared pool had been bigger.
Using that information of how many library cache object (re)loads could have been been avoided if cache had been bigger (thanks to KGL simulator history) and also thanks to knowing how much time the (current) object reloads took, Oracle can predict how much response time would have been saved instancewide if shared pool was bigger. So, we don’t need to increase shared pool itself to measure the benefit, we simulate a bigger shared pool by just keeping a history of aged out library cache objects hash values in a special array – and that’s why there are various kglsim and simulator allocations seen in V$SGASTAT.
The simulation results are seen from v$shared_pool_advice and are used by other advisors such ADDM.
Apparently Oracle keeps track also about how much memory is pinned in the shared pool (thus can’t be flushed out) over time. This info is probably used also for estimating shared pool resize requirements as pinned memory chunks are not usable for anything else (as long as they are pinned) in the shared pool. These measurements are also done on sampling basis.
I did a little test with my lotshparses.sql script (which just runs a lot of hard parses inside a PL/SQL loop) and ran latchprofx to report which latches and why was my session using. Note the “sim” instead of usual % in the latchprofx syntax. I am interested in only these latches which have “sim” in their name in this experiment:
SQL> @latchprofx sid,name,func,hmode 146 sim 100000 -- LatchProfX 1.12 by Tanel Poder ( http://www.tanelpoder.com ) SID NAME FUNC HMODE Held Gets Held % ---------- ----------------------------------- ---------------------------------------- ------------ ---------- ---------- ------- 146 shared pool simulator kglsim_unpin_simhp exclusive 514 49 .51 146 shared pool simulator kglsim_upd_newhp exclusive 206 198 .21 146 shared pool simulator kglsim_scan_lru: scan exclusive 95 93 .10 146 shared pool simulator kglsim_chg_simhp_free exclusive 32 32 .03 146 shared pool simulator kglsim_chg_simhp_noob exclusive 32 32 .03 146 shared pool sim alloc kglsim_chk_heaplist: alloc exclusive 4 3 .00 146 shared pool sim alloc kglsim_chk_objlist: alloc exclusive 3 2 .00 146 shared pool simulator kglsim_chk_objlist: extend exclusive 1 1 .00 146 shared pool simulator kglsim_scan_lru: rebalance exclusive 1 1 .00
This simulation engine is pretty cool because its the first time someone has tried to measure the time what could be saved by increasing shared pool size. (Response) Time is the ultimate measure of performance, so this simulator measures the right thing. Do you remember what people used before that? Library cache hit ratio! A lot of people back then (and some still today) have been misleaded by measuring the wrong things like hit ratios.
Anyway, while this is a cool feature – it can cause trouble in big systems. This is because the extra CPU usage for simulation and also the potential concurrency issues like simulator latch contention (like the shared pool simulator latch above, these latches are used even on 11g where most library cache latches have been replaced with KGX mutexes for efficiency and avoiding false contention issues). Also, some people are sensitive about the potential large memory allocations the shared pool simulator can take from shared pool (this is controlled by _kglsim_max_percent parameter which defaults to max 5% of total shared pool size). As these allocations are dynamic, then when your application suddenly starts making lots of hard parses due newly released code, causing memory shortage, then the simulator can make things even worse as it can make relatively big allocations for itself too! And as far as I know the simulator arrays don’t shrink back.
Note that I’m talking about big systems here, in most other cases the simulation shouldn’t cause trouble (on recent db versions anyway).
In some databases I have disabled all of the advisors and simulators with setting statistics_level = basic. All features what you see from V$STATISTICS_LEVEL would be disabled when setting statistics_level to basic:
SQL> select statistics_name,activation_level from v$statistics_level; STATISTICS_NAME ACTIVAT ---------------------------------------- ------- Buffer Cache Advice TYPICAL MTTR Advice TYPICAL Timed Statistics TYPICAL Timed OS Statistics ALL Segment Level Statistics TYPICAL PGA Advice TYPICAL Plan Execution Statistics ALL Shared Pool Advice TYPICAL Modification Monitoring TYPICAL Longops Statistics TYPICAL Bind Data Capture TYPICAL Ultrafast Latch Statistics TYPICAL Threshold-based Alerts TYPICAL Global Cache Statistics TYPICAL Active Session History TYPICAL Undo Advisor, Alerts and Fast Ramp up TYPICAL Streams Pool Advice TYPICAL Time Model Events TYPICAL Plan Execution Sampling TYPICAL Automated Maintenance Tasks TYPICAL SQL Monitoring TYPICAL Adaptive Thresholds Enabled TYPICAL V$IOSTAT_* statistics TYPICAL . 23 rows selected.
Sometimes you don’t want to disable absolutely everything, but only the troublemakers. The good news is that there’s an (undocumented) parameter for disabling every feature separately. As always, make sure that you know what you’re doing before changing these and get blessing from Oracle support in order to not get yourself in trouble.
So, for example the library cache memory simulation (shared pool advice) can be disabled by setting _library_cache_advice = false .
I covered shared pool and library cache above. Oracle has more cache advisors, the underlying principle in them is the same. They save some small metadata/info about flushed out objects in a simulator array and just check that array when someone is allocating memory for new object. If matching object’s metadata still happens to be in the simulator array, then Oracle knows that we could have avoided the time and effort loading (generating) that object back into memory if the corresponding cache had been bigger. Of course the algorithm must be more sophisticated than just checking for object’s existence, there are probably object’s flush timestamp and size stored in the simulator arrays as well.
While the library cache simulator array stores library cache objects hash values for looking up historical memory allocations metadata, then the buffer cache advisor stores just data block addresses (DBAs) of blocks previously in cache. When Oracle spots that a block was previously in cache while intiating physical IO for re-reading it back again, it will measure the time it took to complete the current physical IO and add this info into the simulator array. Again, that’s how Oracle can measure how much response time would have been saved if the block would still have been in the buffer cache.
The buffer cache advisor needs some special attention, as I have had most issues with that (on large systems again), mostly with buffer cache’s simulator lru latch contention in systems with lots of CPUs and physical IOs going on. The problem here is that even though you can have hundreds of thousands of cache buffers chains latches in a system, the number of simulator lru latches remains relatively low by default, causing contention.
Oracle provides a documented parameter for disabling the buffer cache simulation – you need to set db_cache_advice = off (or ready).
So, hopefully this sheds some light into how the advisors work and what the heck the simulators simulate in Oracle. Even though I provided a list of parameters which allow you to control this stuff – don’t change any of them unless you actually have a problem which they should fix (and you are sure that these parameters would fix your problem).
Update:
Jonathan Lewis has written a good article about buffer cache advisor details on his website – I recommend to read that one too if you’re interested in this stuff:
http://www.jlcomp.demon.co.uk/Cache_advice.html
KGH: NO ACCESS allocations in V$SGASTAT – buffer cache within shared pool!
Since Oracle 10.2 it’s valid to say that buffer cache can be stored inside shared pool.
Now you may think I’m crazy, but read until the end of the post – no matter how crazy I may sound – I have proof!
Here it is:
Few years ago I started noticing a strange memory allocation in shared pool (in V$SGASTAT), called KGH: NO ACCESS.
SQL> select * from v$sgastat where name = 'KGH: NO ACCESS';
POOL NAME BYTES ------------ -------------------------- ---------- shared pool KGH: NO ACCESS 10513696 SQL>
You see, some 10 MB of memory in shared pool has been allocated for something called KGH: NO ACCESS.
Ok, lets see where this memory resides inside shared pool. We can use x$ksmsp view for that, this view has a line in it for each chunk of memory allocated from it (and also the free chunks), along the reasons (or comments) for what reason these chunks were allocated.
NB! Don’t run this query in production! And the reason is that whenever X$KSMSP is queried, it locks the shared pool (by taking shared pool latches) and walks through the linked lists of memory chunks in it. If your shared pool is large and you have lots of chunks in there, the linked list walking will take long time, thus the latches will be held for long time. And while shared pool latches are held, no-one can even log on as new sessions usually need to allocate some memory for session state (like v$parameter array) from shared pool.
Nevertheless, if you see KGH: NO ACCESS allocations in V$SGASTAT in your test database, you can continue with querying, where exactly (in SGA memory) these allocations reside:
SQL> SELECT ksmchcom,ksmchptr,ksmchsiz,ksmchcls 2 FROM x$ksmsp -- do not query from this view in production!!! 3 WHERE ksmchcom = 'KGH: NO ACCESS' 4 AND rownum = 1;
KSMCHCOM KSMCHPTR KSMCHSIZ KSMCHCLS ---------------- ---------------- ---------- -------- KGH: NO ACCESS 00000000BCFF7FE0 32800 no acce SQL>
I limited my search for only one chunk in this case. And that chunk lives in address BCFF7FE0 and is 32800 bytes in size.
So, now I know where one of the KGH: NO ACCESS chunks is residing. Out of interest, let’s query if there are any buffer cache buffers placed inside that chunk of 32800 bytes. X$BH.BA column nicely tells us the physical address of a buffer cache buffer in SGA:
SQL> SELECT
2 dbarfil, dbablk, obj, ba, state --<-- dbarfil = relative file number, dbablk = block number, obj = data object id
3 FROM
4 x$bh --<-- X$BH lists all initialized buffers in buffer cache
5 WHERE
6 to_number(rawtohex(ba),'XXXXXXXXXXXXXXXX') --<-- BA is the "buffer address", actual buffer location in SGA memory
7 BETWEEN
8 to_number('00000000BCFF7FE0','XXXXXXXXXXXXXXXX')
9 AND to_number('00000000BCFF7FE0','XXXXXXXXXXXXXXXX') + 32800 - 1
10 /
DBARFIL DBABLK OBJ BA STATE
---------- ---------- ---------- ---------------- ----------
4 34936 81252 00000000BCFFE000 1
3 46222 4294967295 00000000BCFF8000 1
3 34676 4294967295 00000000BCFFA000 0
2 7074 5938 00000000BCFFC000 1
Wow! This query reports that indeed there are 4 buffer cache buffers which physically reside in the KGH: NO ACCESS shared pool chunk reported by x$ksmsp!
As the OBJ column has the data_object_id of the object in it, lets see what some of these objects are!
SQL> select owner, object_name, object_type 2 from dba_objects 3 where data_object_id = 81252;
OWNER OBJECT_NAME OBJECT_TYPE ------------------ ----------------------- ------------------- TANEL SERVER_STATS_SUM TABLE
SQL> select owner, object_name, object_type 2 from dba_objects 3 where data_object_id = 5938;
OWNER OBJECT_NAME OBJECT_TYPE ------------------ ----------------------- ------------------- SYS WRH$_SQL_PLAN TABLE
So, both above tables have a block residing physically in shared pool heap :)
Note that there’s strangely large data_object_id in two other blocks, which don’t return anything from dba_objects:
SQL> select owner, object_name, object_type 2 from dba_objects 3 where data_object_id = 4294967295; no rows selected
These are blocks belonging to undo segments. I can take the DBARFIL returned from X$BH and match it to relative_fno in dba_data_files to find out into which tablespace these blocks belong:
SQL> select tablespace_name 2 from dba_data_files 3 where relative_fno = 3;
TABLESPACE_NAME ------------------------------ UNDOTBS1
So, Oracle CAN store buffer cache buffers inside shared pool nowadays (since 10.2).
Now the next question is – why on the earth would Oracle want such complexity?
The answer is ASMM – Automatic Shared Memory Management, the manageability thing controlled with SGA_TARGET parameter and the need to increase buffer cache at the expense of shared pool.
I’m pasting some reasoning from an old Oracle-L post of mine here (posted 3 years ago… huh time flies fast, I planned to write an article to my website about it right after that Oracle-L posting back then :)
Note that the query I posted at that Oracle-L article also queries X$KSMSP – so you should not run it in production unless you want to cause performance trouble or hang in there!
When MMAN tries to get rid of a shared pool granule it obviously can't just flush and throw away all the object in it. As long as anybody references chunks in this granule, it cannot be completely deallocated. Oracle has faced a decision what to do in this case: 1) wait until all chunks aren't in use anymore - this might never happen 2) suspend the instance, relocate chunks somewhere else and update all SGA/PGA/UGA/CGA structures for all processes accordingly - this would get very complex 3) flush as many chunks from this shared pool granule as possible, mark them as "KGH: NO ACCESS" that nobody else would touch them, mark corresponding entry to DEFERRED in V$SGA_RESIZE_OPS and notify buffer cache manager about the new memory locations being available for use. Oracle has gone with option 3 as option 1 wouldn't satisfy us and 2 would be very complex to implement, and it would mean a complete instance hang for seconds to minutes. So, Oracle can share a granule between shared pool and buffer cache data. This sounds like a mess, but there is not really a better way to do it (if leaving the question, why the heck do you want to continuously reduce your shared pool size anyway, out).
Here you go, here’s the proof that Oracle does store buffer cache buffers inside shared pool if its needs to do so :)
By the way, nowadays Oracle can store redo log buffers and undo data inside shared pool as well, but more about it in a future blog entry (come back in 3 years ;-)
Oracle 11gR2 has been released – and with column oriented storage option
You may already have noticed that Oracle 11gR2 for Linux is available for download on Oracle.com website, with documentation.
And this document ends speculation about whether Oracle 11.2 will support column-oriented storage – yes it will:
However, this is apparently available on Exadata storage only as a new error message below indicates:
ORA-64307: hybrid columnar compression is only supported in tablespaces residing on Exadata storage
Cause: An attempt was made to use hybrid columnar compression on unsupported storage.
Action: Create this table in a tablespace residing on Exadata storage or use a different compression type.
Update: Kevin Closson mentioned that 11gR2 doesn’t really have column oriented storage as some other products like Vertica’s and Sybase IQ use, but its rather just column oriented compression option where storage is still organized by row but individual fields in these rows use compression dictionaries whichcan span multiple block boundaries (we’ll thats my interpretation at least).
The 11gR2 release overview doc seems to be wrong in this case, as it says:
Hybrid columnar compression is a new method for organizing how data is stored. Instead of
storing the data in traditional rows, the data is grouped, ordered and stored one column at a time.
Read Kevin’s note here:
- ORA-64307: hybrid columnar compression is only supported in tablespaces residing on Exadata storage
- Cause: An attempt was made to use hybrid columnar compression on unsupported storage.
- Action: Create this table in a tablespace residing on Exadata storage or use a different compression type.
User Comments
- ORA-64307: hybrid columnar compression is only supported in tablespaces residing on Exadata storage
- Cause: An attempt was made to use hybrid columnar compression on unsupported storage.
- Action: Create this table in a tablespace residing on Exadata storage or use a different compression type.
User Comments
latch: cache buffers chains latch contention – a better way for finding the hot block
Here’s a treat for Oracle performance professionals and geeks who are looking for more systematic ways for cache buffers chains (CBC) latch contention troubleshooting. Cache buffers chains latches are taken when a process wants to walk through a cache buffer hash chain, looking if the block with required DBA (data block address) is in buffer cache. If the block happens to be in cache, then in most cases it has to be pinned first before use and unpinned after use, to make sure no-one else can perform an incompatible operation on that block at the same time. The modification of pin structures (pinning/unpinning) is also protected by CBC latches.
CBC latch contention can happen for multiple reasons, but one reason is that there is some really hot block in a SMP system with high number of CPUs (or CMT system with high number of threads like Sun T-series servers). Sometimes there happen to be multiple moderately hot blocks “under” the same CBC latch, which can result in latch contention again.
Traditionally DBAs used to look up the child latch address from V$SESSION_WAIT, sql_trace output or ASH and then look up all buffers protected by that latch from X$BH, using HLADDR column (HLADDR stands for Hash Latch Address). I also have a script for that, bhla.sql (Buffer Headers by Latch Address), which reports me all blocks currently in buffer cache, “under” that particular latch and the corresponding data block addresses and object names:
SQL> @bhla 27E5A780 <-- latch address reported by wait interface (Parameter1 in latch: cache buffers chains wait event) FLG_LRUFLG OBJ OBJECT_TYPE object TCH DBA ------------------- ---------- ------------------- ---------------------------------------- ---------- --------- 80000:8 2 TABLE SYS.SUBCOLTYPE$ 0 1 7873 80000:8 2 TABLE SYS.ATTRCOL$ 0 1 7873 0:0 50472 INDEX SYSMAN.MGMT_JOB_EXEC_IDX04 1 3 27699 0:0 50472 INDEX SYSMAN.MGMT_JOB_EXEC_IDX04 1 3 27699 0:4 50472 INDEX SYSMAN.MGMT_JOB_EXEC_IDX04 1 3 27699 2202000:8 3710 TABLE SYS.WRI$_ADV_REC_ACTIONS 1 3 972 0:0 50472 INDEX SYSMAN.MGMT_JOB_EXEC_IDX04 1 3 27699 0:0 50472 INDEX SYSMAN.MGMT_JOB_EXEC_IDX04 1 3 27699 2202000:8 94227 TABLE PARTITION SYS.WRH$_ROWCACHE_SUMMARY 2 3 36142 80000:8 54880 INDEX PERFSTAT.STATS$UNDOSTAT_PK 14 4 118331 0:8 37 INDEX SYS.I_OBJ2 24 1 55591 2000:8 3680 TABLE SYS.WRI$_ADV_TASKS 26 3 739 0:8 75 TABLE SYS.IDL_UB2$ 35 1 11745 0:8 73 TABLE SYS.IDL_UB1$ 38 1 12211
From above we see that this child latch is protecting many blocks in buffer cache. So which one is the troublemaker? This is where the touchcount (TCH column) has been traditionally used, the higher the TCH the hotter the buffer, right?
Actually this may not always be true. For example if there is a somewhat hot block which has been steadily in buffer cache for weeks, it may have very high touch count. Now if there is an extremely hot block used by every morning’s batch job (and causing the trouble) but not used later on in the day, this block will get aged out later in the day. Once a block is aged out, its metadata, including TCH is gone! So when the block is loaded back in next morning (causing trouble again), its touch count starts from zero again.
So, by just looking into touchcount numbers you can’t always reliably detect who’s the current troublemaker. One option here would be to query the touchcounts, then wait for a minute, query the touchcounts again and see who’s touchcount increased the most during the minute. This would give a bit better picture about who is causing trouble now.
But still, it would not be always reliable for another reason – touchcounts are incremented only after 3 seconds have passed since last increment! This factor has been coded in to avoid situation such a short but crazy nested loop join hammering a single buffer hundreds of thousands of times in few seconds and then finishing. The buffer wouldn’t be hot anymore but the touchcount would be hundreds of thousands due a single SQL execution. So, unless 3 seconds (of SGA internal time) has passed since last TCH update, the touchcounts would not be increased during buffer access.
This time is controlled by SGA variable kcbatt_ by the way:
SQL> oradebug dumpvar sga kcbatt_ ub4 kcbatt_ [3C440F4, 3C440F8) = 00000003
This 3-second delay leaves us in the following situation, let say there are 2 blocks protected by a CBC child latch:
One block has been accessed once every 3 seconds for 24 hours in a row. A block accessed once per 3 seconds is definitely not a hot block, but its touchcount would be around 28800 (86400 seconds per 24 hours / 3 = 28800).
And there is another block which is accessed crazily for 2 seconds in a row and this happens every 10 seconds. 2 seconds of consecutive access would increase the touchcount by 1. If such access pattern has been going on every 10 seconds over last 24 hours, then the touch count for that buffer would be 86400 / 10 = 8640.
In the first case we can have a very cold block with TCH = 28800 and in second case a very hot block with TCH = 8640 only and this can mislead DBAs to fixing the wrong problem.
Luckily this problem can be solved with LatchProfX :-)
If you don’t know what LatchProfX is, you may want to read these articles first – LatchProf gives you a better way for latch contention troubleshooting compared to old approaches:
- http://blog.tanelpoder.com/2008/07/09/advanced-oracle-troubleshooting-guide-part-7-sampling-latch-holder-statistics-using-latchprof/
- http://blog.tanelpoder.com/2008/07/23/advanced-oracle-troubleshooting-guide-part-8-even-more-detailed-latch-troubleshooting-using-latchprofx/
- http://blog.tanelpoder.com/2009/03/20/another-latchprofx-use-case/
And now I introduce a new feature in LatchProfX.
In addition to being able to report the SID of the latch holder and the kernel function why this latch was taken by the session, the newest LatchProfX can also report the object what was protected by the latch under investigation. For cache buffers chains the object is data block address of the block being accessed.
Here’s an example, let say I have CBC latch contention problems and from monitoring tools or sql_trace I have identified the hot latch, I use my sw.sql script here, to report what a session is waiting for:
SQL> @sw 138 SID STATE EVENT SEC_IN_WAIT P1 ------- ------- ---------------------------- ----------- ------------------ 138 WAITING latch: cache buffers chains 0 address=0x27E5A780
Sw reports that this session is waiting on cache buffer chains child latch with address 0×27E5a780.
Now we can run latchprofx and report sid, name, hold mode and protected object for that child latch only (instead of latch name we specified its address):
SQL> @latchprofx sid,name,hmode,object % 27E5A780 100000 SID NAME HMODE OBJECT Held Gets Held % Held ms ---- ---------------------- ------------ ------- ------ ----- ------- -------- 138 cache buffers chains exclusive 40EB02 3928 3797 3.93 41.637 151 cache buffers chains exclusive 40EB02 3711 3660 3.71 39.337 138 cache buffers chains shared 40EB02 623 623 .62 6.604 151 cache buffers chains shared 40EB02 544 544 .54 5.766
As you see, LatchProfX has reported a data block address in the object column. This is the datablock we are trying to access in buffer cache under protection of the given child latch.
Lets see what this DBA translates to:
SQL> @dba 40EB02 RFILE# BLOCK# ---------- ---------- 1 60162 Press enter to find the segment using V$BH (this may take CPU time), CTRL+C to cancel: STATE BLOCK_CLASS OBJECT_TYPE object TCH MODE_HELD ---------- ------------------ ------------------- ---------------------------------------- ---------- ---------- xcur data block INDEX SYS.SYS_IOT_TOP_94276 331 1 Press enter to query what segment resides there using DBA_EXTENTS (this can be IO intensive), CTRL+C to cancel: OWNER SEGMENT_NAME PARTITION_NAME TABLESPACE_NAME ------------------------------ -------------------- --------------- ----------------- SYS SYS_IOT_TOP_94276 SYSTEMSo, the troublemaker is an IOT index segment block belonging to table with object_id 94276. I can find the table name from dba_objects by object_id, or by using my script:
SQL> @oid 94276 owner object_name object_type CREATED ------------------------- ------------------------------ ------------------ ----------------- SYS KILL_CPU TABLE 20090825 23:19:49
Also, as we know the segment name and relative file number (RFILE#) and block number (BLOCK#) from above output we can dump the datablock with an ALTER SYSTEM DUMP DATAFILE command.
As that command requires absolute file id (not relative) we have to do a little conversion first (in cases where there are multiple files with same relative file number in database):
SQL> select file_id from dba_extents where relative_fno = 1 and segment_name = 'SYS_IOT_TOP_94276'; FILE_ID ---------- 1
Of course in my little test case all relative file numbers match with absolute numbers (but if you have lots of datafiles, have directly migrated from Oracle 7 or use transportable tablespaces, you may hit this issue).
Finally I can run ALTER SYSTEM DUMP DATAFILE 1 BLOCK 60162 to find out more about the contents of that block. Another option would be to construct a rowid from the data_object_id, rfile and block and query the table using that but as that’s complicated for IOTs, I’ll leave this to a future post.
You can download latest version of LatchProfX here:
Select COUNT(*) and COUNT(column) are different things!
Every now and then I see someone wondering why Oracle is “returning wrong results” for some count queries when counting using COUNT(column_name) instead of COUNT(*) or COUNT(<constant>).
Oracle is actually returning correct results, its just that sometimes the people asking the questions haven’t realized that COUNT(column) is something semantically different from COUNT(*).
COUNT(*) operation counts all rows fed to it by execution plan branch under it.
COUNT(column) operation on the other hand counts all non-null values in that column from rows fed to it by execution plan branch under it.
And here’s a little example:
SQL> select count(*) from v$session;
COUNT(*)
----------
23
SQL> select count(username) from v$session;
COUNT(USERNAME)
---------------
1 <<-- only one non-null value in that column
SQL> select count(nvl(username,'blah')) from v$session; <<-- lets replace NULLs with non-nulls
COUNT(NVL(USERNAME,'BLAH'))
---------------------------
23
My public appearances and final AOT seminar calendar for 2009
These are the public seminars I will do in 2009. I won’t add any more for this year as my calendar is almost full.
When I published this list last time I told there would be an addition for San Francisco Bay Area people. The event is added below (11-12 Nov).
9-11. September – Hotsos @ Dallas, TX – Postponed to later. I’ll keep you updated.
14-16. September – Hotsos @ Washington, DC – https://portal.hotsos.com/education/SEM/
5-6. October – Miracle @ London, UK – http://www.miraclebenelux.nl/tanel/
8-9. October – Miracle @ Helsinki, Finland – http://www.miracleoy.fi/
2-3. November – Miracle @ Utrecht, Netherlands – http://www.miraclebenelux.nl/tanel/
5-6. November – Miracle @ Brussels, Belgium – http://www.miraclebenelux.nl/tanel/
11-12. November – San Francisco Bay Area, CA – http://www.nocoug.org (more details to follow)
3-4. December – Miracle @ Birmingham, UK (right after UKOUG conference) – http://www.miraclebenelux.nl/tanel/
Conferences 200913. Nov NoCOUG Conference – topic TBD
2.Dec UKOUG Tech and EBS Conference
- Latch and Mutex Contention Troubleshooting in Oracle - 1 hour session
- Practical Oracle Capacity Planning – 2 hour masterclass
- Note that I will be doing my 2-day Advanced Oracle Troubleshooting seminar right after UKOUG on 3-4 Dec in Birmingham. This should be a convenient opportunity to fry your brains even more during the same trip.
6. Dec CMG Conference
- How To Do Capacity Planning in Oracle – half day workshop, with Dr Neil Gunther
I will also be around San Francisco when Oracle OpenWorld takes place, I’m not speaking at OOW, but will speak at another special event that same week – so stay tuned ;-)
Alter system kill session and ORA-00031: session marked for kill
I’m sure you are all familiar with this situation:
SQL> alter system kill session '152,33'; alter system kill session '152,33' * ERROR at line 1: ORA-00031: session marked for kill
The session trying to issue the kill will hang for 60 seconds and then return this “session marked for kill” message. And the target session does not get killed at all.
So why is that?
The issue is in what this alter system kill command is doing. It’s not actually killing the target session (like kill -9 would do for OS processes). It just sets a bit in the target sessions state object, which marks that the target session should end. But its entirely up the target session to check this bit and act on it!
So, intead of ALTER SYSTEM KILL SESSION, the command should look something like ALTER SYSTEM ASK SESSION TO COMMIT SUICIDE.
All the kill session command is doing is ASK the target session to clean up and exit – via setting that bit.
Now, normally the target sessions are nice and check that bit often enough in their code, act on it and die.
But sometimes when the target session happens to be busy looping in some tight loop (due a bug perhaps) or is hung, then it never gets to check that “please die” bit and never exits.
This is why DBAs often need to kill the OS process or thread via OS tools to get rid of that session (and its locks, transactions) as when you kill the OS process, PMON will detect it (if not fast enough then it can be woken up via ORADEBUG WAKEUP call few times) and clean up after that session.
So, the “ORA-00031: session marked for kill” message you see after 60 seconds just means that:
1) Your session sets the “please die” bit in target sessions state object
2) Your session waits the target session to die for 60 seconds and times out after it doesn’t happen
3) Your session returns “session marked for kill” error – which means exactly what I wrote in step 1 above (”please die” bit is set)
By the way, if your session waits for the target session to die – what is the wait event you are waiting on then? Every wait, except few bugs, should be instrumented in Oracle.
This is easy to test, run this for example:
SQL> exec dbms_lock.sleep(100)
And immediately after try to kill that sleeping session from your session (in my case my session which issued the alter system kill command was 146). Lets see what it waits for:
SQL> @sw 146 SID STATE EVENT SEQ# SEC_IN_WAIT P1 ------- ------- ---------------------------------------- ---------- ----------- ------------------ 146 WAITING inactive session 140 3 152
See, the killer waits for inactive session event. The parameter 1 (P1) is 152. What does that mean?
V$EVENT_NAME (or V$SESSION_WAIT itself) gives you the answer. I will run my show event descriptions script to display that events parameter meaning:
SQL> @sed "inactive session"EVENT# EVENT_NAME PARAMETER1 PARAMETER2
------ ------------------------------------------------------- -------------------- ---------------
296 inactive session session# waitedSQL>
So, the killer waits for inactive session wait for 60 seconds and then times out. If the target session to be killed ever checks that “please die” bit, it will clean up and exit, otherwise it could be stuck forever!
This is the case when you need to log on to the server and kill the target process from OS level. If you can’t log on to target OS for whatever reason (but still have SYSDBA access) then you could try attaching to target process with oradebug and running ORADEBUG EVENT IMMEDIATE CRASH as I’ve explained here.
My preferred approach for killing sessions usually is:
1) of course – verify whether you’re killing the right session first
2) ALTER SYSTEM KILL SESSION
3) If that doesn’t work immediately then check whether the target session has acknowledged the kill and is rolling back its (large) transaction. You can do this by checking V$TRANSACTION.USED_UREC for that session’s transaction (if it has any) and see if its decreasing. If yes, the transaction is rolling back and we just need to wait it to finish. There’s also a bit in V$TRANSACTION which states whether the transaction is rolling back, but I don’t have my notes available right now (I’m on vacation actually :)
4) If there’s no rollback happening and session just seems to be stuck, then its time to kill that session’s process from OS level.
5) If couple of minutes after killing the process from OS level that sessions and its locks & resources are still not released (remember, we have verified that there was no long transaction rollback happening) then I would attach to my own process with oradebug and run “ORADEBUG WAKEUP 2″ couple of times (and checking if the session has gone + waiting few seconds between each invocation). The “2″ here means Oracle PID of PMON process which is usually 2, but you should check it from your V$PROCESS view.
If the session is still not gone, it’s time to open a support request.
Note that I didn’t post all the little details here – for example, before killing a problem process which is stuck it makes sense to gather some diagnostic data first (like run pstack on the problem process few times to get its current stack trace).
An interview with me
Iggy Fernandez posted an interview with me (published in NoCOUG journal) on his blog.
If you are interested in a little bit of my history and some more general (not-so-technical) thoughts of mine, check it out here:
http://iggyfernandez.wordpress.com/2009/08/08/great-expectations-an-interview-with-tanel-poder/


