Tanel Poder
V$CELL_THREAD_HISTORY – “ASH” for Exadata Storage Cells
Did you know that there’s something like Active Session History also in the Exadata storage cells? ;-)
The V$CELL_THREAD_HISTORY view is somewhat like V$ACTIVE_SESSION_HISTORY, but it’s measuring thread activity in the Exadata Storage Cells:
SQL> @desc v$cell_thread_history
Name Null? Type
------------------------------- -------- ----------------------------
1 CELL_NAME VARCHAR2(1024)
2 SNAPSHOT_ID NUMBER
3 SNAPSHOT_TIME DATE
4 THREAD_ID NUMBER
5 JOB_TYPE VARCHAR2(32)
6 WAIT_STATE VARCHAR2(32)
7 WAIT_OBJECT_NAME VARCHAR2(32)
8 SQL_ID VARCHAR2(13)
9 DATABASE_ID NUMBER
10 INSTANCE_ID NUMBER
11 SESSION_ID NUMBER
12 SESSION_SERIAL_NUM NUMBER
It keeps about 10 minutes worth of samples of Exadata Storage Cell thread activity:
SQL> @minmax snapshot_time v$cell_thread_history Show min/max (low/high) values in column "snapshot_time" of table v$cell_thread_history... MIN(SNAPSHOT_TIME MAX(SNAPSHOT_TIME ----------------- ----------------- 20130419 14:42:15 20130419 14:52:54
Note that it’s not the V$ view or the database instance which stores this array – it’s the storage cells themselves. If you query the v$cell_thread_history view, your Oracle database session is going to “gather” this instrumentation data from all the required cells and present it to you, that’s why the “cell statistics gather” wait event shows up:
PARSING IN CURSOR #140596385017248 len=42 dep=0 uid=0 oct=3 lid=0 tim=1366404896817011 hv=4063158547 ad='19e452578' sqlid='63awy1gg t2xs8m' select count(*) from v$cell_thread_history END OF STMT PARSE #140596385017248:c=0,e=115,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4272803188,tim=1366404896817010 EXEC #140596385017248:c=0,e=27,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4272803188,tim=1366404896817127 WAIT #140596385017248: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=40 tim=1366404896817152 WAIT #140596385017248: nam='cell statistics gather' ela= 283 cellhash#=0 p2=0 p3=0 obj#=40 tim=1366404896818846 WAIT #140596385017248: nam='cell statistics gather' ela= 352 cellhash#=0 p2=0 p3=0 obj#=40 tim=1366404896819317 WAIT #140596385017248: nam='cell statistics gather' ela= 376 cellhash#=0 p2=0 p3=0 obj#=40 tim=1366404896820929 WAIT #140596385017248: nam='cell statistics gather' ela= 326 cellhash#=0 p2=0 p3=0 obj#=40 tim=1366404896822198 WAIT #140596385017248: nam='cell statistics gather' ela= 580 cellhash#=0 p2=0 p3=0 obj#=40 tim=1366404896823620 ...
Now, how frequently do the cells sample their “ASH” data:
SQL> SELECT DISTINCT (snapshot_time - LAG(snapshot_time)
OVER (ORDER BY snapshot_time)) * 86400 lag_seconds
FROM (SELECT distinct snapshot_time
FROM v$cell_thread_history WHERE cell_name = '192.168.12.3');
LAG_SECONDS
-----------
1
Looks like the sampling is done exactly once per second!
So, great, what can we do with this data?
Here’s an Exadata performance script (cth.sql) which uses V$CELL_THREAD_HISTORY to dig into cell activity from the database layer.
First, I must warn you, the CTH (Cell Thread History) is not as powerful as the Database-level ASH (ASH has almost 100 columns in it already, but the v$cell_thread_history has only 12) and there seem to be some shortcomings in the instrumentation data, as you will see.
But it still allows us to drill down a bit deeper into Exadata storage cell activity straight from your favorite database access tool. And it also has database session-level details in it – which means this information can be used for systematic Exadata performance troubleshooting too!
As the database sessions do communicate the SQL_ID of the currently active SQL statement to the storage cells when sending them requests, we can list the TOP activity by SQL_ID inside the storage cells!
The cth.sql script takes 4 arguments:
- The columns to group the CTH activity data by
- The filter settings (which sessions, instances, sql_ids to report?)
- The begin time range to report
- The end time range to report
Note that the CTH view will only have about 10 minutes of history in it (unless the cells are almost idle). For longer-term analysis you might want to gather the history to some table every couple of minutes.
Also, the snapshot_time column shows the CELL time, so if there’s significant clock drift between the storage cells and DB nodes, you might measure the wrong range of data.
Anyway, let’s see the “TOP SQL” in all storage cells:
SQL> @exadata/cth sql_id 1=1 sysdate-1 sysdate Querying V$CELL_THREAD_HISTORY ("ASH" for Storage Cells) ... SECONDS SQL_ID ---------- ------------- 1755 977 akbpwk3j38vw7 146 25y04n1twszyu 135 98pza6kpfwczj 14 5n83a4q202674 12 0w2qpuc6u2zsp 12 147a57cxq3w5y 7 axdkf5tjztmy7 6 apgb2g9q2zjh1 2 dcq9a12vtcnuw 2 a9gvfh5hx9u98 1 f318xdxdn0pdc 1 0ws7ahf1d78qa 13 rows selected.
Immediately we see that majority of the time consumed does not have any SQL ID associated with it, but we will get back to this later. For now, let’s pick the TOP SQL_ID and drill down into it further, by grouping the data by some more columns:
SQL> @exadata/cth sql_id,job_type,wait_state,wait_object_name sql_id='akbpwk3j38vw7' sysdate-1 sysdate
Querying V$CELL_THREAD_HISTORY ("ASH" for Storage Cells) ...
SECONDS SQL_ID JOB_TYPE WAIT_STATE WAIT_OBJECT_NAME
---------- ------------- -------------------------------- -------------------------------- --------------------------------
868 akbpwk3j38vw7 PredicateFilter working
52 akbpwk3j38vw7 PredicateDiskRead working
45 akbpwk3j38vw7 PredicateCacheGet waiting_for_SKGXP_send
7 akbpwk3j38vw7 CacheGet working
6 akbpwk3j38vw7 PredicateCacheGet working
3 akbpwk3j38vw7 CacheGet waiting_for_SKGXP_send
2 akbpwk3j38vw7 PredicateDiskRead waiting_for_mutex gdisk ext
2 akbpwk3j38vw7 CacheGet waiting_for_mutex RemoteSendPort
1 akbpwk3j38vw7 PredicateFilter waiting_for_mutex PredicateDisk-1
1 akbpwk3j38vw7 PredicateCachePut working
Now we have a breakdown of the storage cell thread activity, when (knowingly) doing work for the SQL_ID akbpwk3j38vw7. The job type shows the “phase” of execution in the data flow inside a storage cell. I think I need to write a separate post about the cell execution phases, but for now, the PredicateDiskRead, as the name says, deals with setting up the IOs for disk reads, the PredicateCacheGet is what deals with the (smart) IO requests handed to storage cells by DB sessions (and sending the results back too) and the PredicateFilter is the phase which does the actual processing of just read datablocks in storage cells (logically “opening” the block and passing it through cache, transactional layers to the data layer functions, which extract the required columns of the matching rows of any predicates offloaded to the storage cells). If you wonder where are the I/O wait states – it looks like for (at least) smart scans, thanks to asynchronous I/O the actual I/O waiting/reaping is done by separate threads, which do not “belong” to a single SQL_ID, but track the completion of all I/Os submitted by all threads independently. Cellsrv is a single (multithreaded) process, so it’s easy to separate tasks within the process. This is also how you avoid keeping all 100+ cellsrv threads “occupied” in case of slow IO.
The wait state column is more complicated than the database layer’s session wait state “waiting or not waiting” logic. There are no separate wait events listed in the CTH, but the wait state itself contains the reasons for some waits. Working means ON CPU (or trying to be on CPU if the OS scheduler hasn’t put the thread there yet) and all the waiting_ states, like the name says, mean waiting for something, like a next request over the network (waiting_for_SKGXP_receive) or asynchronous IO completion (waiting_for_IO_reap). Note that Exadata cellsrv architecture is radically different from what you’re used to from the Database world, it’s a multithreaded job queuing and scheduling engine where one thread may submit IOs and some completely different thread will be the one who checks for the asynchronous IO completion (IO reaping), while a yet another thread does the actual filter processing (offloaded execution) tasks. So don’t expect the cellsrv threads serving your smart scans to actually wait for any I/O, because there are independent threads doing I/O reaping for all incoming requests. If you want more deep details – I do talk about the cellsrv internals and data flow in my Advanced Exadata Performance seminar and also Roger MacNicol from Exadata Development will deliver an Exadata SmartScan Deep Dive presentation at Enkitec E4 in August this year (2013) ;-)
Here’s the (probably) full list of wait states from Oracle 11.2.0.3 BP11 running on Cellsrv 11.2.3.2.1)
initial working waiting_in_scheduler waiting_for_mutex waiting_for_condition_variable waiting_for_IO_reap waiting_for_SKGXP_receive waiting_for_SKGXP_send looking_for_job waiting_for_msg_hdr waiting_for_system_work waiting_for_offline_done waiting_for_storidx_lock waiting_for_issued_ios waiting_for_buffer_allocation waiting_for_connect waiting_for_flashcache_init waiting_for_io_quiescence waiting_for_io_draining
Here’s another example, where I show all activity in all storage cells, by SQL_ID (if known), the database_id, instance_id and the session_id of the database session:
SQL> @exadata/cth sql_id,database_id,session_id,job_type,wait_state 1=1 sysdate-1/24/60 sysdate
Querying V$CELL_THREAD_HISTORY ("ASH" for Storage Cells) ...
SECONDS SQL_ID DATABASE_ID SESSION_ID JOB_TYPE WAIT_STATE MIN(SNAPSHOT_TIME MAX(SNAPSHOT_TIME
---------- ------------- ----------- ---------- -------------------------------- -------------------------------- ----------------- -----------------
1301 0qmhpg330zqfu 1538629110 2916 PredicateFilter working 20130514 09:20:11 20130514 09:21:10
313 0qmhpg330zqfu 1538629110 5 PredicateFilter working 20130514 09:20:11 20130514 09:21:10
242 0qmhpg330zqfu 1538629110 203 PredicateFilter working 20130514 09:20:11 20130514 09:21:10
73 0 0 UnidentifiedJob waiting_for_IO_reap 20130514 09:20:11 20130514 09:21:10
9 0 0 UnidentifiedJob working 20130514 09:20:13 20130514 09:20:58
4 1538629110 972 CachePut working 20130514 09:20:17 20130514 09:21:04
3 0 0 NetworkRead working 20130514 09:20:17 20130514 09:20:34
2 1538629110 584 CachePut working 20130514 09:20:13 20130514 09:20:59
2 1538629110 492 CachePut working 20130514 09:20:26 20130514 09:20:52
2 0qmhpg330zqfu 1538629110 5 PredicateDiskRead working 20130514 09:20:58 20130514 09:20:58
1 0 0 IO poor performance detection th 20130514 09:20:23 20130514 09:20:23
1 0 0 ProcessIoctl working 20130514 09:20:26 20130514 09:20:26
1 147a57cxq3w5y 1538629110 2140 CacheGet working 20130514 09:20:12 20130514 09:20:12
1 0w2qpuc6u2zsp 1538629110 1479 CacheGet working 20130514 09:21:04 20130514 09:21:04
1 0w2qpuc6u2zsp 1538629110 986 CacheGet working 20130514 09:20:44 20130514 09:20:44
1 0 0 System stats collection thread working 20130514 09:20:41 20130514 09:20:41
1 147a57cxq3w5y 1538629110 790 CacheGet waiting_for_SKGXP_send 20130514 09:20:29 20130514 09:20:29
1 apgb2g9q2zjh1 1538629110 1280 CacheGet working 20130514 09:21:01 20130514 09:21:01
1 apgb2g9q2zjh1 1538629110 1085 CacheGet working 20130514 09:21:02 20130514 09:21:02
1 4075070345 1177 CacheGet working 20130514 09:20:23 20130514 09:20:23
Notes:
- Exadata storage cell software (cellsrv) samples active thread states every second (like ASH), so every record in V$CELL_THREAD_HISTORY stands for 1 second of “cell activity time” if counted. You can COUNT the records in V$CELL_THREAD_HISTORY just like you can count them in ASH to get approximate (cell) activity time.
- You do not have to query the GV$ view as the V$ view itself returns all data from all storage cells (unless restricted by a where clause)
- There is no 1-1 correspondence of a cell thread and your database session, so don’t try to simply add these numbers up. A smart scan can asynchronously and independently work for your database session, working in advance, doing prefetching and preprocessing regardless of what your database session happens to be doing. One smart scan “session” can end up spawning multiple threads in each storage cell if the cell can not produce data fast enough than the database session can consume.
- The database_id column corresponds to the “global database ID” ( SELECT ksqdngunid FROM X$KSQDN ), not the usual DBID in v$database.
A warning: There have been a few cases where my session crashes (well, actually my process, but not the database) when querying this V$ view:
SQL> @exadata/cth wait_state,sql_id 1=1 sysdate-1/24 sysdate
Querying V$CELL_THREAD_HISTORY ("ASH" for Storage Cells) ...
, MIN(snapshot_time), MAX(snapshot_time)
*
ERROR at line 5:
ORA-03113: end-of-file on communication channel
But it only has affected my own session so far, so when I log in and rerun the script, it works ok again. I yet have to file a bug for this – this is a documented V$ view and it shouldn’t crash when queried.
This post ended up quite long, so I will continue a Part 2 some day in the future :)
You can download the latest version of cth.sql from here.
Oracle Database 11.2 Upgrade and Migration slides
Check out the extensive slide deck (over 500 slides) about upgrading techniques to Oracle 11.2, by Oracle Corp (Roy Swonger and Mike Dietrich):
It has lots of examples (from real customer upgrade cases) in it.
Thanks to Randolf Geist for telling me about this.
You can also download other interesting presentations from that page (check the right hand side download section), like the Oracle Database Behavior Changes 8.0 through 11.2 doc. Pretty cool!
How to Compose New Gmail Messages in Full Screen (instead of the tiny compose box of new Gmail)
I’m writing this (unusual) post as I am a long time Gmail user and recently I’ve seen plenty of people & articles complain about the Gmail’s new compose window (the one that shows up as a small hovering window in the bottom right of your screen):
The top google hits so far only return tips to disable the new editor completely, but I want to use the new one, just in a bigger window! There is a very easy workaround for that – and there’s no need to switch back to the old compose mode at all!
If you are using your mouse, then just:
- Hold down SHIFT key when clicking the Compose button to get a stand-alone new window for composing a message.
- Hold down CTRL (on Windows) or Command (on Mac OSX) key to get a full-screen new browser tab for composing a message.
Examples 1 & 2 below:
If you use Gmail keyboard shortcuts for productivity (they’re awesome!), then you can just:
- Press “c” for the new (small) compose box
- Press SHIFT-C (capital “C”) for the stand-alone new window for composing a message
- Press D for a full screen new browser tab for composing a message
Note that you can view the Gmail shortcuts reference, if this feature is enabled, just by pressing the ? (question mark) key:

So, there’s no need to switch back to the old compose mode completely, just remember SHIFT, CTRL and the “D” keys :)
Public Appearances and Exadata Performance Training
I will be doing a lot of (Exadata) talking and teaching in the coming months. Here’s a list of events where you’ll see me speaking, teaching, hacking, learning and hopefully also drinking beer:
- 16 April 2013 – UKOUG Engineered Systems Summit
- London, UK
- I will talk about the common Exadata performance lessons learned in my “4 years of Exadata performance” talk
- Andy Colvin and Martin Bach of Enkitec will also speak there (yes, Martin has joined Enkitec in Europe!!! :-)
- 30 April 2013 – NYOUG Training Day
- Manhattan, NYC
- This is the first ever public delivery of my new Getting the Most Out of Oracle’s Active Session History, Time Model and Wait events seminar!
- 2-3 May 2013 – Advanced Exadata Performance seminar – in-classroom!
- First public delivery of this class – and we’ll be going very deep into Exadata internals and performance ;-)
- Irving, TX (in Enkitec HQ)
- 13-16 May 2013 – Advanced Exadata Performance seminar – online
- Same as previous, but online.
- 10-11 June 2013 – My new ASH seminar – online
- Getting the Most Out of Oracle’s Active Session History, Time Model and Wait events
- More details will appear in the training & seminar page soon
- 13-14 June 2013 – Oracle Data Warehouse Global Leaders forum (by invitation-only event run by Oracle)
- Amsterdam, Netherlands
- I will speak about my “4 years of Exadata performance” experience and probably learn from others’ experience too
- 5-6 Aug 2013 – Enkitec Extreme Exadata Expo (E4) conference
- Irving, TX
- I will be speaking there, haven’t set the exact topic yet, but it will include demos and hacking something I suspect ;-)
- 22-26 September 2013 – Oracle OpenWorld conference
- San Francisco, CA
- I will attend the ACE Directors briefing before the conference, likely speak, hang out at Enkitec booth, probably hack something at Enkitec’s Conference HQ for fun and definitely drink beer there. Maybe I’ll even attend the Wednesday’s party for a change!
As you see, my great plans to not travel much are not going to work out well :)
Actually it is better, this year I’ve managed to only travel twice so far (and one of the trips was for vacation!) and I haven’t had to do too many of the crazy around-the-world in 5 days trips I did when living in back Singapore… but looks like I’ll get to 2 million km nevertheless this year:
Well, see you in some corner of the world some day!
Understanding what a hint affects using the V$SQL_FEATURE views
You may have used the Oracle 11g V$SQL_HINT view already – it displays all the valid hints (both documented and undocumented ones) available in your Oracle version, for example:
SQL> @hint merge NAME VERSION VERSION_OUTLINE INVERSE ---------------------------------------------------------------- ------------------------- ------------------------- ---------------------------------------------------------------- MERGE_CONST_ON 8.0.0 MERGE_AJ 8.1.0 8.1.7 MERGE_SJ 8.1.0 8.1.7 MV_MERGE 9.0.0 MERGE 8.1.0 10.1.0 NO_MERGE NO_MERGE 8.0.0 10.1.0 MERGE USE_MERGE_CARTESIAN 11.1.0.6 11.1.0.6 USE_MERGE 8.1.0 8.1.7 NO_USE_MERGE NO_USE_MERGE 10.1.0.3 10.1.0.3 USE_MERGE
But there’s more, (semi)undocumented views like V$SQL_FEATURE and V$SQL_FEATURE_HIERARCHY do give us more information about what these hints relate to. For example, if you have ever wondered why is there a MERGE hint and then also a USE_MERGE hint, you can check what do these hints control using my hinth.sql (Hint Hierarchy) script:
SQL> @hinth MERGE Display Hint feature hierarchy for hints like MERGE NAME PATH ---------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ MERGE ALL -> COMPILATION -> CBO -> CBQT -> CVM MERGE ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> CVM MERGE ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> CVM
So, the MERGE hints seem to affect the CBO’s query transformation code – (CBQT means Cost-Based Query Transformation and CVM means Complex View Merging, but more about that later).
SQL> @hinth USE_MERGE Display Hint feature hierarchy for hints like USE_MERGE NAME PATH ---------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ USE_MERGE ALL -> COMPILATION -> CBO -> JOIN_METHOD -> USE_MERGE
And the USE_MERGE hint is about controlling the use of a join method – the sort-merge join.
Let’s list all hints having NL in them:
SQL> @hinth %NL% Display Hint feature hierarchy for hints like %NL% NAME PATH ---------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ INLINE_XMLTYPE_NT ALL NL_SJ ALL -> COMPILATION -> CBO NL_AJ ALL -> COMPILATION -> CBO NO_TABLE_LOOKUP_BY_NL ALL -> COMPILATION -> CBO -> CBQT -> STAR_TRANS -> TABLE_LOOKUP_BY_NL TABLE_LOOKUP_BY_NL ALL -> COMPILATION -> CBO -> CBQT -> STAR_TRANS -> TABLE_LOOKUP_BY_NL NO_USE_NL ALL -> COMPILATION -> CBO -> JOIN_METHOD -> USE_NL USE_NL ALL -> COMPILATION -> CBO -> JOIN_METHOD -> USE_NL USE_NL_WITH_INDEX ALL -> COMPILATION -> CBO -> JOIN_METHOD -> USE_NL_WITH_INDEX NO_TABLE_LOOKUP_BY_NL ALL -> COMPILATION -> CBO -> STAR_TRANS -> TABLE_LOOKUP_BY_NL TABLE_LOOKUP_BY_NL ALL -> COMPILATION -> CBO -> STAR_TRANS -> TABLE_LOOKUP_BY_NL NO_CONNECT_BY_CB_WHR_ONLY ALL -> COMPILATION -> TRANSFORMATION CONNECT_BY_CB_WHR_ONLY ALL -> COMPILATION -> TRANSFORMATION INLINE ALL -> COMPILATION -> TRANSFORMATION NO_TABLE_LOOKUP_BY_NL ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> STAR_TRANS -> TABLE_LOOKUP_BY_NL TABLE_LOOKUP_BY_NL ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> STAR_TRANS -> TABLE_LOOKUP_BY_NL NO_NLJ_BATCHING ALL -> EXECUTION NLJ_BATCHING ALL -> EXECUTION NO_NLJ_PREFETCH ALL -> EXECUTION NLJ_PREFETCH ALL -> EXECUTION
Plenty of interesting stuff here – the new hint TABLE_LOOKUP_BY_NL that has showed up recently seems to have to do with star transformations for example (I just learned this myself from this output).
Interestingly the NLJ_BATCHING and NLJ_PREFETCH hints are considered as execution phase hints apparently (that was my term, I’m thinking about hints (also) affecting a decision in the execution phase, not just during optimization). For example, normally the NLJ prefetch feature can be dynamically turned on & off during the query execution, I guess with a hint this feature would be always enabled (I’m not sure about this here, just trying to reason why a hint is shown to be related to “execution” phase).
If optimizer feature terms like CBQT and CVM do not immediately ring a bell, you can use the V$SQL_FEATURE view (or my sqlfh.sql script) to list some more info about what these SQL feature name abbreviations mean and where in the hierarchy does this particular feature stand.
The script below doesn’t accept any parameters, prints out the entire SQL feature hierarchy (except the temporary bugfix features you can see from V$SYSTEM_FIX_CONTROL):
SQL> @sqlfh SQL_FEATURE DESCRIPTION ------------------------------------------------------- ---------------------------------------------------------------- ALL A Universal Feature COMPILATION SQL COMPILATION CBO SQL Cost Based Optimization ACCESS_PATH Query access path AND_EQUAL Index and-equal access path BITMAP_TREE Bitmap tree access path FULL Full table scan INDEX Index INDEX_ASC Index (ascending) INDEX_COMBINE Combine index for bitmap access INDEX_DESC Use index (descending) INDEX_FFS Index fast full scan INDEX_JOIN Index join INDEX_RS_ASC Index range scan INDEX_RS_DESC Index range scan descending INDEX_SS Index skip scan INDEX_SS_ASC Index skip scan ascending INDEX_SS_DESC Index skip scan descending SORT_ELIM Sort Elimination Via Index CBQT Cost Based Query Transformation CVM Complex View Merging DIST_PLCMT Distinct Placement JOINFAC Join Factorization JPPD Join Predicate Push Down PLACE_GROUP_BY Group-By Placement PULL_PRED pull predicates STAR_TRANS Star Transformation TABLE_LOOKUP_BY_NL Table Lookup By Nested Loop TABLE_EXPANSION Table Expansion UNNEST unnest query block CURSOR_SHARING Cursor sharing DML DML JOIN_METHOD Join methods USE_HASH Hash join USE_MERGE Sort-merge join USE_MERGE_CARTESIAN Merge join cartesian USE_NL Nested-loop join USE_NL_WITH_INDEX Nested-loop index join JOIN_ORDER Join order OPT_MODE Optimizer mode ALL_ROWS All rows (optimizer mode) CHOOSE Choose (optimizer mode) FIRST_ROWS First rows (optimizer mode) OR_EXPAND OR expansion OUTLINE Outlines PARTITION Partition PQ Parallel Query PARALLEL Parallel table PQ_DISTRIBUTE PQ Distribution method PQ_MAP PQ slave mapper PX_JOIN_FILTER Bloom filtering for joins STAR_TRANS Star Transformation TABLE_LOOKUP_BY_NL Table Lookup By Nested Loop STATS Optimizer statistics CARDINALITY Cardinality computation COLUMN_STATS Basic column statistics CPU_COSTING CPU costing DBMS_STATS Statistics gathered by DBMS_STATS DYNAMIC_SAMPLING Dynamic sampling DYNAMIC_SAMPLING_EST_CDN Estimate CDN using dynamic sampling GATHER_PLAN_STATISTICS Gather plan statistics INDEX_STATS Basic index statistics OPT_ESTIMATE Optimizer estimates TABLE_STATS Basic table statistics QUERY_REWRITE query rewrite with materialized views RBO SQL Rule Based Optimization SQL_CODE_GENERATOR SQL Code Generator SQL_PLAN_MANAGEMENT SQL Plan Management TRANSFORMATION Query Transformation CBQT Cost Based Query Transformation CVM Complex View Merging DIST_PLCMT Distinct Placement JOINFAC Join Factorization JPPD Join Predicate Push Down PLACE_GROUP_BY Group-By Placement PULL_PRED pull predicates STAR_TRANS Star Transformation TABLE_LOOKUP_BY_NL Table Lookup By Nested Loop TABLE_EXPANSION Table Expansion UNNEST unnest query block HEURISTIC Heuristic Query Transformation CNT Count(col) to count(*) COALESCE_SQ coalesce subqueries CSE Common Sub-Expression Elimination CVM Complex View Merging FILTER_PUSH_PRED Push filter predicates FULL_OUTER_JOIN_TO_OUTER Join Conversion JPPD Join Predicate Push Down OBYE Order-by Elimination OLD_PUSH_PRED Old push predicate algorithm (pre-10.1.0.3) OUTER_JOIN_TO_ANTI Join Conversion OUTER_JOIN_TO_INNER Join Conversion PRED_MOVE_AROUND Predicate move around SET_TO_JOIN Transform set operations to joins SVM Simple View Merging TABLE_ELIM Table Elimination UNNEST unnest query block USE_CONCAT Or-optimization XML_REWRITE XML Rewrite CHECK_ACL_REWRITE Check ACL Rewrite COST_XML_QUERY_REWRITE Cost Based XML Query Rewrite XMLINDEX_REWRITE XMLIndex Rewrite EXECUTION SQL EXECUTION
I highlighted the CVM and CBQT lines above…
Just for reference (and if you’re too lazy to run these scripts yourself), I’ve pasted the full output of the hint feature hierarchy script too (executed in my 11.2.0.3 DB):
SQL> @hinth % Display Hint feature hierarchy for hints like % NAME PATH ---------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ NO_XDB_FASTPATH_INSERT ALL XDB_FASTPATH_INSERT ALL NO_USE_HASH_GBY_FOR_PUSHDOWN ALL USE_HASH_GBY_FOR_PUSHDOWN ALL XMLINDEX_SEL_IDX_TBL ALL NO_DST_UPGRADE_INSERT_CONV ALL DST_UPGRADE_INSERT_CONV ALL NO_CONNECT_BY_ELIM_DUPS ALL CONNECT_BY_ELIM_DUPS ALL NO_MONITOR ALL MONITOR ALL NO_NATIVE_FULL_OUTER_JOIN ALL NATIVE_FULL_OUTER_JOIN ALL NO_CONNECT_BY_COMBINE_SW ALL CONNECT_BY_COMBINE_SW ALL OPT_PARAM ALL OUTLINE_LEAF ALL OUTLINE ALL NO_CARTESIAN ALL INCLUDE_VERSION ALL RESTRICT_ALL_REF_CONS ALL NO_ACCESS ALL HASH ALL DRIVING_SITE ALL CACHE_TEMP_TABLE ALL QB_NAME ALL NO_STATS_GSETS ALL NO_USE_HASH_AGGREGATION ALL USE_HASH_AGGREGATION ALL NO_MODEL_PUSH_REF ALL MODEL_NO_ANALYSIS ALL SCN_ASCENDING ALL TIV_GB ALL PIV_GB ALL TIV_SSF ALL PIV_SSF ALL NO_CONNECT_BY_FILTERING ALL CONNECT_BY_FILTERING ALL BYPASS_RECURSIVE_CHECK ALL SYS_RID_ORDER ALL NO_BASETABLE_MULTIMV_REWRITE ALL NO_MULTIMV_REWRITE ALL REMOTE_MAPPED ALL NO_GBY_PUSHDOWN ALL GBY_PUSHDOWN ALL IGNORE_OPTIM_EMBEDDED_HINTS ALL DB_VERSION ALL OPTIMIZER_FEATURES_ENABLE ALL USE_WEAK_NAME_RESL ALL IGNORE_WHERE_CLAUSE ALL INLINE_XMLTYPE_NT ALL NESTED_TABLE_FAST_INSERT ALL NESTED_TABLE_SET_SETID ALL PRESERVE_OID ALL NESTED_TABLE_GET_REFS ALL DEREF_NO_REWRITE ALL NO_SQL_TUNE ALL NO_MONITORING ALL NO_OUTER_JOIN_TO_ANTI ALL -> COMPILATION -> CBO OUTER_JOIN_TO_ANTI ALL -> COMPILATION -> CBO NO_FULL_OUTER_JOIN_TO_OUTER ALL -> COMPILATION -> CBO FULL_OUTER_JOIN_TO_OUTER ALL -> COMPILATION -> CBO APPEND_VALUES ALL -> COMPILATION -> CBO NUM_INDEX_KEYS ALL -> COMPILATION -> CBO NO_DOMAIN_INDEX_FILTER ALL -> COMPILATION -> CBO DOMAIN_INDEX_FILTER ALL -> COMPILATION -> CBO NO_PARTIAL_COMMIT ALL -> COMPILATION -> CBO SKIP_UNQ_UNUSABLE_IDX ALL -> COMPILATION -> CBO X_DYN_PRUNE ALL -> COMPILATION -> CBO ROWID ALL -> COMPILATION -> CBO CLUSTER ALL -> COMPILATION -> CBO NO_SWAP_JOIN_INPUTS ALL -> COMPILATION -> CBO SWAP_JOIN_INPUTS ALL -> COMPILATION -> CBO INDEX_RRS ALL -> COMPILATION -> CBO NO_SUBQUERY_PRUNING ALL -> COMPILATION -> CBO SUBQUERY_PRUNING ALL -> COMPILATION -> CBO USE_SEMI ALL -> COMPILATION -> CBO USE_ANTI ALL -> COMPILATION -> CBO QUEUE_ROWP ALL -> COMPILATION -> CBO QUEUE_CURR ALL -> COMPILATION -> CBO CACHE_CB ALL -> COMPILATION -> CBO NO_PARALLEL ALL -> COMPILATION -> CBO CURSOR_SHARING_EXACT ALL -> COMPILATION -> CBO NO_BUFFER ALL -> COMPILATION -> CBO BUFFER ALL -> COMPILATION -> CBO NO_QKN_BUFF ALL -> COMPILATION -> CBO BITMAP ALL -> COMPILATION -> CBO RESTORE_AS_INTERVALS ALL -> COMPILATION -> CBO SAVE_AS_INTERVALS ALL -> COMPILATION -> CBO CUBE_GB ALL -> COMPILATION -> CBO SYS_PARALLEL_TXN ALL -> COMPILATION -> CBO OVERFLOW_NOMOVE ALL -> COMPILATION -> CBO HWM_BROKERED ALL -> COMPILATION -> CBO LOCAL_INDEXES ALL -> COMPILATION -> CBO BYPASS_UJVC ALL -> COMPILATION -> CBO NL_SJ ALL -> COMPILATION -> CBO HASH_SJ ALL -> COMPILATION -> CBO MERGE_SJ ALL -> COMPILATION -> CBO NL_AJ ALL -> COMPILATION -> CBO HASH_AJ ALL -> COMPILATION -> CBO MERGE_AJ ALL -> COMPILATION -> CBO SEMIJOIN_DRIVER ALL -> COMPILATION -> CBO SKIP_EXT_OPTIMIZER ALL -> COMPILATION -> CBO DOMAIN_INDEX_NO_SORT ALL -> COMPILATION -> CBO DOMAIN_INDEX_SORT ALL -> COMPILATION -> CBO ORDERED_PREDICATES ALL -> COMPILATION -> CBO ORDERED ALL -> COMPILATION -> CBO FBTSCAN ALL -> COMPILATION -> CBO MERGE_CONST_ON ALL -> COMPILATION -> CBO STREAMS ALL -> COMPILATION -> CBO EXPR_CORR_CHECK ALL -> COMPILATION -> CBO VECTOR_READ_TRACE ALL -> COMPILATION -> CBO VECTOR_READ ALL -> COMPILATION -> CBO DML_UPDATE ALL -> COMPILATION -> CBO SQLLDR ALL -> COMPILATION -> CBO SYS_DL_CURSOR ALL -> COMPILATION -> CBO NO_REF_CASCADE ALL -> COMPILATION -> CBO REF_CASCADE_CURSOR ALL -> COMPILATION -> CBO NOAPPEND ALL -> COMPILATION -> CBO APPEND ALL -> COMPILATION -> CBO AND_EQUAL ALL -> COMPILATION -> CBO -> ACCESS_PATH -> AND_EQUAL BITMAP_TREE ALL -> COMPILATION -> CBO -> ACCESS_PATH -> BITMAP_TREE FULL ALL -> COMPILATION -> CBO -> ACCESS_PATH -> FULL NO_USE_INVISIBLE_INDEXES ALL -> COMPILATION -> CBO -> ACCESS_PATH -> INDEX USE_INVISIBLE_INDEXES ALL -> COMPILATION -> CBO -> ACCESS_PATH -> INDEX NO_INDEX ALL -> COMPILATION -> CBO -> ACCESS_PATH -> INDEX INDEX ALL -> COMPILATION -> CBO -> ACCESS_PATH -> INDEX INDEX_ASC ALL -> COMPILATION -> CBO -> ACCESS_PATH -> INDEX_ASC INDEX_COMBINE ALL -> COMPILATION -> CBO -> ACCESS_PATH -> INDEX_COMBINE INDEX_DESC ALL -> COMPILATION -> CBO -> ACCESS_PATH -> INDEX_DESC NO_INDEX_FFS ALL -> COMPILATION -> CBO -> ACCESS_PATH -> INDEX_FFS INDEX_FFS ALL -> COMPILATION -> CBO -> ACCESS_PATH -> INDEX_FFS INDEX_JOIN ALL -> COMPILATION -> CBO -> ACCESS_PATH -> INDEX_JOIN INDEX_RS_ASC ALL -> COMPILATION -> CBO -> ACCESS_PATH -> INDEX_RS_ASC INDEX_RS_DESC ALL -> COMPILATION -> CBO -> ACCESS_PATH -> INDEX_RS_DESC NO_INDEX_SS ALL -> COMPILATION -> CBO -> ACCESS_PATH -> INDEX_SS INDEX_SS ALL -> COMPILATION -> CBO -> ACCESS_PATH -> INDEX_SS INDEX_SS_ASC ALL -> COMPILATION -> CBO -> ACCESS_PATH -> INDEX_SS_ASC INDEX_SS_DESC ALL -> COMPILATION -> CBO -> ACCESS_PATH -> INDEX_SS_DESC NO_MERGE ALL -> COMPILATION -> CBO -> CBQT -> CVM MERGE ALL -> COMPILATION -> CBO -> CBQT -> CVM NO_PLACE_DISTINCT ALL -> COMPILATION -> CBO -> CBQT -> DIST_PLCMT PLACE_DISTINCT ALL -> COMPILATION -> CBO -> CBQT -> DIST_PLCMT NO_FACTORIZE_JOIN ALL -> COMPILATION -> CBO -> CBQT -> JOINFAC FACTORIZE_JOIN ALL -> COMPILATION -> CBO -> CBQT -> JOINFAC NO_PLACE_GROUP_BY ALL -> COMPILATION -> CBO -> CBQT -> PLACE_GROUP_BY PLACE_GROUP_BY ALL -> COMPILATION -> CBO -> CBQT -> PLACE_GROUP_BY NO_PULL_PRED ALL -> COMPILATION -> CBO -> CBQT -> PULL_PRED PULL_PRED ALL -> COMPILATION -> CBO -> CBQT -> PULL_PRED NO_FACT ALL -> COMPILATION -> CBO -> CBQT -> STAR_TRANS FACT ALL -> COMPILATION -> CBO -> CBQT -> STAR_TRANS NO_STAR_TRANSFORMATION ALL -> COMPILATION -> CBO -> CBQT -> STAR_TRANS STAR_TRANSFORMATION ALL -> COMPILATION -> CBO -> CBQT -> STAR_TRANS STAR ALL -> COMPILATION -> CBO -> CBQT -> STAR_TRANS NO_TABLE_LOOKUP_BY_NL ALL -> COMPILATION -> CBO -> CBQT -> STAR_TRANS -> TABLE_LOOKUP_BY_NL TABLE_LOOKUP_BY_NL ALL -> COMPILATION -> CBO -> CBQT -> STAR_TRANS -> TABLE_LOOKUP_BY_NL NO_EXPAND_TABLE ALL -> COMPILATION -> CBO -> CBQT -> TABLE_EXPANSION EXPAND_TABLE ALL -> COMPILATION -> CBO -> CBQT -> TABLE_EXPANSION NO_UNNEST ALL -> COMPILATION -> CBO -> CBQT -> UNNEST UNNEST ALL -> COMPILATION -> CBO -> CBQT -> UNNEST NO_BIND_AWARE ALL -> COMPILATION -> CBO -> CURSOR_SHARING BIND_AWARE ALL -> COMPILATION -> CBO -> CURSOR_SHARING RETRY_ON_ROW_CHANGE ALL -> COMPILATION -> CBO -> DML CHANGE_DUPKEY_ERROR_INDEX ALL -> COMPILATION -> CBO -> DML IGNORE_ROW_ON_DUPKEY_INDEX ALL -> COMPILATION -> CBO -> DML NO_USE_HASH ALL -> COMPILATION -> CBO -> JOIN_METHOD -> USE_HASH USE_HASH ALL -> COMPILATION -> CBO -> JOIN_METHOD -> USE_HASH NO_USE_MERGE ALL -> COMPILATION -> CBO -> JOIN_METHOD -> USE_MERGE USE_MERGE ALL -> COMPILATION -> CBO -> JOIN_METHOD -> USE_MERGE USE_MERGE_CARTESIAN ALL -> COMPILATION -> CBO -> JOIN_METHOD -> USE_MERGE_CARTESIAN NO_USE_NL ALL -> COMPILATION -> CBO -> JOIN_METHOD -> USE_NL USE_NL ALL -> COMPILATION -> CBO -> JOIN_METHOD -> USE_NL USE_NL_WITH_INDEX ALL -> COMPILATION -> CBO -> JOIN_METHOD -> USE_NL_WITH_INDEX LEADING ALL -> COMPILATION -> CBO -> JOIN_ORDER ALL_ROWS ALL -> COMPILATION -> CBO -> OPT_MODE -> ALL_ROWS CHOOSE ALL -> COMPILATION -> CBO -> OPT_MODE -> CHOOSE FIRST_ROWS ALL -> COMPILATION -> CBO -> OPT_MODE -> FIRST_ROWS OR_EXPAND ALL -> COMPILATION -> CBO -> OR_EXPAND NO_PARALLEL_INDEX ALL -> COMPILATION -> CBO -> PQ PARALLEL_INDEX ALL -> COMPILATION -> CBO -> PQ NO_STATEMENT_QUEUING ALL -> COMPILATION -> CBO -> PQ -> PARALLEL STATEMENT_QUEUING ALL -> COMPILATION -> CBO -> PQ -> PARALLEL SHARED ALL -> COMPILATION -> CBO -> PQ -> PARALLEL NOPARALLEL ALL -> COMPILATION -> CBO -> PQ -> PARALLEL PQ_DISTRIBUTE ALL -> COMPILATION -> CBO -> PQ -> PQ_DISTRIBUTE PQ_NOMAP ALL -> COMPILATION -> CBO -> PQ -> PQ_MAP PQ_MAP ALL -> COMPILATION -> CBO -> PQ -> PQ_MAP NO_PX_JOIN_FILTER ALL -> COMPILATION -> CBO -> PQ -> PX_JOIN_FILTER PX_JOIN_FILTER ALL -> COMPILATION -> CBO -> PQ -> PX_JOIN_FILTER NO_FACT ALL -> COMPILATION -> CBO -> STAR_TRANS STAR ALL -> COMPILATION -> CBO -> STAR_TRANS FACT ALL -> COMPILATION -> CBO -> STAR_TRANS NO_STAR_TRANSFORMATION ALL -> COMPILATION -> CBO -> STAR_TRANS STAR_TRANSFORMATION ALL -> COMPILATION -> CBO -> STAR_TRANS NO_TABLE_LOOKUP_BY_NL ALL -> COMPILATION -> CBO -> STAR_TRANS -> TABLE_LOOKUP_BY_NL TABLE_LOOKUP_BY_NL ALL -> COMPILATION -> CBO -> STAR_TRANS -> TABLE_LOOKUP_BY_NL CARDINALITY ALL -> COMPILATION -> CBO -> STATS TABLE_STATS ALL -> COMPILATION -> CBO -> STATS INDEX_STATS ALL -> COMPILATION -> CBO -> STATS COLUMN_STATS ALL -> COMPILATION -> CBO -> STATS NO_CPU_COSTING ALL -> COMPILATION -> CBO -> STATS -> CPU_COSTING CPU_COSTING ALL -> COMPILATION -> CBO -> STATS -> CPU_COSTING DBMS_STATS ALL -> COMPILATION -> CBO -> STATS -> DBMS_STATS DYNAMIC_SAMPLING ALL -> COMPILATION -> CBO -> STATS -> DYNAMIC_SAMPLING DYNAMIC_SAMPLING_EST_CDN ALL -> COMPILATION -> CBO -> STATS -> DYNAMIC_SAMPLING_EST_CDN GATHER_PLAN_STATISTICS ALL -> COMPILATION -> CBO -> STATS -> GATHER_PLAN_STATISTICS OPT_ESTIMATE ALL -> COMPILATION -> CBO -> STATS -> OPT_ESTIMATE RBO_OUTLINE ALL -> COMPILATION -> RBO RULE ALL -> COMPILATION -> RBO NO_PRUNE_GSETS ALL -> COMPILATION -> TRANSFORMATION MODEL_DONTVERIFY_UNIQUENESS ALL -> COMPILATION -> TRANSFORMATION MODEL_PUSH_REF ALL -> COMPILATION -> TRANSFORMATION MODEL_COMPILE_SUBQUERY ALL -> COMPILATION -> TRANSFORMATION MODEL_DYNAMIC_SUBQUERY ALL -> COMPILATION -> TRANSFORMATION MODEL_MIN_ANALYSIS ALL -> COMPILATION -> TRANSFORMATION NO_EXPAND_GSET_TO_UNION ALL -> COMPILATION -> TRANSFORMATION EXPAND_GSET_TO_UNION ALL -> COMPILATION -> TRANSFORMATION MV_MERGE ALL -> COMPILATION -> TRANSFORMATION NO_CONNECT_BY_COST_BASED ALL -> COMPILATION -> TRANSFORMATION CONNECT_BY_COST_BASED ALL -> COMPILATION -> TRANSFORMATION INLINE ALL -> COMPILATION -> TRANSFORMATION MATERIALIZE ALL -> COMPILATION -> TRANSFORMATION REWRITE_OR_ERROR ALL -> COMPILATION -> TRANSFORMATION NO_REWRITE ALL -> COMPILATION -> TRANSFORMATION REWRITE ALL -> COMPILATION -> TRANSFORMATION NO_SEMIJOIN ALL -> COMPILATION -> TRANSFORMATION SEMIJOIN ALL -> COMPILATION -> TRANSFORMATION ANTIJOIN ALL -> COMPILATION -> TRANSFORMATION NO_PUSH_SUBQ ALL -> COMPILATION -> TRANSFORMATION PUSH_SUBQ ALL -> COMPILATION -> TRANSFORMATION NO_QUERY_TRANSFORMATION ALL -> COMPILATION -> TRANSFORMATION OPAQUE_XCANONICAL ALL -> COMPILATION -> TRANSFORMATION OPAQUE_TRANSFORM ALL -> COMPILATION -> TRANSFORMATION NO_CONNECT_BY_CB_WHR_ONLY ALL -> COMPILATION -> TRANSFORMATION CONNECT_BY_CB_WHR_ONLY ALL -> COMPILATION -> TRANSFORMATION NO_TRANSFORM_DISTINCT_AGG ALL -> COMPILATION -> TRANSFORMATION TRANSFORM_DISTINCT_AGG ALL -> COMPILATION -> TRANSFORMATION PRECOMPUTE_SUBQUERY ALL -> COMPILATION -> TRANSFORMATION LIKE_EXPAND ALL -> COMPILATION -> TRANSFORMATION NO_ORDER_ROLLUPS ALL -> COMPILATION -> TRANSFORMATION GBY_CONC_ROLLUP ALL -> COMPILATION -> TRANSFORMATION USE_TTT_FOR_GSETS ALL -> COMPILATION -> TRANSFORMATION MERGE ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> CVM NO_MERGE ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> CVM NO_PLACE_DISTINCT ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> DIST_PLCMT PLACE_DISTINCT ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> DIST_PLCMT FACTORIZE_JOIN ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> JOINFAC NO_FACTORIZE_JOIN ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> JOINFAC PLACE_GROUP_BY ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> PLACE_GROUP_BY NO_PLACE_GROUP_BY ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> PLACE_GROUP_BY PULL_PRED ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> PULL_PRED NO_PULL_PRED ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> PULL_PRED NO_FACT ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> STAR_TRANS NO_STAR_TRANSFORMATION ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> STAR_TRANS STAR_TRANSFORMATION ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> STAR_TRANS STAR ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> STAR_TRANS FACT ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> STAR_TRANS TABLE_LOOKUP_BY_NL ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> STAR_TRANS -> TABLE_LOOKUP_BY_NL NO_TABLE_LOOKUP_BY_NL ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> STAR_TRANS -> TABLE_LOOKUP_BY_NL NO_EXPAND_TABLE ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> TABLE_EXPANSION EXPAND_TABLE ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> TABLE_EXPANSION NO_UNNEST ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> UNNEST UNNEST ALL -> COMPILATION -> TRANSFORMATION -> CBQT -> UNNEST NO_COALESCE_SQ ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> COALESCE_SQ COALESCE_SQ ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> COALESCE_SQ MERGE ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> CVM NO_MERGE ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> CVM NO_PUSH_PRED ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> FILTER_PUSH_PRED PUSH_PRED ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> FILTER_PUSH_PRED ELIMINATE_OBY ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> OBYE NO_ELIMINATE_OBY ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> OBYE OLD_PUSH_PRED ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> OLD_PUSH_PRED NO_OUTER_JOIN_TO_INNER ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> OUTER_JOIN_TO_INNER OUTER_JOIN_TO_INNER ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> OUTER_JOIN_TO_INNER SET_TO_JOIN ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> SET_TO_JOIN NO_SET_TO_JOIN ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> SET_TO_JOIN ELIMINATE_JOIN ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> TABLE_ELIM NO_ELIMINATE_JOIN ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> TABLE_ELIM NO_UNNEST ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> UNNEST UNNEST ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> UNNEST NO_EXPAND ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> USE_CONCAT USE_CONCAT ALL -> COMPILATION -> TRANSFORMATION -> HEURISTIC -> USE_CONCAT XML_DML_RWT_STMT ALL -> COMPILATION -> XML_REWRITE NO_XML_DML_REWRITE ALL -> COMPILATION -> XML_REWRITE NO_XML_QUERY_REWRITE ALL -> COMPILATION -> XML_REWRITE FORCE_XML_QUERY_REWRITE ALL -> COMPILATION -> XML_REWRITE CHECK_ACL_REWRITE ALL -> COMPILATION -> XML_REWRITE -> CHECK_ACL_REWRITE NO_CHECK_ACL_REWRITE ALL -> COMPILATION -> XML_REWRITE -> CHECK_ACL_REWRITE NO_COST_XML_QUERY_REWRITE ALL -> COMPILATION -> XML_REWRITE -> COST_XML_QUERY_REWRITE COST_XML_QUERY_REWRITE ALL -> COMPILATION -> XML_REWRITE -> COST_XML_QUERY_REWRITE NO_XMLINDEX_REWRITE_IN_SELECT ALL -> COMPILATION -> XML_REWRITE -> XMLINDEX_REWRITE XMLINDEX_REWRITE_IN_SELECT ALL -> COMPILATION -> XML_REWRITE -> XMLINDEX_REWRITE NO_XMLINDEX_REWRITE ALL -> COMPILATION -> XML_REWRITE -> XMLINDEX_REWRITE XMLINDEX_REWRITE ALL -> COMPILATION -> XML_REWRITE -> XMLINDEX_REWRITE NO_SUBSTRB_PAD ALL -> EXECUTION NO_NLJ_BATCHING ALL -> EXECUTION NLJ_BATCHING ALL -> EXECUTION NO_NLJ_PREFETCH ALL -> EXECUTION NLJ_PREFETCH ALL -> EXECUTION CACHE ALL -> EXECUTION NO_RESULT_CACHE ALL -> EXECUTION RESULT_CACHE ALL -> EXECUTION TRACING ALL -> EXECUTION NOCACHE ALL -> EXECUTION NO_LOAD ALL -> EXECUTION 305 rows selected.
Lots of hints to remember and to try out some day… nah, I’ll just run on modern Oracle versions and gather the stats properly ;-)
Listing Exadata storage cells and their configuration info from V$CELL_CONFIG
If you attended my Exadata hacking session today, you saw me using the cellver.sql script which lists some basic configuration info about the currently connected storage cells:
SQL> @exadata/cellver Show Exadata cell versions from V$CELL_CONFIG.... CELLNAME CELLSRV_VERSION FLASH_CACHE_MODE CPU_COUNT UPTIME KERNEL_VERSION MAKE_MODEL -------------------- -------------------- -------------------- ---------- -------------------- ------------------------------ -------------------------------------------------- 192.168.12.10 11.2.3.2.1 WriteBack 24 8 days, 2:07 2.6.32-400.11.1.el5uek Oracle Corporation SUN FIRE X4270 M2 SERVER SAS 192.168.12.11 11.2.3.2.1 WriteBack 24 8 days, 2:06 2.6.32-400.11.1.el5uek Oracle Corporation SUN FIRE X4270 M2 SERVER SAS 192.168.12.8 11.2.3.2.1 WriteBack 24 8 days, 2:06 2.6.32-400.11.1.el5uek Oracle Corporation SUN FIRE X4270 M2 SERVER SAS 192.168.12.9 11.2.3.2.1 WriteBack 24 8 days, 2:06 2.6.32-400.11.1.el5uek Oracle Corporation SUN FIRE X4270 M2 SERVER SAS
The output is pretty self-explanatory. One thing to note is that the CPU_COUNT is not the number of CPU cores, but just the number of “virtual” CPU threads presented to the OS, in this case 24 threads over 12 cores.
The script itself is simple, just extracting some XML values from the “CELL” type config records:
SQL> l 1 SELECT 2 cellname cv_cellname 3 , CAST(extract(xmltype(confval), '/cli-output/cell/releaseVersion/text()') AS VARCHAR2(20)) cv_cellVersion 4 , CAST(extract(xmltype(confval), '/cli-output/cell/flashCacheMode/text()') AS VARCHAR2(20)) cv_flashcachemode 5 , CAST(extract(xmltype(confval), '/cli-output/cell/cpuCount/text()') AS VARCHAR2(10)) cpu_count 6 , CAST(extract(xmltype(confval), '/cli-output/cell/upTime/text()') AS VARCHAR2(20)) uptime 7 , CAST(extract(xmltype(confval), '/cli-output/cell/kernelVersion/text()') AS VARCHAR2(30)) kernel_version 8 , CAST(extract(xmltype(confval), '/cli-output/cell/makeModel/text()') AS VARCHAR2(50)) make_model 9 FROM 10 v$cell_config -- gv$ isn't needed, all cells should be visible in all instances 11 WHERE 12 conftype = 'CELL' 13 ORDER BY 14* cv_cellname
I will add some more scripts to the exadata directory over the coming days.
Enjoy! :)
Alter session force parallel query doesn’t really force anything
Jonathan Lewis has already written about this behavior from the angle of PARALLEL hints.
I’m writing a similar article just because the word FORCE in the ALTER SESSION FORCE PARALLEL QUERY syntax. Force should mean that some behavior would always happen (when possible), right? Let’s test:
SQL> CREATE TABLE t AS SELECT * FROM dba_objects; Table created. SQL> CREATE INDEX i ON t(owner); Index created. SQL> @gts t Gather Table Statistics for table t... PL/SQL procedure successfully completed.
Now let’s “force” the parallel query in my session, run the query and check the execution plan:
SQL> ALTER SESSION FORCE PARALLEL QUERY PARALLEL 2;
Session altered.
SQL> SELECT SUM(object_id) FROM t WHERE owner LIKE 'S%';
SUM(OBJECT_ID)
--------------
979900956
SQL> @x
---------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)|
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 186 (100)|
| 1 | SORT AGGREGATE | | 1 | 12 | |
| 2 | TABLE ACCESS BY INDEX ROWID| T | 6741 | 80892 | 186 (0)|
|* 3 | INDEX RANGE SCAN | I | 6741 | | 18 (0)|
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("OWNER" LIKE 'S%')
filter("OWNER" LIKE 'S%')
The output shows a regular, serial execution plan!
Hmm, let’s increase the “forced” parallelism from 2 to 3 and run exactly the same query again:
SQL> ALTER SESSION FORCE PARALLEL QUERY PARALLEL 3;
Session altered.
SQL> SELECT SUM(object_id) FROM t WHERE owner LIKE 'S%';
SUM(OBJECT_ID)
--------------
979900956
SQL> @x
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 128 (100)| | | |
| 1 | SORT AGGREGATE | | 1 | 12 | | | | |
| 2 | PX COORDINATOR | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 12 | | Q1,00 | P->S | QC (RAND) |
| 4 | SORT AGGREGATE | | 1 | 12 | | Q1,00 | PCWP | |
| 5 | PX BLOCK ITERATOR | | 6741 | 80892 | 128 (0)| Q1,00 | PCWC | |
|* 6 | TABLE ACCESS FULL| T | 6741 | 80892 | 128 (0)| Q1,00 | PCWP | |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access(:Z>=:Z AND :Z<=:Z)
filter("OWNER" LIKE 'S%')
Now the query will get a parallel plan!
The reason for this behavior is that the FORCE parallel query syntax doesn’t really force Oracle to use a parallel plan, but rather just reduces optimizer cost estimates for full table scans (the higher the parallelism, the lower the FTS costs – Jonathan already has details about this in his blog entry, so I won’t replicate this). But the optimizer is still free to choose some other, non-parallel execution plan if that has a lower cost than the best parallel one!
So what happened above is that with “forced” parallel degree 2, the parallel full table scan plan must have had a higher cost than the serial index range scan (186), but once I increased the parallelism “factor” to 3, then the final cost of the parallel full table scan plan ended up being lower (128) than the best serial plan found.
This is a good example showing that both the PARALLEL hints and the FORCE PARALLEL session settings really just adjust a narrow set of optimizer cost computation inputs and don’t really fix the resulting execution plan. If you really want to fix an execution plan, you need to tie optimizer “hands” in every aspect with a full set of hints just like the stored profiles do. That way, even if there is a lower cost plan available, the optimizer doesn’t know about it as you’ve prohibited it from doing any calculations other than your hints direct it to.
Note that when testing this, your mileage may vary, depending on how much data you have in your test table (or rather in the optimizer stats for that table) plus system stats.
asqlmon.sql: SQL Monitoring-like execution plan line level drilldown into SQL response time
I don’t have much time for a thorough blog post, so I’ll just paste in an example output of my asqlmon.sql script, which uses ASH sql_plan_line columns for displaying where inside your execution plan response time has been spent. Why not just use Oracle’s own SQL Monitoring reports? Well, SQL monitoring is meant for “long running” queries, which are not executed very frequently. In other words, you can’t use SQL Monitoring for drilling down into your frequently executed OLTP-style SQL. I am copying my recent post to Oracle-L mailing list here too:
The main performance impact of the old GATHER_PLAN_STATISTICS / statistics_level = ALL instrumentation came from the fact that expensive timing (gettimeofday()) system calls were used for getting A-Times of row sources.
That’s why the _rowsource_statistics_sampfreq was introduced, to not have to get the timing every switch between row sources, but only at every 128th time (by default). This caused some interesting measurement (or rather extrapolation) errors where the parent row source seemingly used less time than all of its children combined. (By the way, depending on your platform and recency of it, lightweight “fast trap” system calls may be used for getting the current timestamp, which reduces the overhead somewhat – Solaris since long time ago and RHEL 5.4+ for example).
Anyway, the row source timing info in SQL Monitoring comes from ASH sql_plan_line samples, so this instrumentation is always enabled anyway – SQL Monitoring will just query ASH for these records. The other data in V$SQL_MONITOR views doesn’t cause huge overhead – it depends on your query, but a heavy logical IO nested loop (my lotslios.sql) consumed around 0.6% more CPU with MONITOR hint compared to running with NO_MONITOR hint. I tested it on a VM and ran it only a couple of times, so this might just be statistical error or due to some other background activity. Not significant overhead in other words.
Now this was about long running queries, but you should not go and enable SQL Monitoring for all your short (OLTP) queries with a hint now as the SQL Monitoring data has to be kept and updated in a shared pool memory structure – and if all of your 1000 sessions suddenly start updating their monitoring data for every execution of a SQL, you will end up waiting for Real-time plan statistics latch contention (and some CPU overhead because so frequent searching and purging of previous SQL execution’s monitoring data).
The asqlmon (ASH-based SQL Monitoring like report) assumes that the corresponding execution plan is still present in library cache, it will take SQL_ID and child_number as a parameter. It does not show only a single execution as the SQL Monitoring report does, but rather a plan-line level breakdown across all executions of that SQL/child captured in ASH data (you can change the WHERE clause to change the scope of the measurement):
SQL> @ash/asqlmon 6c45rgjx1myt6 0 SECONDS Activity Visual Line ID Parent ASQLMON_OPERATION SESSION EVENT OBJ_ALIAS_QBC_NAME ASQLMON_PREDICATES PROJECTION ---------- -------- ------------ ------- ------ ---------------------------------------------------------------------------------------------------- ------- ---------------------------------------------------------------- -------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- % | | 0 SELECT STATEMENT 5 10.4 % |# | 1 0 SORT AGGREGATE ON CPU [SEL$1] (#keys=0) COUNT(*)[22] 3 6.3 % |# | 2 1 COUNT STOPKEY ON CPU [F:]ROWNUM<=1000000000 % | | 3 2 MERGE JOIN CARTESIAN % | | 4 3 VIEW [DBA_SOURCE] DBA_SOURCE@SEL$1 [SET$1] % | | 5 4 UNION-ALL [SET$1] % | | 6 5 FILTER [SEL$335DD26A] [F:](("O"."TYPE#"<>4 AND "O"."TYPE#"<>5 AND "O"."TYPE#"<>7 AND "O"."TYPE#"<>8 AND "O"."TYPE#"<>9 AND "O"."TYPE#"<>10 AND "O"."TYPE#"<>11 AND "O"."TYPE#"<>12 AND "O"."TYPE#"<>13 AND "O"."TYPE#"<>14 AND "O"."TYPE#"<>22 AND "O"."TYPE#"<>87 AND "O"."TYPE#"<>88) OR BITAND("U"."SPARE1",16)=0 OR (INTERNAL_FUNCTION("O"."TYPE#") AND ((SYS_CONTEXT('userenv','current_edition_name')='ORA$BASE' AND "U"."TYPE#"<>2) OR ("U"."TYPE#"=2 AND "U"."SPARE2"=TO_NUMBER(SYS_CONTEXT('userenv','current_edition_id'))) OR IS NOT NULL))) % | | 7 6 HASH JOIN [A:] "O"."OBJ#"="S"."OBJ#" (#keys=1) "O"."OBJ#"[NUMBER,22], "U"."SPARE2"[NUMBER,22], "U"."TYPE#"[NUMBER,22], "U"."SPARE1"[NUMBER,22], "O"."TYPE#"[NUMBER,22] % | | 8 7 HASH JOIN [A:] "O"."SPARE3"="U"."USER#" (#keys=1) "U"."SPARE2"[NUMBER,22], "U"."TYPE#"[NUMBER,22], "U"."SPARE1"[NUMBER,22], "O"."OBJ#"[NUMBER,22], "O"."TYPE#"[NUMBER,22] % | | 9 8 INDEX FULL SCAN [I_USER2] U@SEL$2 [SEL$335DD26A] "U"."USER#"[NUMBER,22] % | | 10 8 HASH JOIN [A:] "O"."OWNER#"="U"."USER#" (#keys=1) "U"."SPARE2"[NUMBER,22], "U"."TYPE#"[NUMBER,22], "U"."SPARE1"[NUMBER,22], "O"."OBJ#"[NUMBER,22], "O"."SPARE3"[NUMBER,22], "O"."TYPE#"[NUMBER,22] % | | 11 10 INDEX FULL SCAN [I_USER2] U@SEL$3 [SEL$335DD26A] "U"."USER#"[NUMBER,22], "U"."TYPE#"[NUMBER,22], "U"."SPARE1"[NUMBER,22], "U"."SPARE2"[NUMBER,22] 1 2.1 % | | 12 10 INDEX FAST FULL SCAN [I_OBJ2] ON CPU O@SEL$3 [SEL$335DD26A] [F:](("O"."TYPE#"=13 AND "O"."SUBNAME" IS NULL) OR INTERNAL_FUNCTION("O"."TYPE#")) "O"."OBJ#"[NUMBER,22], "O"."OWNER#"[NUMBER,22], "O"."TYPE#"[NUMBER,22], "O"."SPARE3"[NUMBER,22] % | | 13 7 INDEX FAST FULL SCAN [I_SOURCE1] S@SEL$2 [SEL$335DD26A] "S"."OBJ#"[NUMBER,22] % | | 14 6 NESTED LOOPS [SEL$5] % | | 15 14 INDEX SKIP SCAN [I_USER2] U2@SEL$5 [SEL$5] [A:] "U2"."TYPE#"=2 AND "U2"."SPARE2"=TO_NUMBER(SYS_CONTEXT('userenv','current_edition_id')) "U2"."USER#"[NUMBER,22] [F:]("U2"."TYPE#"=2 AND "U2"."SPARE2"=TO_NUMBER(SYS_CONTEXT('userenv','current_edition_id'))) % | | 16 14 INDEX RANGE SCAN [I_OBJ4] O2@SEL$5 [SEL$5] [A:] "O2"."DATAOBJ#"=:B1 AND "O2"."TYPE#"=88 AND "O2"."OWNER#"="U2"."USER#" % | | 17 5 NESTED LOOPS [SEL$68B588A0] % | | 18 17 NESTED LOOPS "O"."SPARE3"[NUMBER,22] % | | 19 18 NESTED LOOPS "O"."OWNER#"[NUMBER,22], "O"."TYPE#"[NUMBER,22], "O"."SPARE3"[NUMBER,22] % | | 20 19 FIXED TABLE FULL [X$JOXSCD] S@SEL$6 [SEL$68B588A0] "S"."JOXFTOBN"[NUMBER,22] % | | 21 19 TABLE ACCESS BY INDEX ROWID [OBJ$] O@SEL$7 [SEL$68B588A0] "O"."OWNER#"[NUMBER,22], "O"."TYPE#"[NUMBER,22], "O"."SPARE3"[NUMBER,22] % | | 22 21 INDEX RANGE SCAN [I_OBJ1] O@SEL$7 [SEL$68B588A0] [A:] "O"."OBJ#"="S"."JOXFTOBN" AND "O"."TYPE#"=28 [F:]"O"."TYPE#"=28 "O".ROWID[ROWID,10], "O"."OWNER#"[NUMBER,22], "O"."TYPE#"[NUMBER,22] % | | 23 18 INDEX RANGE SCAN [I_USER2] U@SEL$7 [SEL$68B588A0] [A:] "O"."OWNER#"="U"."USER#" [F:](("O"."TYPE#"<>4 AND "O"."TYPE#"<>5 AND "O"."TYPE#"<>7 AND "O"."TYPE#"<>8 AND "O"."TYPE#"<>9 AND "O"."TYPE#"<>10 AND "O"."TYPE#"<>11 AND "O"."TYPE#"<>12 AND "O"."TYPE#"<>13 AND "O"."TYPE#"<>14 AND "O"."TYPE#"<>22 AND "O"."TYPE#"<>87 AND "O"."TYPE#"<>88) OR BITAND("U"."SPARE1",16)=0) % | | 24 17 INDEX RANGE SCAN [I_USER2] U@SEL$6 [SEL$68B588A0] [A:] "O"."SPARE3"="U"."USER#" 39 81.3 % |######## | 25 3 BUFFER SORT ON CPU (#keys=0) % | | 26 25 VIEW [DBA_OBJECTS] DBA_OBJECTS@SEL$1 [SET$2] % | | 27 26 UNION-ALL [SET$2] % | | 28 27 FILTER [SEL$18BE6699] [F:](("O"."TYPE#"<>4 AND "O"."TYPE#"<>5 AND "O"."TYPE#"<>7 AND "O"."TYPE#"<>8 AND "O"."TYPE#"<>9 AND "O"."TYPE#"<>11 AND "O"."TYPE#"<>12 AND "O"."TYPE#"<>13 AND "O"."TYPE#"<>14 AND "O"."TYPE#"<>22 AND "O"."TYPE#"<>87 AND "O"."TYPE#"<>88) OR BITAND("U"."SPARE1",16)=0 OR (INTERNAL_FUNCTION("O"."TYPE#") AND ((SYS_CONTEXT('userenv','current_edition_name')='ORA$BASE' AND "U"."TYPE#"<>2) OR ("U"."TYPE#"=2 AND "U"."SPARE2"=TO_NUMBER(SYS_CONTEXT('userenv','current_edition_id'))) OR IS NOT NULL))) % | | 29 28 HASH JOIN [A:] "O"."SPARE3"="U"."USER#" (#keys=1) "U"."SPARE2"[NUMBER,22], "U"."TYPE#"[NUMBER,22], "U"."SPARE1"[NUMBER,22], "O"."OBJ#"[NUMBER,22], "O"."TYPE#"[NUMBER,22] % | | 30 29 INDEX FULL SCAN [I_USER2] U@SEL$10 [SEL$18BE6699] "U"."USER#"[NUMBER,22] % | | 31 29 HASH JOIN [A:] "O"."OWNER#"="U"."USER#" (#keys=1) "U"."SPARE2"[NUMBER,22], "U"."TYPE#"[NUMBER,22], "U"."SPARE1"[NUMBER,22], "O"."OBJ#"[NUMBER,22], "O"."SPARE3"[NUMBER,22], "O"."TYPE#"[NUMBER,22] % | | 32 31 INDEX FULL SCAN [I_USER2] U@SEL$12 [SEL$18BE6699] "U"."USER#"[NUMBER,22], "U"."TYPE#"[NUMBER,22], "U"."SPARE1"[NUMBER,22], "U"."SPARE2"[NUMBER,22] % | | 33 31 TABLE ACCESS FULL [OBJ$] O@SEL$12 [SEL$18BE6699] [F:]("O"."TYPE#"<>10 AND BITAND("O"."FLAGS",128)=0 AND "O"."NAME"<>'_NEXT_OBJECT' AND "O"."OBJ#"[NUMBER,22], "O"."OWNER#"[NUMBER,22], "O"."TYPE#"[NUMBER,22], "O"."SPARE3"[NUMBER,22] "O"."NAME"<>'_default_auditing_options_' AND "O"."LINKNAME" IS NULL) % | | 34 28 NESTED LOOPS [SEL$14] % | | 35 34 INDEX SKIP SCAN [I_USER2] U2@SEL$14 [SEL$14] [A:] "U2"."TYPE#"=2 AND "U2"."SPARE2"=TO_NUMBER(SYS_CONTEXT('userenv','current_edition_id')) "U2"."USER#"[NUMBER,22] [F:]("U2"."TYPE#"=2 AND "U2"."SPARE2"=TO_NUMBER(SYS_CONTEXT('userenv','current_edition_id'))) % | | 36 34 INDEX RANGE SCAN [I_OBJ4] O2@SEL$14 [SEL$14] [A:] "O2"."DATAOBJ#"=:B1 AND "O2"."TYPE#"=88 AND "O2"."OWNER#"="U2"."USER#" % | | 37 27 NESTED LOOPS [SEL$15] % | | 38 37 INDEX FULL SCAN [I_LINK1] L@SEL$15 [SEL$15] "L"."OWNER#"[NUMBER,22] % | | 39 37 INDEX RANGE SCAN [I_USER2] U@SEL$15 [SEL$15] [A:] "L"."OWNER#"="U"."USER#" 40 rows selected.
The output is deliberately very wide – as I like to have relevant info to a plan line on the same text line visually. I navigate around such plans by visually identifying where most of the time was spent (Activity% and Visual columns), then double or triple-click that line in my terminal window to highlight the whole line (it becomes like a marker) and then scroll left-right to see more info about it, such as the filter conditions and projection etc.
Note that this script does not give you all the info that a SQL Monitoring report (or the old-fashioned SQL Plan Profiling would give), like A-rows and Starts, because ASH just doesn’t provide data at this scope. But again, this approach works with frequently executed OLTP-style queries and does not require you to enable any session settings or add hints for getting quick overview of on which execution plan line does your SQL spend the most of its time.
Update: I also have a RAC-aware / GV$-using gasqlmon.sql script, but I haven’t kept it up to date for a while, so it might return incorrect output at the moment (there are some notable challenges as the child# of interest may not be the same on all instances, so for simplicity I look into a single instance at a time – non-parallel SQL tuning rarely requires a whole cluster overview in one script).
Enkitec’s Exadata Smart Flash Cache Hacking Session by Tanel Poder! (free stuff!!!)
We recently received our 3rd Exadata machine into Enkitec’s exalab. Now we have a V2, X2 and X3 there, in addition to ODA, Big Data Appliance (which comes with a beer-holder built in!) and an Exalytics box! So you understand why Karl Arao is so excited about it :-)
This occasion demands that we hack the hell out of all this kit soon! So, let’s have another (super-secret) hacking session!
This time, let’s see how the Exadata Smart Flash Cache works! (both for reads and writes). Note that we won’t cover Smart Flash Logging in this session (otherwise we’ll end up spending half a day on it :)
This is another 1-hour hacking session, so save at least 2 hours in your calendar!
- When: Thursday, March 21, 2013 9:00 AM – 11:00 AM PDT
- Where: https://www1.gotomeeting.com/register/229321321
- Price: Free
See you soon!
P.S. I uploaded the Part 1 of my previous hacking session about Oracle SQL plan execution to enkitec.tv (The part 2 will follow hopefully today)
Secret Hacking Session: Find Out How Oracle SQL Plans Are Really Executed!
Yes, I am going to run a yet another secret hacking session next week!
- Title: Secret Hacking Session: How Oracle SQL Plans Are Really Executed!
- Date: Tuesday, March 5, 2013
- Time: 10:00 AM – 12:00 PM PST
Registration
Description
This is not a yet another talk about how to extract and display execution plans.
It is also not a theoretical talk about how to read execution plans in general.
You should know this stuff already :)
In this extremely low level hacking session where we’ll use various tracing, stack sampling and even DTrace to trace SQL plan execution, where exactly do the logical and physical IOs happen and how does the data flow up the execution plan tree.
As usual, my secret hacking sessions are not formal training events with slides and structure, but as the name says, unstructured, fun hacking sessions where most of the time is spent in Oracle sqlplus and OS shell!
We’ll be drilling down into low-level internals deep and fast – so this session is not for faint-hearted! ;)
This is a one hour hacking session, which more than likely will take two hours.
I will be showing only a couple of random slides, if you want to read my old slide deck on this topic, download it here (plus a few helpful blog articles related to SQL execution plans and internals):
- How Oracle SQL Plans Are Really Executed – slides
- Tracing Oracle SQL Plan Execution with DTrace – blog entry
- What the heck is the INTERNAL_FUNCTION in execution plan predicate section? – blog entry
- Scripts for showing execution plans via plain SQL and also in Oracle 9i – blog entry
- Explain Plan For command may show you the wrong execution plan – blog entry
- Understanding Oracle execution plans with os_explain – blog entry
Also, if you care about systematic SQL tuning – then my Advanced Oracle SQL Tuning v2.0 online seminar starts on 11. March!
See you on Tuesday!
ExaSnapper 0.7 beta download and the hacking session videos
Thank you for attending the Exadata Snapper (ExaSnapper) hacking session!
I have split the recording of this session into 3 pieces and uploaded to enkitec.tv. The ExaSnapper v0.7 beta that I demoed is also available now at enkitec.com (registration needed). See the links below.
For quick reference, here’s the syntax of running ExaSnapper – there are two modes, one is the before/after capture (think Tom Kyte’s runstats, but for exadata metrics) and the other is more like a DBA-monitoring mode, where you can just measure a few seconds worth of a long-running query runtime and get the IO and efficiency figures from there.
Here’s an excerpt from the install script documentation section:
-- Usage: Take a snapshot of a running session (use QC SID if PX):
--
-- a) Monitor a running query - "DBA mode"
--
-- SELECT * FROM TABLE(exasnap.display_sid(, [snap_seconds], [detail_level]));
--
-- The SID argument can be just a number (SID in local instance) or a remote SID with
-- @instance after it (like '123@4')
--
-- SELECT * FROM TABLE(exasnap.display_sid(123));
-- SELECT * FROM TABLE(exasnap.display_sid('123@4', p_detail=>'%');
--
-- b) Take Before & After snapshots of a query execution - "Developer Mode"
--
-- 1) SELECT exasnap.begin_snap(123) FROM dual;
-- or
-- EXEC :begin_snap_id := exasnap.begin_snap(123);
--
-- 2) Run your query, wait until it finishes (or CTRL+C)
--
-- 3) SELECT exasnap.end_snap(123) FROM dual;
-- or
-- EXEC :end_snap_id := exasnap.end_snap(123);
--
-- 4) SELECT * FROM TABLE(exasnap.display_snap(:begin_snap_id, :end_snap_id, '%'));
--
One slide illustrating the idea of ExaSnapper:
One screenshot of ExaSnapper output:
Watch the videos to see more!
Videos: Slides: ExaSnapper install script:Also, if you’re serious about learning more about Exadata, from The Exadata Experts, check out our training pages , I’m running my Advanced Exadata Performance seminar the first time in May :-)
Peeking into Linux kernel-land using /proc filesystem for quick’n'dirty troubleshooting
This blog entry is about modern Linuxes. In other words RHEL6 equivalents with 2.6.3x kernels and not the ancient RHEL5 with 2.6.18 kernel (wtf?!), which is the most common in enterprises unfortunately. And no, I’m not going to use kernel debuggers or SystemTap scripts here, just plain old “cat /proc/PID/xyz” commands against some useful /proc filesystem entries.
Troubleshooting a “slow” processHere’s one systematic troubleshooting example I reproduced in my laptop. A DBA was wondering why their find command had been running “much slower”, without returning any results for a while. Knowing the environment, we had a hunch, but I got asked about what would be the systematic approach for troubleshooting this – already ongoing – problem right now.
Luckily the system was running OEL6, so had a pretty new kernel. Actually the 2.6.39 UEK2.
So, let’s do some troubleshooting. First let’s see whether that find process is still alive:
[root@oel6 ~]# ps -ef | grep find root 27288 27245 4 11:57 pts/0 00:00:01 find . -type f root 27334 27315 0 11:57 pts/1 00:00:00 grep find
Yep it’s there – PID 27288 (I’ll use that pid throughout the troubleshooting example).
Let’s start from the basics and take a quick look what’s the bottleneck for this process – if it’s not blocked by anything (for example reading everything it needs from cache) it should be 100% on CPU. If it’s bottlenecked by some IO or contention issues, the CPU usage should be lower – or completely 0%.
[root@oel6 ~]# top -cbp 27288 top - 11:58:15 up 7 days, 3:38, 2 users, load average: 1.21, 0.65, 0.47 Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie Cpu(s): 0.1%us, 0.1%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 2026460k total, 1935780k used, 90680k free, 64416k buffers Swap: 4128764k total, 251004k used, 3877760k free, 662280k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 27288 root 20 0 109m 1160 844 D 0.0 0.1 0:01.11 find . -type f
Top tells me this process is either 0% on CPU or very close to zero percent (so it gets rounded to 0% in the output). There’s an important difference though, as a process that is completely stuck, not having a chance of getting onto CPU at all vs. a process which is getting out of its wait state every now and then (for example some polling operation times out every now and then and thee process chooses to go back to sleep). So, top on Linux is not a good enough tool to show that difference for sure – but at least we know that this process is not burning serious amounts of CPU time.
Let’s use something else then. Normally when a process seems to be stuck like that (0% CPU usually means that the process is stuck in some blocking system call – which causes the kernel to put the process to sleep) I run strace on that process to trace in which system call the process is currently stuck. Also if the process is actually not completely stuck, but returns from a system call and wakes up briefly every now and then, it would show up in strace (as the blocking system call would complete and be entered again a little later):
[root@oel6 ~]# strace -cp 27288 Process 27288 attached - interrupt to quit ^C ^Z [1]+ Stopped strace -cp 27288 [root@oel6 ~]# kill -9 %% [1]+ Stopped strace -cp 27288 [root@oel6 ~]# [1]+ Killed strace -cp 27288
Oops, the strace command itself got hung too! It didn’t print any output for a long time and didn’t respond to CTRL+C, so I had to put it into background with CTRL+Z and kill it from there. So much for easy diagnosis.
Let’s try pstack then (on Linux, pstack is just a shell script wrapper around the GDB debugger). While pstack does not see into kernel-land, it will still give us a clue about which system call was requested (as usually there’s a corresponding libc library call in the top of the displayed userland stack):
[root@oel6 ~]# pstack 27288 ^C ^Z [1]+ Stopped pstack 27288 [root@oel6 ~]# kill %% [1]+ Stopped pstack 27288 [root@oel6 ~]# [1]+ Terminated pstack 27288
Pstack also got stuck without returning anything!
So we still don’t know whether our process is 100% (hopelessly) stuck or just 99.99% stuck (the process keeps going back to sleep) – and where it’s stuck.
Ok, where else to look? There’s one more commonly available thing to check – the process status and WCHAN fields, which can be reported via good old ps (perhaps I should have ran this command earlier to make sure this process isn’t a zombie by now):
[root@oel6 ~]# ps -flp 27288 F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD 0 D root 27288 27245 0 80 0 - 28070 rpc_wa 11:57 pts/0 00:00:01 find . -type f
You should run ps multiple times in a row to make sure that the process is still in the same state (you don’t want to get misled by a single sample taken at an unlucky time), but I’m displaying it here only once for brevity.
The process is in state D (uninterruptible sleep) which is usually disk IO related (as the ps man pages also say). And the WCHAN (the function which caused this sleep/wait in this process) field is a bit truncated. I could use a ps option (read the man) to print it out a bit wider, but as this info comes from the proc filesystem anyway, let’s query it directly from the source (again, it would be a good idea to sample this multiple times as we are not sure yet whether our process is completely stuck or just mostly sleeping):
[root@oel6 ~]# cat /proc/27288/wchan rpc_wait_bit_killable
Hmm… this process is waiting for some RPC call. RPC usually means that the process is talking to other processes (either in the local server or even remote servers). But we still don’t know why.
Is there any movement or is the process completely stuck?Before we go on to the “meat” of this article, let’s figure out whether this process is completely stuck or not. The /proc/PID/status can tell us that on modern kernels. I have highlighted the interesting bits for our purpose:
[root@oel6 ~]# cat /proc/27288/status Name: find State: D (disk sleep) Tgid: 27288 Pid: 27288 PPid: 27245 TracerPid: 0 Uid: 0 0 0 0 Gid: 0 0 0 0 FDSize: 256 Groups: 0 1 2 3 4 6 10 VmPeak: 112628 kB VmSize: 112280 kB VmLck: 0 kB VmHWM: 1508 kB VmRSS: 1160 kB VmData: 260 kB VmStk: 136 kB VmExe: 224 kB VmLib: 2468 kB VmPTE: 88 kB VmSwap: 0 kB Threads: 1 SigQ: 4/15831 SigPnd: 0000000000040000 ShdPnd: 0000000000000000 SigBlk: 0000000000000000 SigIgn: 0000000000000000 SigCgt: 0000000180000000 CapInh: 0000000000000000 CapPrm: ffffffffffffffff CapEff: ffffffffffffffff CapBnd: ffffffffffffffff Cpus_allowed: ffffffff,ffffffff Cpus_allowed_list: 0-63 Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001 Mems_allowed_list: 0 voluntary_ctxt_switches: 9950 nonvoluntary_ctxt_switches: 17104
The process is in the state D – Disk Sleep (uninterruptible sleep). And look into the number of voluntary_ctxt_switches and nonvoluntary_ctxt_switches – this tells you how many times the process was taken off CPU (or put back). Then run the same command again a few seconds later and see if those numbers increase. In my case these numbers did not increase, thus I could conclude that the process was completely stuck (well, at least during the few seconds between these commands). So I can be more confident now that this process is completely stuck (and it’s not just flying under the radar by burning 0.04% of CPU all time).
By the way, there are two more locations where to get the context switch counts (and the 2nd one works on ancient kernels as well):
[root@oel6 ~]# cat /proc/27288/sched find (27288, #threads: 1) --------------------------------------------------------- se.exec_start : 617547410.689282 se.vruntime : 2471987.542895 se.sum_exec_runtime : 1119.480311 se.statistics.wait_start : 0.000000 se.statistics.sleep_start : 0.000000 se.statistics.block_start : 617547410.689282 se.statistics.sleep_max : 0.089192 se.statistics.block_max : 60082.951331 se.statistics.exec_max : 1.110465 se.statistics.slice_max : 0.334211 se.statistics.wait_max : 0.812834 se.statistics.wait_sum : 724.745506 se.statistics.wait_count : 27211 se.statistics.iowait_sum : 0.000000 se.statistics.iowait_count : 0 se.nr_migrations : 312 se.statistics.nr_migrations_cold : 0 se.statistics.nr_failed_migrations_affine: 0 se.statistics.nr_failed_migrations_running: 96 se.statistics.nr_failed_migrations_hot: 1794 se.statistics.nr_forced_migrations : 150 se.statistics.nr_wakeups : 18507 se.statistics.nr_wakeups_sync : 1 se.statistics.nr_wakeups_migrate : 155 se.statistics.nr_wakeups_local : 18504 se.statistics.nr_wakeups_remote : 3 se.statistics.nr_wakeups_affine : 155 se.statistics.nr_wakeups_affine_attempts: 158 se.statistics.nr_wakeups_passive : 0 se.statistics.nr_wakeups_idle : 0 avg_atom : 0.041379 avg_per_cpu : 3.588077 nr_switches : 27054 nr_voluntary_switches : 9950 nr_involuntary_switches : 17104 se.load.weight : 1024 policy : 0 prio : 120 clock-delta : 72
Here you need to look into the nr_switches number (which equals nr_voluntary_switches + nr_involuntary_switches).
The total nr_switches is 27054 in the above output and this is also what the 3rd field in /proc/PID/schedstat shows:
[root@oel6 ~]# cat /proc/27288/schedstat 1119480311 724745506 27054
And it doesn’t increase…
Peeking into kernel-land using /proc filesystemSo, it looks like our process is pretty stuck :) Strace and pstack were useless. They use ptrace() system call for attaching to the process and peeking into its memory, but as the process was hopelessly stuck, very likely in some system call, so I guess the ptrace() call got hung itself. (BTW, I once tried stracing the strace process itself when attaching to a target process and it crashed the target. You have been warned :)
So, how to see in which system call are we stuck – without strace or pstack? Luckily I was running on a modern kernel – say hi to /proc/PID/syscall!
[root@oel6 ~]# cat /proc/27288/syscall 262 0xffffffffffffff9c 0x20cf6c8 0x7fff97c52710 0x100 0x100 0x676e776f645f616d 0x7fff97c52658 0x390e2da8ea
Ok WTF am I supposed to do with that?
Well, these numbers usually stand for something. If it’s “0xAVeryBigNumber” then it’s usually a memory address (and pmap-like tools could be used where they point), but if it’s a small number then perhaps it’s an index into some array – like open file descriptor array (which you can read from /proc/PID/fd) or in this case, as we are dealing with system calls – it’s the system call number where your process is currently in! So, this process is stuck in system call #262!
Note that the system call numbers may differ between OS type, OS version and platform, so you should read it from the proper .h file on your flavor of OS. Usually searching for “syscall*” from /usr/include is a good starting point. On my version and platform (64bit) of Linux the system calls are defined here in /usr/include/asm/unistd_64.h:
[root@oel6 ~]# grep 262 /usr/include/asm/unistd_64.h #define __NR_newfstatat 262
Here you go! The syscall 262 is something called newfstatat. Just run man to find out what it is. Here’s a hint regarding system call names – if man page doesn’t find a call, then try it without possible suffixes and prefixes (for example “man pread” instead of “man pread64″) – in this case, run man without the “new” prefix – man fstatat. Or just google.
Anyway, this system call “new-fstat-at” allows you to read file properties pretty much like the usual stat system call. And we are stuck in this file metadata reading operation. So we’ve gotten one step further. But we still don’t know why are we stuck there?
Well, Say Hello To My Little Friend /proc/PID/stack, which allows you to read kernel stack backtraces of a process by just cat’ing a proc file!!!
[root@oel6 ~]# cat /proc/27288/stack [] rpc_wait_bit_killable+0x24/0x40 [sunrpc] [] __rpc_execute+0xf5/0x1d0 [sunrpc] [] rpc_execute+0x43/0x50 [sunrpc] [] rpc_run_task+0x75/0x90 [sunrpc] [] rpc_call_sync+0x42/0x70 [sunrpc] [] nfs3_rpc_wrapper.clone.0+0x35/0x80 [nfs] [] nfs3_proc_getattr+0x47/0x90 [nfs] [] __nfs_revalidate_inode+0xcc/0x1f0 [nfs] [] nfs_revalidate_inode+0x36/0x60 [nfs] [] nfs_getattr+0x5f/0x110 [nfs] [] vfs_getattr+0x4e/0x80 [] vfs_fstatat+0x70/0x90 [] sys_newfstatat+0x24/0x50 [] system_call_fastpath+0x16/0x1b [] 0xffffffffffffffff
The top function is where in kernel code we are stuck right now – which is exactly what the WCHAN output showed us (note that actually there are some more functions on top of it, such as the kernel schedule() function which puts processes to sleep or on CPU as needed, but these are not shown here probably because they’re a result of this wait condition itself, not the cause).
Thanks to the full kernel-land stack of this task we can read it from bottom up to understand how did we end up calling this rpc_wait_bit_killable function in the top, which ended up calling the scheduler and put us into sleep mode.
The system_call_fastpath in the bottom is a generic kernel system call handler function which has called the kernel code for that newfstatat syscall we issued (sys_newfstatat). And then, continuing to go upwards through the “child” functions we end up seeing a bunch of NFS functions in the stack! This is a 100% undeniable proof that we are somewhere under NFS codepath. I’m not saying “in NFS codepath” as when you continue reading upwards past the NFS functions, you see that the topmost NFS function has in turn called some RPC function (rpc_call_sync) for talking to some other process – in this case probably the [kworker/N:N], [nfsiod], [lockd] or [rpciod] kernel IO threads. And we are never getting a reply from those threads for some reason (the usual suspect would be dropped network connection, packets or just network connectivity issues).
To see whether any of the helper threads are stuck on network-related code, you could sample their kernel stacks too, although the kworkers do much more than just NFS RPC communication. During a separate experiment (just copying large files over NFS), I have caught one of the kworkers waiting in networking code:
[root@oel6 proc]# for i in `pgrep worker` ; do ps -fp $i ; cat /proc/$i/stack ; done UID PID PPID C STIME TTY TIME CMD root 53 2 0 Feb14 ? 00:04:34 [kworker/1:1] [] __cond_resched+0x2a/0x40 [] lock_sock_nested+0x35/0x70 [] tcp_sendmsg+0x29/0xbe0 [] inet_sendmsg+0x48/0xb0 [] sock_sendmsg+0xef/0x120 [] kernel_sendmsg+0x41/0x60 [] xs_send_kvec+0x8e/0xa0 [sunrpc] [] xs_sendpages+0x173/0x220 [sunrpc] [] xs_tcp_send_request+0x5d/0x160 [sunrpc] [] xprt_transmit+0x83/0x2e0 [sunrpc] [] call_transmit+0xa8/0x130 [sunrpc] [] __rpc_execute+0x66/0x1d0 [sunrpc] [] rpc_async_schedule+0x15/0x20 [sunrpc] [] process_one_work+0x13e/0x460 [] worker_thread+0x17c/0x3b0 [] kthread+0x96/0xa0 [] kernel_thread_helper+0x4/0x10
It should be possible to enable kernel tracing for knowing which exact kernel thread communicates with other kernel threads, but I won’t go in there in this post – as it’s supposed to be a practical (and simple :) troubleshooting exercise!
Diagnosis and “fix”Anyway, thanks to very easy kernel stack sampling in modern Linux kernels (I don’t know exact version when it was introduced) we managed to systematically figure out where the find command was stuck – in NFS code in the Linux kernel. And when you have NFS-related hangs, the most usual suspects are network issues. If you are wondering how did I reproduce this problem, I mounted a NFS volume from a VM, started the find command and then just suspended the VM. This produces similar symptoms as a network (configuration, firewall) issue where a connection is silently dropped without notifying the TCP endpoints or packets just don’t go through for some reason.
As the top function listed in the stack was one of the “killable” safe-to-kill functions (rpc_wait_bit_killable) then we can kill it with kill -9:
[root@oel6 ~]# ps -fp 27288 UID PID PPID C STIME TTY TIME CMD root 27288 27245 0 11:57 pts/0 00:00:01 find . -type f [root@oel6 ~]# kill -9 27288 [root@oel6 ~]# ls -l /proc/27288/stack ls: cannot access /proc/27288/stack: No such file or directory [root@oel6 ~]# ps -fp 27288 UID PID PPID C STIME TTY TIME CMD [root@oel6 ~]#
The process is gone.
Poor-man’s kernel thread profilingNote that as the /proc/PID/stack looks like just a plain text proc file, you can do poor-man’s stack profiling also on the kernel threads! This is an example of sampling the current system call and the kernel stack (if in system call) and aggregating it into a semi-hierarchical profile in a poor-man’s way:
[root@oel6 ~]# export LC_ALL=C ; for i in {1..100} ; do cat /proc/29797/syscall | awk '{ print $1 }' ; cat /proc/29797/stack | /home/oracle/os_explain -k ; usleep 100000 ; done | sort -r | uniq -c
69 running
1 ffffff81534c83
2 ffffff81534820
6 247
25 180
100 0xffffffffffffffff
1 thread_group_cputime
27 sysenter_dispatch
3 ia32_sysret
1 task_sched_runtime
27 sys32_pread
1 compat_sys_io_submit
2 compat_sys_io_getevents
27 sys_pread64
2 sys_io_getevents
1 do_io_submit
27 vfs_read
2 read_events
1 io_submit_one
27 do_sync_read
1 aio_run_iocb
27 generic_file_aio_read
1 aio_rw_vect_retry
27 generic_file_read_iter
1 generic_file_aio_read
27 mapping_direct_IO
1 generic_file_read_iter
27 blkdev_direct_IO
27 __blockdev_direct_IO
27 do_blockdev_direct_IO
27 dio_post_submission
27 dio_await_completion
6 blk_flush_plug_list
This gives you a very rough sample of where in kernel this process spent its time – if at all. The system call numbers are separated, in bold, above – “running” means that this process was in userland (not in a system call) during the sample. So, 69% of the time during the sampling, this process was running userland code. 25% of time in syscall #180 (nfsservctl on my system) and 6% in syscall #247 (waitid).
There are two more “functions” seen in this output – but for some reason they haven’t been properly translated to a function name. Well, this address must stand for something, so let’s try our luck manually:
[root@oel6 ~]# cat /proc/kallsyms | grep -i ffffff81534c83 ffffffff81534c83 t ia32_sysret
So it looks that these samples are about a 32-bit architecture-compatible system call return function – but as this function isn’t a system call itself (just an internal helper function), perhaps that’s why the /proc/stack routine didn’t translate it. Perhaps these addresses show up because there’s no “read consistency” on the /proc views, when owning threads modify these memory structures and entries, the reading threads may sometimes read data “in-flux”.
Let’ check the other address too:
[root@oel6 ~]# cat /proc/kallsyms | grep -i ffffff81534820 [root@oel6 ~]#
Nothing? Well, the troubleshooting doesn’t have to stop yet – let’s see if there’s anything interesting around that address. I just removed a couple of trailing characters from the address:
[root@oel6 ~]# cat /proc/kallsyms | grep -i ffffff815348 ffffffff8153480d t sysenter_do_call ffffffff81534819 t sysenter_dispatch ffffffff81534847 t sysexit_from_sys_call ffffffff8153487a t sysenter_auditsys ffffffff815348b9 t sysexit_audit
Looks like the sysenter_dispatch function starts just 1 byte before the original address displayed by /proc/PID/stack. So we had possibly already executed the one byte (possibly a NOP operation left in there for dynamic tracing probe traps). But nevertheless, looks like these stack samples were in the sysenter_dispatch function, which isn’t a system call itself, but a system call helper function.
Note that there are different types of stack samplers – the Linux Perf, Oprofile tools and Solaris DTrace’s profile provider do sample the instruction pointer (EIP on 32bit intel or RIP on x64) and stack pointer (ESP on 32bit and RSP on 64bit) registers of the currently running thread on CPU and walk the stack pointers backwards from there. Therefore, these tools only show threads which happen to be on CPU when sampling!!! This is of course perfect, when troubleshooting high CPU utilization issues, but not useful at all when troubleshooting hung processes or too long process sleeps/waits.
Tools like pstack on Linux,Solaris,HP-UX, procstack (on AIX), ORADEBUG SHORT_STACK and just reading the /proc/PID/stack pseudofile provide a good addition (not replacement) to the CPU profiling tools – as they access the probed process memory regardless of their CPU scheduling state and read the stack from there. If the process is sleeping, not on CPU, the starting point of the stack can be read from the stored context – which is saved to kernel memory by OS scheduler during the context switch.
Of course, the CPU event profiling tools can often do much more than just a pstack, OProfile, Perf and even DTrace’s CPC provider (in Solaris11) can set up and sample CPU internal performance counters to estimate things like the amount of CPU cycles stalled waiting for main memory, amount of L1/L2 cache misses, etc. Better read what Kevin Closson has to say on these topics: (Perf, Oprofile)
Enjoy :-)
Even more Snapper – v4.03 now works in SQL Developer too!
If you already downloaded snapper v4, then better re-download it again as the v4.03 also runs in SQL Developer!
You might also want to check how to enable the DBMS_OUTPUT display in SQL Developer post by “That” Jeff Smith :-)
Update: Niall Litchfield pointed out that the “set serverout on” command was ignored on SQL Developer because I was using the short syntax (serverout instead of serveroutput). Once I changed this, the SQL Developer doesn’t print the warning anymore and there’s no need to explicitly open up the separate DBMS_OUTPUT. I have uploaded Snapper v4.04 with this fix included. Thanks Niall!
Also, thanks go to:
- Hans-Peter Sloot – for building the first cut of RAC support into a previous Snapper version
- Ronald Rood – for adding the customizations for SQL Developer support into a previous Snapper version
Note that the manual before/after snapshots don’t work inside SQL Developer right now, will try to fix this later …
Manual “before” and “after” snapshot support in Snapper v4
Snapper used to require access to DBMS_LOCK, so it could sleep for X seconds between the “before” and “after” performance data snapshots. Now it is possible to get away without using DBMS_LOCK. Instead you will run Snapper twice, once for taking the “before” snapshot, then run your workload and then run Snapper again for taking the “after” snapshot and print the output.
So, the usual way of running snapper is this:
@snapper4 all 5 1 152
This would take 1 5-second performance snapshot SID 152′s V$ views.
With Snapper4 you can use the old way or just add BEGIN or END keywords to the 1st parameter, like this:
@snapper4 all,begin 5 1 152
Whenever using the begin/end syntax, the 2nd and 3rd parameter specifying snapshot duration and count (“5 1″) are just ignored, but they still have to be there right now as snapper requires 4 parameters when called (I’m thinking about changing this in V4.1).
So, let’s try it:
SQL> @snapper4 all,begin 5 1 152 Sampling SID 152 with interval 5 seconds, taking 1 snapshots... SP2-0552: Bind variable "SNAPPER" not declared.
Oops! There’s some error!
In this version of Snapper, if you want to use the before/after manual snapshot mode, you need to run one more command in your session before starting (you need to do it only once when logging in, not every time you run snapper in your session):
SQL> VAR SNAPPER REFCURSOR SQL> SQL> @snapper4 all,begin 5 1 152 Sampling SID 152 with interval 5 seconds, taking 1 snapshots... -- Session Snapper v4.02 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :) Taking BEGIN sample ... PL/SQL procedure successfully completed.
The REFCURSOR trick is how I manage to store the “before” snapshot data even though the Snapper anonymous block exits and gives you back control. See the code for more details. I’ve figured out a way to do this in the snapper script itself (it’s not trivial), but haven’t gotten to this yet. Coming soon :)
Anyway, now as we have taken the before snapshot, we can run whatever commands in the session(s) we are measuring (it doesn’t have to be a single session, you can use the usual snapper syntax to pick which sessions to measure).
Once you are ready to take the end snapshot and print the deltas, run this:
SQL> @snapper4 all,end 5 1 152
Sampling SID 152 with interval 5 seconds, taking 1 snapshots...
-- Session Snapper v4.02 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH , NUM_WAITS, WAITS/SEC, AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
152, SYS , STAT, opened cursors cumulative , 13690, 1.35k, , , , , .7 per execution
152, SYS , STAT, opened cursors current , 1, .1, , , , , .05*10^-3 per execution
152, SYS , STAT, user commits , 18, 1.78, , , , , .93*10^-3 per execution
152, SYS , STAT, recursive calls , 234274, 23.11k, , , , , 12.05 per execution
152, SYS , STAT, recursive cpu usage , 848, 83.64, , , , , .04 per execution
152, SYS , STAT, session logical reads , 267980, 26.43k, , , , , 13.78 per execution
... some output removed for brevity ...
152, SYS , WAIT, control file sequential read , 1078, 106.33us, .0%, [ ], 130, 12.82, 8.29us average wait
152, SYS , WAIT, log file sync , 10969, 1.08ms, .1%, [ ], 15, 1.48, 731.27us average wait
152, SYS , WAIT, db file sequential read , 444277, 43.82ms, 4.4%, [W ], 1900, 187.4, 233.83us average wait
152, SYS , WAIT, db file scattered read , 812924, 80.18ms, 8.0%, [W ], 1077, 106.23, 754.8us average wait
152, SYS , WAIT, direct path write temp , 1758, 173.4us, .0%, [ ], 20, 1.97, 87.9us average wait
-- End of Stats snap 1, end=2013-02-18 23:46:25, seconds=10.1
-- End of ASH snap 1, end=, seconds=, samples_taken=0
And that’s it. Note that there’s no ASH output as if snapper returns control back to you between before & after snapshots, it can not sample the V$SESSION view for printing you the “ASH” top.
With the exception of ASH output and ignoring the snapshot_time and snapshot_count parameter values, Snapper allows you to use the full range of syntax for picking which stats do display (gather,sinclude,winclude,tinclude options) and which sessions to measure (sid@inst, user=joe@2, program=sqlplus%@* and even subqueries). You may want to sample some other session(s) this way or you may want to take before and after snapshots of your own session where snapper runs, there are no restrictions, just remember that when taking snapshots of your own session, there’s some snapper’s own activity which shows up in the session’s metrics. If you want a clean snapshot like autotrace is doing, then use one session for running snapper and another session for running the actual test cases.
Note that I just wrote a blog entry about Snapper v4 RAC GV$ view syntax too.
Enjoy! (and remember that Snapper v4 is still beta and likely has some issues)
Snapper v4.02 and the Snapper launch party video
I have fixed most of the bugs that showed up during the Snapper launch party session and uploaded the new version (v4.02) of Snapper here:
I have also uploaded the launch party hacking session video to enkitec.tv:
I have not updated the snapper documentation yet, but here are the main improvements:
- RAC support (query remote instance’s sessions performance data via GV$)
- Manually taken “before” and “after” snapshot support. No need to use DBMS_LOCK to sleep for a certain number of seconds between snapshots
- Useful averages in the right side of the output (e.g. average bytes sent/received per roundtrip etc). I will keep adding new averages over the coming weeks.
I have kept the syntax backwards-compatible, so when you run Snapper like this …
@snapper ash 5 1 user=tanel
… it will still query only local instance’s sessions.
But if you want to measure all tanel’s sessions in let’s say instance #2 only, you can just add @2 to the session filter condition:
@snapper ash 5 1 user=tanel@2
Or if you want all instances then use @*:
@snapper ash 5 1 user=tanel@*
If you want to measure a single session with sid 1234 in instance #2, you can do it in two ways:
@snapper ash 5 1 1234@2
Or use the inst_id,sid tuple syntax:
@snapper ash 5 1 (2,1234)
Note that the brackets are important, as otherwise snapper thinks that you want to measure SID 2 and SID 1234 in the local instance. You can pass multiple inst_id,SID tuples to the snapper though:
@snapper ash 5 1 (2,1234),(4,5678),(3,999)
Snapper still supports the qc= syntax too, to list all sessions started by a PX query coordinator session:
@snapper ash 5 1 qc=1234@*
Note that the qc= syntax is a bit broken/inconsistent right now – what the qc=1234@* means right now is that Snapper looks up the SID 1234 from every instance and lists slave sessions from all instances. If you use qc=1234@2 for example, then Snapper would only look for QC in the instance #2 and any slaves that are also in instance #2 (but with inter-instance parallelism you may have slaves in other instances too). I will fix it in a near-future version.
I will write about the other major features in separate blog entries, but if you want to learn about it now, then grab a beer (or coffe) and watch the latest enkitec.tv show (it’s a bit over 2 hours :-))
P.S. Snapper 4 is still beta! Some queries (like the qc=) one may take a while to run, running dynamic queries with subqueries against GV$ views efficiently is tricky business. This is why I have kept the old snapper v3 (which is more stable and pretty efficient) in the standard (old) snapper address.
Update: If you want to see what Snapper is capable of (it’s about the old version 3), check this document about using Snapper for Oracle performance measurement.
Drilling Deep Into Exadata Performance with ASH, SQL Monitoring and Exadata Snapper – slides and a hacking session!
The promised hacking session about Exadata performance troubleshooting will happen on Thursday 21st February 9am-11am PST (2 hours). I will show a number of examples and demos where the Exadata Snapper shows what was going on in storage cells when running Smart Scans or just doing IO.
Register here (free stuff!):
Slides:
I will post the scripts on the day of this session.
Note that you should read through the slides before the hacking session – as I’m not going to spend much time on presenting slides, as the name says – it’s a hacking session, so 95% demos. I will show a couple of important slides which explain the essence of Exadata Snapper (check slide 17, 21, 22), but otherwise it’s all demos.
This hacking session will also give a little preview of what’s going to happen in my new Advanced Exadata Performance seminars – see below ;-) As far as I know, there is no other seminar in the world (well at least available to public) where you can learn so deep knowledge about Exadata performance troubleshooting, optimization and internals – all arranged around a systematic approach for dealing with Exadata performance issues.
New Seminar – Advanced Exadata Performance: Troubleshooting and OptimizationThis advanced seminar is meant for DBAs and developers who already have at least the basic experience of working with Exadata and have seen some of the common performance problems that may occur in production or performance tests. In this class we will dive deep into details of how Exadata works internally and then how to systematically troubleshoot performance problems caused either by application, database, storage cell software or Exadata hardware issues. The demos and troubleshooting examples are shown on Enkitec’s in-house Exadata X2-2 and V2 machines and we will use both standard Oracle tools and advanced Exadata troubleshooting scripts written by Tanel.
What students will learn:
- Understand how Exadata features, the storage cell servers and the DB-cell communication work internally
- How to use database, OS and cell metrics for systematically troubleshooting Exadata performance issues, from basic to complex problems involving multiple components, bugs and hardware issues
- How to identify which workloads and SQL statements are not Exadata-friendly and what to do about it
- How to design for optimal Exadata performance and avoid design and coding pitfalls
What this class is not:
- This class is not an introductory “what is Exadata” class, but an advanced seminar about getting the most out of your Exadatas and resolving even complex performance problems systematically and fast
- This class is not an introductory SQL or database tuning class, but focuses on troubleshooting only Exadata-related performance issues of your workloads
- This class is not a general Exadata infrastructure troubleshooting class (like hardware, networking and Grid Infrastructure configuration problems), instead we focus entirely on performance
Day 1 – Exadata Internals, Systematic Troubleshooting and Smart Scans
- Understanding the low-level internals of Exadata data flow all the way from cell disks to the database nodes – and the physics behind its performance
- Using basic performance tools like V$SQLSTATS, ASH and SQL Monitoring views on Exadata
- Troubleshooting why Smart Scans (and direct path reads) do not get used
- Measuring the efficiency of Smart Scans and how much work gets offloaded to the storage cells
- Troubleshooting advanced Smart Scan performance problems, like passthrough mode and selective fallback to block IO – with Exadata Snapper
- Using CELLCLI and cell-level metrics where database & session metrics aren’t enough
- PX issues, including the In-Memory Parallel Execution feature
- Optimizing hash joins for efficient bloom filtering in the cells
- Troubleshooting Exadata Hybrid Columnar Compression issues
- Troubleshooting long commit and disk write latency issues on Exadata
- Troubleshooting flash caching issues and optimizing for tiered caching
- Learn to use the Exadata scripts, tools and OS commands Tanel regularly uses for Exadata performance troubleshooting
- Q&A and free-form demo session and/or reviewing Exadata performance case studies
If you like this – check out my Oracle training and seminars page, the first sessions happen already in May! (And there’s even one on-site in Enkitec office in Irving, TX).
Troubleshooting high CPU usage with poor-man’s stack profiler – in a one-liner!
Here’s an example of a quick’n'dirty way of profiling stack traces on your command line. This is an example from Solaris (but the script should work on Linux too plus other Unixes with minor modifications).
I created a problem case below, based on a case I once troubleshooted at a client site. Note that they had set optimizer_mode = FIRST_ROWS in their database and the optimized came up with a very inefficient execution plan for the select from DBA_LOCK_INTERNAL view below:
SQL> ALTER SESSION SET optimizer_mode = FIRST_ROWS; Session altered. SQL> SET TIMING ON SQL> SELECT * FROM dba_lock_internal; ... ... the output data stripped ... ... 927 rows selected. Elapsed: 00:23:27.14
It took over 23 minutes to return 927 rows from DBA_LOCK_INTERNAL!
I ran Snapper to see where the time is spent then:
SQL> @snapper4 all 5 1 222
Sampling SID 222 with interval 5 seconds, taking 1 snapshots...
-- Session Snapper v4.00 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH , NUM_WAITS, WAITS/SEC, AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
222, SYS , STAT, non-idle wait count , 1, .19, , , , , ~ per execution
222, SYS , TIME, DB CPU , 6028084, 1.17s, 117.2%, [@@@@@@@@@@], , ,
222, SYS , TIME, sql execute elapsed time , 6032677, 1.17s, 117.3%, [##########], , ,
222, SYS , TIME, DB time , 6032677, 1.17s, 117.3%, [##########], , , ~ unaccounted time
222, SYS , WAIT, library cache: mutex X , 6, 1.17us, .0%, [ ], 1, .19, 6us average wait
-- End of Stats snap 1, end=2013-02-14 21:30:45, seconds=5.1
---------------------------------------------------------------------------------------------
Active% | SQL_ID | SQL_CHILD | EVENT | WAIT_CLASS
---------------------------------------------------------------------------------------------
100% | c884zcqpv9y5h | 0 | ON CPU | ON CPU
-- End of ASH snap 1, end=2013-02-14 21:30:45, seconds=5, samples_taken=50
The query is apparently 100% on CPU with no significant waits. Normally I would expect some other metrics to pop up in snapper output in such high-CPU cases, like thousands of session logical reads per second, sorts (rows) showing millions of rows sorted per second or parse count (hard) number in hundreds or thousands per second. These are the “usual suspects”.
But this time none of these additional metrics were incremented by the session. So it’s time to systematically drill down by other means.
Now had it been an Oracle 11g+ database (with diagnostics&tuning pack licenses) then I could have just queried ASH PLAN_LINE columns or ran a SQL Monitoring report to see in which execution plan line most of the time was spent (scroll to the bottom):
SQL> @xp 222 eXplain with Profile: Running DBMS_SQLTUNE.REPORT_SQL_MONITOR for SID 222... SQL Monitoring Report SQL Text ------------------------------ SELECT * FROM dba_lock_internal Global Information ------------------------------ Status : EXECUTING Instance ID : 1 Session : SYS (222:569) SQL ID : c884zcqpv9y5h SQL Execution ID : 16777216 Execution Started : 02/14/2013 21:30:16 First Refresh Time : 02/14/2013 21:30:22 Last Refresh Time : 02/14/2013 21:30:54 Duration : 38s Module/Action : sqlplus@mac02.local (TNS V1-V3)/- Service : lin112 Program : sqlplus@mac02.local (TNS V1-V3) Global Stats ============================================== | Elapsed | Cpu | Concurrency | Other | | Time(s) | Time(s) | Waits(s) | Waits(s) | ============================================== | 42 | 42 | 0.00 | 0.06 | ============================================== SQL Plan Monitoring Details (Plan Hash Value=1240353084) ============================================================================================================================================================ | Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Mem | Activity | Activity Detail | | | | | (Estim) | | Active(s) | Active | | (Actual) | | (%) | (# samples) | ============================================================================================================================================================ | 0 | SELECT STATEMENT | | | | 33 | +6 | 1 | 421 | | | | | 1 | VIEW | DBA_LOCK_INTERNAL | 4 | | 33 | +6 | 1 | 421 | | | | | 2 | UNION-ALL | | | | 33 | +6 | 1 | 421 | | | | | 3 | NESTED LOOPS | | 1 | | 1 | +6 | 1 | 25 | | | | | 4 | NESTED LOOPS | | 1 | | 1 | +6 | 1 | 25 | | | | | 5 | FIXED TABLE FULL | X$KSUSE | 1 | | 1 | +6 | 1 | 248 | | | | | 6 | VIEW | GV$_LOCK | 1 | | 1 | +6 | 248 | 25 | | | | | 7 | UNION-ALL | | | | 1 | +6 | 248 | 6461 | | | | | 8 | FILTER | | | | 1 | +6 | 248 | 6461 | | | | | 9 | VIEW | GV$_LOCK1 | 2 | | 1 | +6 | 248 | 6461 | | | | | 10 | UNION-ALL | | | | 1 | +6 | 248 | 6461 | | | | | 11 | FIXED TABLE FULL | X$KDNSSF | 1 | | | | 248 | | | | | | 12 | FIXED TABLE FULL | X$KSQEQ | 1 | | 1 | +6 | 248 | 6461 | | | | | 13 | FIXED TABLE FULL | X$KTADM | 1 | | 1 | +0 | 248 | 0 | | 2.50 | Cpu (1) | | 14 | FIXED TABLE FULL | X$KTATRFIL | 1 | | | | 248 | | | | | | 15 | FIXED TABLE FULL | X$KTATRFSL | 1 | | | | 248 | | | | | | 16 | FIXED TABLE FULL | X$KTATL | 1 | | | | 248 | | | | | | 17 | FIXED TABLE FULL | X$KTSTUSC | 1 | | | | 248 | | | | | | 18 | FIXED TABLE FULL | X$KTSTUSS | 1 | | | | 248 | | | | | | 19 | FIXED TABLE FULL | X$KTSTUSG | 1 | | | | 248 | | | | | | 20 | FIXED TABLE FULL | X$KTCXB | 1 | | | | 248 | | | | | | 21 | FIXED TABLE FULL | X$KSQRS | 1 | | 1 | +6 | 25 | 25 | | | | | 22 | NESTED LOOPS | | 1 | | | | 1 | | | | | | 23 | NESTED LOOPS | | 1 | | | | 1 | | | | | | 24 | MERGE JOIN CARTESIAN | | 5 | | | | 1 | | | | | | 25 | NESTED LOOPS | | 1 | | | | 1 | | | | | | 26 | FIXED TABLE FULL | X$KSUPRLAT | 1 | | | | 1 | | | | | | 27 | FIXED TABLE FIXED INDEX | X$KSUPR (ind:1) | 1 | | | | | | | | | | 28 | BUFFER SORT | | 100 | | | | | | | | | | 29 | FIXED TABLE FULL | X$KSLWT | 100 | | | | | | | | | | 30 | FIXED TABLE FIXED INDEX | X$KSUSE (ind:1) | 1 | | | | | | | | | | 31 | FIXED TABLE FIXED INDEX | X$KSLED (ind:2) | 1 | | | | | | | | | | 32 | NESTED LOOPS | | 1 | | | | 1 | | | | | | 33 | NESTED LOOPS | | 1 | | | | 1 | | | | | | 34 | MERGE JOIN CARTESIAN | | 1 | | | | 1 | | | | | | 35 | FIXED TABLE FULL | X$KSUPR | 1 | | | | 1 | | | | | | 36 | BUFFER SORT | | 100 | | | | | | | | | | 37 | FIXED TABLE FULL | X$KSLWT | 100 | | | | | | | | | | 38 | FIXED TABLE FIXED INDEX | X$KSLED (ind:2) | 1 | | | | | | | | | | 39 | FIXED TABLE FIXED INDEX | X$KSUSE (ind:1) | 1 | | | | | | | | | | -> 40 | NESTED LOOPS | | 1 | | 37 | +6 | 1 | 396 | | | | | -> 41 | MERGE JOIN CARTESIAN | | 1 | | 37 | +6 | 1 | 17406 | | | | | 42 | NESTED LOOPS | | 1 | | 35 | +6 | 1 | 15 | | | | | 43 | NESTED LOOPS | | 1 | | 35 | +6 | 1 | 15 | | | | | -> 44 | FIXED TABLE FULL | X$KSUSE | 1 | | 37 | +6 | 1 | 15 | | | | | -> 45 | FIXED TABLE FIXED INDEX | X$KSLWT (ind:1) | 1 | | 37 | +6 | 15 | 15 | | | | | -> 46 | FIXED TABLE FIXED INDEX | X$KSLED (ind:2) | 1 | | 37 | +6 | 15 | 15 | | | | | -> 47 | BUFFER SORT | | 100 | | 37 | +6 | 15 | 17406 | 243K | | | | 48 | FIXED TABLE FULL | X$KGLOB | 100 | | 1 | +6 | 1 | 1208 | | | | | -> 49 | VIEW | DBA_KGLLOCK | 2 | | 41 | +2 | 17407 | 396 | | 2.50 | Cpu (1) | | -> 50 | UNION-ALL | | | | 37 | +6 | 17407 | 15M | | 2.50 | Cpu (1) | | -> 51 | FIXED TABLE FULL | X$KGLLK | 100 | | 42 | +1 | 17407 | 15M | | 37.50 | Cpu (15) | | -> 52 | FIXED TABLE FULL | X$KGLPN | 100 | | 40 | +3 | 17407 | 116K | | 55.00 | Cpu (22) | ============================================================================================================================================================
From the “Activity” columns in the right end of the above output you can see that most of the time 55.00 + 37.50% of this monitored query runtime was spent in execution plan lines 51 and 52 executing the FIXED TABLE FULL rowsource operations on X$KGLLK and X$KGLPN tables (library cache locks and library cache pins). Well, querying an X$ shouldn’t usually take tens of minutes! So, that’s why you need to also look into the “Execs” column (it’s called “Starts” in extended DBMS_XPLAN output) and we see that Oracle hasn’t accessed these X$ tables once, but already 17407 times – all done within the current execution of the SQL. So this is clearly an execution plan inefficiency in this case – the parent NESTED LOOPS operation (line 40) has just called that execution plan branch (#49-#52) under it that many times – as its first child “branch” (#41, which happens to be a cartesian join) has returned 17406 rows to it. That’s how nested loops work – unfortunately this time we ended up full scanning through the memory structures behind these X$ tables during every nested loop iteration. And that takes a lot of CPU (not to mention latching or mutexes, depending on Oracle version).
So we need to adjust the execution plan (by whatever means, like by gathering stats on the X$ using dbms_stats.gather_fixed_objects_stats or not allowing optimizer_mode = first_rows for this query), so that it would not iterate through these X$s so many times in the plan (and that MERGE JOIN CARTESIAN isn’t so good idea either, although there may be exceptions).
But anyway, we could come to this conclusion because in Oracle 11g, ASH provides us row-source level detail in its samples. But my original problem happened on Oracle 10.1 or 9.2 (don’t remember exactly). I didn’t want to just kill the query and run it with statistics_level = all, I wanted to troubleshoot it right away. So, I used pstack and my os_explain script for that which I have blogged about in past. But this time, as the session wasn’t completely hung, I wanted to take multiple stack samples of the Oracle process and then aggregate the stack samples to create sort of a profile of top functions taking time. And I didn’t want to report just a single top function, I wanted to know who called that function (parent) and who were the grandparents etc.
It’s actually very easy to get a quick semi-hierarchical aggregated stack profile with just an one-liner :) See below – I’m running 100 pstack commands on the Oracle (10.2) process in a row and then aggregating the output based on the indentation of the function name in os_explain output (the italic text is added as comments by me):
solaris01$ for i in {1..100} ; do pstack 6430 | os_explain -a ; done | sort -r | uniq -c
100 ????????
100 main <<-- 100 samples out of 100 taken (100%) had main() as a grandparent function
100 opimai_real
100 sou2o
100 opidrv
100 opiodr
100 opiino
100 opitsk
100 ttcpip
100 opiodr
100 opifch
100 SELECT FETCH:
100 VIEW: Fetch
100 * NESTED LOOP JOIN: Fetch
100 UNION-ALL: Fetch
100 VIEW: Fetch
100 * FIXED TABLE: Fetch
100 UNION-ALL: Fetch <<-- 100% of execution during sampling was under this UNION-ALL in the plan
86 kqlftp <<-- under the parent UNION-ALL, 86% was spent in kqlftp function (kqlf functions are used for reading X$ tables)
9 kqlftl
3 UNION-ALL: RowProcedure
1 FIXED TABLE: GCol
86 kglic <<-- one level deeper 86% of total time was spent in kglic function, which is used for library cache scanning
8 kglLockIterator
1 kglFreeFreeList
1 VIEW: RowProcedure
1 memcpy
1 ksfflt
1 kqlfnn
3 kqlfgl
85 kglic0
1 kghfrf
1 kghalf
1 expeal
7 ksfglt
23 ksfflt
3 kqlfnn
5 kgligp
1 evaiand
18 yield
3 sskgslgf
4 kslgetl
4 kslfre
When reading such stack profiles, you don’t have to decode what every single function name means all the way to the deepest levels. The point of stack profiling is to point out in which “branch” of Oracle function call tree most of the time was spent – and in our case we can already say that 86% of all the CPU time was spent in the kglic() function, which was called by an UNION-ALL row source, which was a few layers down from a NESTED LOOPS rowsource. So we know – by measuring – where Oracle spends all this CPU time. We can focus on figuring out what the kglic function means instead of having to figure out what all of these functions mean.
So while the stack profile doesn’t give us the info how many times a function was called, it does tell us roughly how much time was spent inside (or under) this function in the call tree and that takes us a big step closer to the root cause of this problem.
The real reason I’m writing this blog is that I want to show that basic stack profiling (or tracefile or heap dump profiling) is not too complex. When troubleshooting an urgent issue, you don’t have time for installing and trying out complex development tools and GUI profilers – processing the tracefile with a simple shell, awk or perl/python script can give you the quick’n'dirty answer. I came up with this ad-hoc profiling when troubleshooting a customer problem – it took just a minute to get the answer I wanted. Simplicity wins! Well at least it won in this case.
Note that on Linux you need to set an extra environment variable to get the sort command to behave (the output is slightly different than above as it’s taken from Oracle 11.2). Note that on Linux the pstack command is just a shell script wrapper around gdb, thus is pretty slow. You probably don’t want to examine processes just for fun on Linux production servers, but once some process starts behaving badly (and people are ready to kill it anyway) it is a decent approach for gathering evidence before pulling the trigger.
[root@oel6 tmp]# export LC_ALL=C ; for i in {1..100} ; do pstack 17163 | ./os_explain -a ; done | sort -r | uniq -c
100 main
100 ssthrdmain
100 opimai_real
100 sou2o
100 opidrv
100 opiodr
100 opiino
100 opitsk
100 ttcpip
100 opiodr
100 opifch
100 SELECT FETCH:
100 VIEW: Fetch
100 * NESTED LOOP JOIN: Fetch
100 UNION-ALL: Fetch
100 NESTED LOOP JOIN: Fetch
100 SORT: Fetch
100 NESTED LOOP JOIN: RowProc
100 VIEW: Fetch
100 * FIXED TABLE: Fetch
100 UNION-ALL: Fetch
45 kqlftp
53 kqlftl
1 UNION-ALL: RowProcedure
98 kglic0
1 VIEW: RowProcedure
17 kglic_cbk
10 kglReleaseHandleReference
10 kglReleaseBucketMutex
1 kglGetSessionUOL
29 kglGetHandleReference
4 kglGetBucketMutex
1 expeal
1 kqlpgCallback
5 kgligp
7 kgligl
13 kglReleaseMutex
3 kglGetSessionUOL
16 kglGetMutex
1 evaiand
2 kqlfgl
8 kgxRelease
8 kgxExclusive
3 kglMutexNotHeld
1 kglMutexHeld
1 kglIsMutexHeld
1 expepr
1 ??
1 P2D3
So, basic stack profiling can be done just with a few shell commands chained together. It’s not fully hierarchical stack profiling like ProcMon on Windows or the Sample Process in Activity Monitor of Mac OSX, but again its just one line of code :-)
Session Snapper v4 – The World’s Most Advanced Oracle Troubleshooting Script! – Webinar
Catchy title, huh? :-)
The Snapper v4 is getting ready! I am going to launch it on Wednesday 13 Feb 11am PST at the Snapper V4 webinar (register here).
The major new features include:
- RAC support – ability to query stats from remote instances
- Manual Before and After snapshot support – no need to use DBMS_LOCK sleeps anymore
- Show useful averages and ratios *in addition to* raw metrics for faster troubleshooting
- And more! :)
Snapper is still a free-to-use tool and it still does NOT require any object creation nor changes in your databases for use. Now even DBMS_LOCK access isn’t needed, although it’s still useful for convenience.
This is an 1 hour hacking session which may well end up taking 2 hours :) I will post the recording to http://enkitec.tv later on.
So, see you on Wednesday at the webinar!
Oh, wait, there’s more! You’re going to love this, especially if you love reverse engineering code puzzles! ;-)
New Kind of Snapper LogoI have also done major updates to Snapper logo which you should check out. If you have ever looked into the Snapper source code you know the old logo looks like this:
-- +-----=====O=== Welcome to The Session Snapper! (Yes, you are looking at a cheap ASCII -- / imitation of a fish and a fishing rod. -- | Nevertheless the PL/SQL code below the -- | fish itself should be helpful for quick -- | catching of relevant Oracle performance -- | information. -- | So I wish you happy... um... snapping? -- | ) -- | ...... -- | iittii,,.... -- ¿ iiffffjjjjtttt,, -- ..;;ttffLLLLffLLLLLLffjjtt;;.. -- ..ttLLGGGGGGLLffLLLLLLLLLLLLLLffjjii,, ..ii,, -- ffGGffLLLLLLjjttjjjjjjjjffLLLLLLLLLLjjii.. ..iijj;;.... -- ffGGLLiittjjttttttiittttttttttffLLLLLLGGffii.. ;;LLLLii;;;;.. -- ffEEGGffiittiittttttttttiiiiiiiittjjjjffLLGGLLii.. iiLLLLLLttiiii,, -- ;;ffDDLLiiiitt,,ttttttttttttiiiiiiiijjjjjjffLLLLffttiiiiffLLGGLLjjtttt;;.. -- ..ttttjjiitt,,iiiiiittttttttjjjjttttttttjjjjttttjjttttjjjjffLLDDGGLLttii.. -- iittiitttt, ;;iittttttttjjjjjjjjjjttjjjjjjffffffjjjjjjjjjjLLDDGGLLtt;;.. -- jjjjttttii:. ..iiiiffLLGGLLLLLLLLffffffLLLLLLLLLLLLLLLLffffffLLLLLLfftt,, -- iittttii,,;;,,ttiiiiLLLLffffffjjffffLLLLLLLLffLLffjjttttttttttjjjjffjjii.. -- ,,iiiiiiiiiittttttiiiiiiiiiijjffffLLLLLLLLffLLffttttttii;;;;iiiitttttttt;;.. -- ..iittttttffffttttiiiiiiiiiittttffjjjjffffffffttiittii:: ....,,;;iittii;; -- ..;;iittttttttttttttttiiiiiittttttttttjjjjjjtttttt;; ..;;ii;;.. -- ..;;;;iittttttjjttiittttttttttttttjjttttttttii.. .... -- ....;;;;ttjjttttiiiiii;;;;;;iittttiiii.. -- ..;;ttttii;;.... ..;;;;.... -- ..iiii;;.. -- ..;;,, -- .... --
As the Snapper script allows you to easily take performance snapshots of Oracle sessions and fish for solutions to your performance problems from the ocean of possible root causes, it makes sense to use a fish in the logo somewhere. I have also added sea and a cruise boat (symbolizing longer vacations thanks to Snapper!) into the new Snapper script file and and have pasted it below.
Scroll down and check it out – I’m sure you see it’s special. Does it ring a bell – what should you do with this special kind of a logo now? Of course – copy&paste it into sqlplus and run the logo! ;-) Note that you should run this on Oracle 11.2 or later.
There are actually two different logos below. The first one (cruise boat) is a low-tech one, suitable for use with Windows CMD.EXE and GUI tools like SQL*Developer.
Low-tech version:
SET LiNeSiZe 10000 PageSize 5000
SET TrimOut ON Head Off
/**/
SELECT
LISTAGG
(SUBSTR(s,
MOD(r2-1,115)+1,1)) WITHIN GROUP(ORDER BY r)
FROM(SELECT rownum r,RPAD(RPAD(RPAD(RPAD(LPAD(LPAD
('(',x,'('),20,' '),x+20,')'),40,' '),40+y,' '),50,'W')s FROM(
SELECT CEIL(ABS(SIN(rownum/30)*13))x,CEIL(ABS(COS(rownum/9)*5))y
FROM dual CONNECT BY LEVEL<=115)), (SELECT rownum r2 FROM (dual)
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CONNECT BY LEVEL <= 50) GROUP BY MOD(r2-1, 115) +1;
Hi-tech version:
This version requires a modern xterm emulator like Putty (on Windows) or iTerm2 or Terminator on Mac OSX, if your current terminal ends up showing garbage. Go ahead, make your terminal window wide enough, paste and enjoy :-)
First change the sqlplus settings if you didn’t do so already:
SET LINESIZE 10000 TRIMOUT ON TRIMSPOOL OFF HEAD OFF PAGESIZE 5000
And now paste this in:
SELECT
CHR(POWER(3,3))
||'[38;5;0m'||LISTAGG(SUBSTR(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(s ,')',
CHR(POWER(3,3))||'[48;5;'||TRIM(TO_CHAR((POWER(2,4)+CEIL((SIN(r2/5) +
SIN(ROWNUM/150)+1)*3-1)+CEIL((-SIN(ROWNUM/150)+1)*3-1)*6+CEIL((COS(ROWNUM/150)+1)*3-1) *6*6),
'099'))||'m)'),'(',CHR(POWER(3,3))||'[48;5;'||TRIM(TO_CHAR((POWER(2,4)+CEIL((SIN(r2/5)+SIN(ROWNUM/150) +1)*3-1)+
CEIL((-SIN(ROWNUM /150)+1)*3-1)*6+CEIL((COS(ROWNUM/150)+1)*3-1)*6*6),'099'))||'m('),' ',CHR(POWER(3,3))||'[48;5;'||TRIM (TO_CHAR(
(POWER(2,4)+CEIL(4)+CEIL(4)*6+CEIL(5-(r2/8))*6*6),'099'))||'m '),'.',CHR(POWER(3,3))||'[48;5;'||TRIM(TO_CHAR((POWER(2,4)+CEIL(5)+CEIL(4)*6+CEIL
(3)*6*6),'099'))||'m~'),'W',CHR(POWER(3,3))||'[48;5;'||TRIM(TO_CHAR((POWER(2,4)+CEIL(4+ABS(SIN(r/25)))+CEIL(2-ABS(SIN (r/5))*2)
*6+CEIL(2-ABS(SIN(r/5))*2)*6*6),'099'))||'m '),(MOD(r2-1,115)+1)*12+1,12))WITHIN GROUP(ORDER BY r)||CHR(POWER (3,3))
||'[0m' v FROM(SELECT ROWNUM r, RPAD(RPAD(RPAD(RPAD(LPAD(LPAD('(',x,'('),20,' '), x+20,')'),40,' '), 40+y,
'.'),50,'W')s FROM (SELECT CEIL(ABS(SIN(ROWNUM/30)*13))x,CEIL(ABS(COS(ROWNUM *
1/DBMS_RANDOM.VALUE(7,7.5))*7))y FROM dual CONNECT BY LEVEL<=115)),
(SELECT ROWNUM r2 FROM dual CONNECT BY LEVEL<=50)
GROUP BY MOD(r2-1,115)+1;
So, how do you like the new Snapper logo? :)
Feel free to leave a comment – and if you like this, please share it – all the billions of people of the world deserve to know how cool sqlplus is! :-)
Also, check out my upcoming Advanced Oracle Troubleshooting and Advanced Oracle SQL Tuning seminars and see you all on Wednesday at Snapper V4 Launch Party ;-)
Enkitec Extreme Exadata Expo (E4) 2013 – Call for Papers
Enkitec E4, the only exclusively Exadata-focused conference in the world, is going to be back this this year too! :-)
It will take place on 5-6 August 2013 in Four Seasons Hotel & Resorts in Irving, TX. It will be very hot in Texas then, but the hotel has a really cool pool there (which I intend to be able to use this year – last year I had to spend all my free time working in the hotelroom unfortunately). And they have beer inside too :)
There will be very good speakers showing up this year too (including the keynote speaker who’s awesome, but I can’t say the name yet) and I’ll hang out there too. I think I’ll deliver a live demo / internals hacking session this year (that way I don’t have to spend my night before the conference preparing slides and can focus more on the cool stuff).
So check out the Enkitec E4 page, it has a video from last year’s event and also a link for submitting your abstracts for the conference!
See you there! (in person or virtually :)









