Re: shared pool latching issue and missing time in trace file

From: David Fitzjarrell <oratune_at_yahoo.com>
Date: Tue, 11 Jun 2013 06:44:13 -0700 (PDT)
Message-ID: <1370958253.38644.YahooMailNeo_at_web121601.mail.ne1.yahoo.com>



It will help tremendously to know which release of Oracle you are using; shared pool issues can (and do) vary by release.  
I will ask if you have used Tanel Poder's sgastatx.sql (http://blog.tanelpoder.com/files/scripts/sgastatx.sql) to report on all of the various memory areas in the shared sub-pools.  It is an excellent tool I have used numerous times to report memory allocation and diagnose issues with the shared pool when ORA-04031 errors crop up.  You can see which sub-pools are being depleted the fastest and, possibly, decide how to correct the problem.  
As to where your time is spent when running this query as a non-SYS user, have you generated tkprof output from them to see if the execution plans are the same?  I would expect them to differ somewhat and seeing those plans may give another clue to help you solve this time mystery. David Fitzjarrell  

 From: Sigrid Keydana <keydana_at_gmx.de> To: oracle-l_at_freelists.org
Sent: Tuesday, June 11, 2013 5:05 AM
Subject: shared pool latching issue and missing time in trace file   

Hi,

sorry for the misformatted mail this morning - I'm trying again.

I'd like to ask for your advice debugging a shared pool latching problem. For background, let me state that in general, in the application concerned, shared pool fragmentation has been a known issue, but less so in the present since quite some work has been done to reduce hard parsing. However, there is a statement (which I cannot reproduce here but might do so in a masked form if required) that regularly causes the alert (deemed "harmless" by Oracle Support, see 330239.1)

Memory Notification: Library Cache Object loaded into SGA Heap size 75750K exceeds notification threshold (51200K)

I have been tracing this statement in 3 different conditions:

- as SYS user on the database server
- as non-SYS user from sqlplus, on the database server
- as non-SYS user from a client machine, using PL/SQLDeveloper


  1. SYS on db server In this case, I see a small number of recursive statements in the tracefile, and parse cpu/elapsed time is cA9936,eB0700. No "memory notification" in the alert log. Query executes smoothly.
  2. non-SYS on db server, using sqlplus Here, the query does not execute smoothly. Before the entries for the recursive sqls, I have the following lines:

..
*** ACTION NAME:() 2013-06-10 15:31:32.544
WAIT #47026981749560: nam='latch: shared pool' ela= 65 address11705384 number07 tries=0 obj#=0 tim70871092544435

  • 2013-06-10 15:31:46.136 WAIT #47026981749560: nam='latch: row cache objects' ela= 119 address2940374584 number(0 tries=0 obj#=0 tim70871106136480
  • 2013-06-10 15:33:50.873 WAIT #47026981749560: nam='latch: shared pool' ela= 9308 address11705384 number07 tries=0 obj#=0 tim70871230873117 WAIT #47026981749560: nam='latch: shared pool' ela= 204 address11705384 number07 tries=0 obj#=0 tim70871230873406

Parse time is c&1363267,e&1410049 - so parsing took more than 4 minutes. What I don't understand is what happened during all this time - the 4 wait events reported look far "too short" to match the actual time elapsed (e.g., the longest shared pool latch wait accounting for 9 ms only…) (And just in parentheses, the recursive queries turn out to be very quick).

In spite of the long query duration, no memory notification occurred.

3) non-SYS from PL/SQLDeveloper
Lastly, apart from the fact that the PL/SQLDeveloper trace file is messed up by lots of (for me :-)) unexpected sql (at dep=0!), it resembles the second case for the "how was the time spent" part:

  • 2013-06-06 12:04:21.832 WAIT #47816534037856: nam='latch: shared pool' ela= 1012 address11705224 number07 tries=0 obj#=-1 tim70513061832831 WAIT #47816534037856: nam='latch: row cache objects' ela= 94 address3352528640 number(0 tries=0 obj#=-1 tim70513061833689
    • 2013-06-06 12:04:21.881 WAIT #47816534037856: nam='latch: row cache objects' ela= 170 address3352528640 number(0 tries=0 obj#=-1 tim70513061881325
    • 2013-06-06 12:04:57.512 WAIT #47816534037856: nam='latch: row cache objects' ela= 641 address2940374584 number(0 tries=0 obj#=-1 tim70513097512746
    • 2013-06-06 12:05:05.639 WAIT #47816534037856: nam='latch: shared pool' ela= 3 address11705224 number07 tries=0 obj#=-1 tim70513105639833
    • 2013-06-06 12:07:59.226 WAIT #47816534037856: nam='latch: row cache objects' ela= 76 address2824711104 number(0 tries=0 obj#=-1 tim70513279226650
    • 2013-06-06 12:08:13.053 WAIT #47816534037856: nam='latch: shared pool' ela= 369 address11705224 number07 tries=0 obj#=-1 tim70513293053337
    • 2013-06-06 12:08:22.800 WAIT #47816533789368: nam='library cache lock' ela= 265 handle address3747902984 lock address3507635152 100*mode+namespace32087638018 obj#=-1 tim70513302800299 WAIT #47816533789368: nam='library cache pin' ela= 180 handle address3747902984 pin address3507634896 100*mode+namespace32087638018 obj#=-1 tim70513302800606 WAIT #47816533853720: nam='library cache lock' ela= 458 handle address3475669128 lock address3527516264 100*mode+namespace4554379266 obj#=-1 tim70513302802113

In this scenario, however, contrary to the sqlplus execution, I can bet I will get the memory notification in the alert log.

Please excuse the long description - now these are my questions:

  1. In cases 2 and 3, where is the time spent?
  2. What might be the relation between the the long waits and the "memory notification" alert, given there seems to be no 1:1 relationship? What do you think, how should on judge/react on this alert?
  3. In general, would you have any advice how to proceed with this issue? Is it even worthwhile to follow up questions like "why this is parsed quickly when issued by SYS, but not otherwise", or to try to do more of "shared pool diagnostics / debugging" (given that in general, some level of fragmentation will be unavoidable due to how the application works)? So perhaps, concentrate on tuning this specific sql, rather?

I'd be very glad about any advice / opinions on this. Thanks in advance, Sigrid

--
http://www.freelists.org/webpage/oracle-l
--
http://www.freelists.org/webpage/oracle-l
Received on Tue Jun 11 2013 - 15:44:13 CEST

Original text of this message