Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> library cache load lock - major problems
Morning all (or afternoon - I'm behind another firewall and using
Google !)
Oracle enterprise : 8.1.7.3 (there's nothing I can do about this I'm
afraid !)
HP-UX 11.11
8 CPUs and load average is 0.5 over 45 days.
Sessions hang on library cache load lock, all are executing the same sql statement which makes a call to UTL_TCP.flush followed by utl_tcp.close_connectioon (see below). This is intermittants but very irritating for my users.
I've Googled these groups and Metalinked till I'm blue in the face - nothing. :o(
Gruesome details.
User logs into an application - it sometimes hangs.
User calls me up to advise me of the hang, in case I don't already
know.
I log into application - it hangs for me too. (I'm on a different
network etc).
I use SQLPlus and get an instant connection - my session, and that of
my user(s) stay hung. Hence problem must be in the application.
My script to interrogate V$SESSION_WAIT & V$SESSION shows the following :
SQL> @sessionwait
SID USERNAME OSUSER SQL_HASH_VALUE EVENT
6 rows selected.
www2 is a web interface (written in Java) to the application via Apache.
The application also seems to login three sessions to the database for every actual execution of the application. Hmm.
Everyone is waiting on the same sql_hash_value. This is a simple statement :
SQL> @sqlbyhash_8i
Enter value for hash_value: 1359669396
SQL_TEXT
Now, I've tracked the above code to one place, in a logoff trigger for a specific schema that we are not (directly) logging into - it is possible that the application is logging into it in the background though. The comments helped in tracing the code !
The code being executed, FDI_NORM.TIDY, can and is executed from elsewhere though. I've never caught it hanging when called from elsewhere, it's always from the above statement which is in the trigger.
The actual code being executed from the trigger looks like this :
procedure Tidy is
begin
utl_tcp.flush(c);
utl_tcp.close_connection(c);
Connected := false;
SessionID := -1;
exception
when OTHERS then null;
end Tidy;
I have both FDI_NORM and UTL_TCP pinned - it makes no difference. (By the way, FDI_NORM has nothing to do with me, only the names are the same - this code is from a third party application.)
Now I've tested this procedure in isolation when the system is running fine - it takes 2.5 seconds on the first run and zero seconds on all subsequent runs during this session.
When the system is hanging, it takes around 4 minutes to run and times out. The connection is to another server here and that server is running happily as is the network - we've had it checked by out networking guys & gals.
Metalink note 122793.1 - How To Find The Session Holding A Library Cache Lock - has two scripts to list holders and waiters which select from SYS.X$KGLLK. When the system is hung, these scripts return no rows selected for holders and waiters. SessionWait still shows waiters on L.C.L.L.
DBA_LOCK_INTERNAL shows that each waiting session has the following :
MODE_HELD = Share and MODE_REQUESTED = None for one session, plus two entries with MODE_HELD = Null and MODE_REQUESTED = None. All have the same SESSION_ID but LOCK_ID1 is different and LOCK_ID2 is null for each. The lock_type is a 'Cursor definition lock' for the two Null mode_held entries and 'Cursor Definition pin' for the 'share' entry.
One of the waiters in V$SessionWait is a background task (no OSUSER in V$SESSION) and has the name 'SNP0', description 'Job Queue Process 0' in V$BGPROCESS.
Shared pool is 250 Mb. There is an extract of the statspack report at the end of this posting, but I have a few worries :
Response times (as analysed by oraperf.com) were 99.93% wait time and 0.07% cpu.
L.C.L.L was responsible for 98.69% of the total wait time.
63,432,885 gets on the Library Cache latches was requested in 75 seconds. Interesting. This was 99.19% of the latch free waits. Of these, 1,723 slept. The latch sleeps were mostly on kglic (690), kglpnal: child:before pro (308), kglpnal: child: alloc spac (211) and kglhdgn:child (156). The rest dwindled from 78 sleeps down to 1.
Session_cached_cursors is not set.
Anyone got any ideas on tracking this down ? Please & thanks in advance.
Oh, by the way, the application *does* use bind variables :o)
Cheers,
Norm.
Here's the SP report stuff :
A statspack report for 75 seconds, which is a 'just before' the lockup to 'just after', shows the following (extracts) :
Snap Id Snap Time Sessions ------- ------------------ -------- Begin Snap: 16114 29-Jun-05 14:40:25 5,303 End Snap: 16116 29-Jun-05 14:41:40 5,303 Elapsed: 1.25 (mins) Load Profile ~~~~~~~~~~~~ Per Second Per Transaction --------------- --------------- Redo size: 11,115.84 21,376.62 Logical reads: 1,961.79 3,772.67 Block changes: 18.31 35.21 Physical reads: 4.20 8.08 Physical writes: 3.76 7.23 User calls: 6.59 12.67 Parses: 17.05 32.79 Hard parses: 1.63 3.13 Sorts: 64.03 123.13 Logons: 0.16 0.31 Executes: 258.83 497.74 Transactions: 0.52 % Blocks changed per Read: 0.93 Recursive Call %: 99.19 Rollback per transaction %: 2.56 Rows per Sort: 6.89
Instance Efficiency Percentages (Target 100%)
Buffer Nowait %: 100.00 Redo NoWait %: 100.00 Buffer Hit %: 99.79 In-memory Sort %: 99.98 Library Hit %: 96.80 Soft Parse %: 90.46 Execute to Parse %: 93.41 Latch Hit %: 99.99 Parse CPU to Parse Elapsd %: 80.41 % Non-Parse CPU: 100.00 Shared Pool Statistics Begin End ------ ------ Memory Usage %: 84.47 91.13% SQL with executions>1: 91.79 89.32 % Memory for SQL w/exec>1: 89.60 94.41
(Hey, I must have a great performing database, my buffer hit ratio is 99.79% !!!)
Top 5 Wait Events
~~~~~~~~~~~~~~~~~ Wait % Total Event Waits Time (cs)Wt Time
-------------------------------------------- ------------ ------------ ------- library cache load lock 268 30,205 98.29 db file sequential read 92 175 .57 log file parallel write 75 129 .42 enqueue 1 81 .26 latch free 1,764 54
.18
Wait Events for DB: OLIBPR Instance: olibpr Snaps: 16114 -16116
-> cs - centisecond - 100th of a second -> ms - millisecond - 1000th of a second -> ordered by wait time desc, waits desc (idle events last) Avg Total Wait wait Waits Event Waits Timeouts Time (cs) (ms)/txn
---------------------------- ------------ ---------- ----------- ------ ------ library cache load lock 268 87 30,205 1127 6.9 db file sequential read 92 0 175 19 2.4 log file parallel write 75 0 129 17 1.9 enqueue 1 0 81 810 0.0 latch free 1,764 943 54 0 45.2 control file parallel write 24 0 44 18 0.6 library cache pin 18 0 16 9 0.5 log file sync 17 0 11 6 0.4 log buffer space 6 0 11 18
0.2
Instance Activity Stats for DB: OLIBPR Instance: olibpr Snaps: 16114 -16116
Statistic Total per Second perTrans
--------------------------------- ---------------- ------------ ------------ CPU used by this session 754,871 10,065.0 19,355.7 CPU used when call started 2,017 26.951.7
parse count (hard) 122 1.6 3.1 parse count (total) 1,279 17.1 32.8 parse time cpu 119 1.6 3.1 parse time elapsed 148 2.03.8
Latch Activity for DB: OLIBPR Instance: olibpr Snaps: 16114 -16116
->"Get Requests", "Pct Get Miss" and "Avg Slps/Miss" are statistics for
willing-to-wait latch get requests
->"NoWait Requests", "Pct NoWait Miss" are for no-wait latch get
requests
->"Pct Misses" for both should be very close to 0.0
Pct Avg Pct Get Get Slps NoWait NoWait Latch Name Requests Miss /Miss RequestsMiss
----------------------------- -------------- ------ ------ ------------ ------ JOX SGA heap latch 869 0.1 1.0 0 Token Manager 2 0.0 5 0.0 active checkpoint queue latch 30 0.0 5 0.0 cache buffer handles 1,942 0.0 0 cache buffers chains 254,304 0.1 0.1 136 0.0 cache buffers lru chain 47 0.0 123 0.0 channel handle pool latch 17 0.0 0 channel operations parent lat 22 0.0 0 checkpoint queue latch 444 0.0 0 dml lock allocation 226 0.0 0 enqueue hash chains 551 0.0 0 enqueues 892 0.0 0 event group latch 5 0.0 0 job_queue_processes parameter 1 0.0 0 latch wait list 1,434 1.3 0.1 1,425 0.4 library cache 63,432,885 0.0 0.2 431 3.9 library cache load lock 1,589 2.3 0.0 0 list of block allocation 129 0.0 0 loader state object freelist 6 0.0 0 longop free list 2 0.0 0 messages 464 0.0 0 mostly latch-free SCN 139 0.0 0 ncodef allocation latch 1 0.0 0 process allocation 5 0.0 5 0.0 process group creation 17 0.0 0 redo allocation 882 0.2 0.0 0 redo writing 357 0.0 0 row cache objects 17,425 0.6 0.0 8 0.0 sequence cache 57 0.0 0 session allocation 5,417 0.0 0.0 0 session idle bit 1,140 0.0 0 session switching 1 0.0 0 shared java pool 47 0.0 0 shared pool 21,489 0.2 0.2 0 sort extent pool 12 0.0 0 transaction allocation 194 0.0 0 transaction branch allocation 1 0.0 0 undo global data 339 0.0 0 user lock 222 0.0 0 -------------------------------------------------------------Latch Sleep breakdown for DB: OLIBPR Instance: olibpr Snaps: 16114 -16116
Get Spin & Latch Name Requests Misses Sleeps Sleeps 1->4 -------------------------- -------------- ----------- ----------- ------------ library cache 63,432,885 9,048 1,7237666/1065/29
5/22/0
cache buffers chains 254,304 180 14
168/10/2/0/0
row cache objects 17,425 102 2 100/2/0/0/0 shared pool 21,489 49 11 44/2/2/1/0 latch wait list 1,434 19 1 18/1/0/0/0 JOX SGA heap latch 869 1 1 0/1/0/0/0 -------------------------------------------------------------Latch Miss Sources for DB: OLIBPR Instance: olibpr Snaps: 16114
-16116 -> only latches with sleeps are shown -> ordered by name, sleeps desc NoWait Waiter Latch Name Where Misses SleepsSleeps
------------------------ -------------------------- ------- ---------- ------- cache buffers chains kcbgtcr: kslbegin 0 13 9 cache buffers chains kcbrls: kslbegin 0 1 5 latch wait list No latch 0 1 1 latch wait list kslfre 2 1 1 library cache kglic 0 690 1,551 library cache kglpnal: child: before pro 0 308 14 library cache kglpnal: child: alloc spac 0 211 18 library cache kglhdgn: child: 0 156 23 library cache kglobpn: child: 0 78 19 library cache kglhdgh: child: 0 32 24 library cache kgldti: 2child 0 31 0 library cache kglhdgc: child: 0 31 5 library cache kgllkdl: child: cleanup 0 26 12 library cache kglrtl 0 23 0 library cache kglupc: child 0 16 12 library cache kgldtld: 2child 0 15 0 library cache kglati 0 13 7 library cache kglpin 0 13 3 library cache kglget: child: KGLDSBRD 0 7 0 library cache kgllkdl: child: free pin 0 5 10 library cache kglget: child: KGLDSBYD 0 4 1 library cache kglpnc: child 0 4 1 library cache kglpnal: child: check gran 0 3 1 library cache kgldtld: parent 0 1 0 library cache kglini: parent 0 1 0 library cache kglpnp: child 0 1 2 row cache objects kqrpre: find obj 0 2 0 shared pool kghfrunp: clatch: nowait 0 32 0 shared pool kghfrunp: clatch: wait 0 19 3 shared pool kghfrunp: alloc: clatch no 0 5 0 shared pool kghfrunp: alloc: wait 0 4 0 shared pool kghalp 0 1 0 shared pool kghupr1 0 1 2 -------------------------------------------------------------Dictionary Cache Stats for DB: OLIBPR Instance: olibpr Snaps: 16114
-16116 ->"Pct Misses" should be very low (< 2% in most cases) ->"Cache Usage" is the number of cache entries being used ->"Pct SGA" is the ratio of usage to allocated size for that cache Get Pct Scan Pct Mod Final Pct Cache Requests Miss Requests Miss ReqUsage SGA
---------------------- ------------ ------ -------- ----- -------- ------ ---- dc_constraints 0 0 0 0 0 dc_database_links 0 0 0 0 0 dc_files 0 0 0 10 91 dc_free_extents 0 0 0 11 92 dc_global_oids 0 0 0 5 83 dc_histogram_data 0 0 0 0 0 dc_histogram_data_valu 0 0 0 0 0 dc_histogram_defs 1,614 17.0 0 0 507 100 dc_object_ids 1,090 6.2 0 0 183 96 dc_objects 860 15.3 0 0 310 99 dc_outlines 0 0 0 0 0 dc_profiles 22 0.0 0 0 2 40 dc_rollback_segments 0 0 0 17 94 dc_segments 414 31.6 0 0 123 97 dc_sequence_grants 0 0 0 1 50 dc_sequences 19 0.0 0 14 8 67 dc_synonyms 88 25.0 0 0 52 93 dc_tablespace_quotas 0 0 0 1 50 dc_tablespaces 4 25.0 0 0 2 20 dc_used_extents 0 0 0 5 83 dc_user_grants 20 0.0 0 0 16 42 dc_usernames 572 0.0 0 0 12 52 dc_users 951 0.0 0 0 30 77 ifs_acl_cache_entries 0 0 0 0 0 -------------------------------------------------------------
Library Cache Activity for DB: OLIBPR Instance: olibpr Snaps: 16114
-16116
->"Pct Misses" should be very low
Get Pct Pin Pct Invali- Namespace Requests Miss Requests Miss Reloadsdations
BODY 81 1.2 83 8.4 5 0 CLUSTER 277 0.4 32 6.3 0 0 INDEX 0 0 0 0 OBJECT 0 0 0 0 PIPE 0 0 0 0 SQL AREA 1,500 10.7 22,161 0.8 72 8 TABLE/PROCEDURE 2,160 3.4 7,376 10.2 271 0 TRIGGER 12 0.0 12 0.0 0 0 -------------------------------------------------------------SGA Memory Summary for DB: OLIBPR Instance: olibpr Snaps: 16114 -16116
SGA regions Size in Bytes ------------------------------ ---------------- Database Buffers 104,857,600 Fixed Size 104,936 Redo Buffers 77,824 Variable Size 102,891,520 ---------------- sum 207,931,880
<END OF EXTRACT> Received on Wed Jun 29 2005 - 10:12:55 CDT