Re: Library cache latch contention
Given that the minimum sleep time for a latch is 10 milliseconds, it
would seem that millions of sleeps could not possibly result in zero
latch wait time. This almost has to be a bug, although one I have not
encountered myself. You really ought to install statspack and run it
at level 10 for a while to check out the latch children stats.
Installing Statspack is a no-brainer and only takes a few minutes, if
you have a suitable "tools" or other non-production tablespace
available.
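For reference, the whole exercise is roughly this (a sketch only; the script names are the standard ones shipped with 8.1.6 and later, and you would point spcreate.sql at your "tools" tablespace when it prompts for a default tablespace):

    -- create the PERFSTAT schema and the Statspack objects (run from SQL*Plus as SYSDBA)
    @?/rdbms/admin/spcreate.sql

    -- take two level-10 snapshots a while apart; level 10 is what captures
    -- the parent and child latch statistics
    connect perfstat
    execute statspack.snap(i_snap_level => 10)
    -- ... let the workload run for a while ...
    execute statspack.snap(i_snap_level => 10)

    -- report on the interval between the two snapshots
    @?/rdbms/admin/spreport.sql

In the meantime, a direct look at the child latches will not hurt either:

    select child#, gets, misses, sleeps
    from   v$latch_children
    where  name = 'library cache'
    order  by sleeps desc;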
Jonathan, you seem to be describing either a no-wait or a "fast get" scenario for library cache gets. Is that what you are referring to? There is the normal situation where a "sleep" begins with a zero sleep time, essentially emulating a no-wait get before racking up actual sleep time. Then the actual sleeps begin, and 9i still starts at 10 milliseconds, as in prior releases.
If that is what you mean, the only circumstances under which this could happen for millions of latch sleeps would be extremely deterministic. However, latch contention on such hash table algorithms tends to be random and shows exponential distributions, so you would inevitably see considerable measured wait time.
I suppose that, theoretically, a sleeper could be posted by the holder before the end of the minimum sleep time (say, after less than a millisecond) and then only have to sleep the one time. But again, not for millions of latch gets; that would be so deterministic as to defy reason. Hash table latch contention really does appear to be random under load, whether for the library cache, the buffer cache or the enqueue resources. As such, there would inevitably be some long-held latches or some spikes of contention, and therefore long and multiple sleeps.
In any case, the report of more sleeps than get requests is indicative of either bad data or a bug, for sure. It's hard to imagine a data overflow resulting in this sort of report.
The bottom line, Dias, is that this latch is used for cursor lookup in the library cache hash table. It means that multiple sessions are trying to scan the same hash table location (hash chain, as we call it) concurrently, and they have to wait their turn. Eliminate the lookups and you eliminate the contention. You can either keep cursors open for repeated execution within a session, or perhaps eliminate hard parsing by using bind variables and so forth. You already mention the use of session_cached_cursors and cursor_sharing = force, which would normally be helpful in such situations. So, the initial impression, if the facts are presented accurately, is that you may be encountering a bug.
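Just to make the bind-variable point concrete (the table and column names here are made up purely for illustration):

    -- every distinct literal creates a new cursor and a hard parse:
    select order_total from orders where order_id = 12345;
    select order_total from orders where order_id = 12346;

    -- with a bind variable there is one shared cursor and only soft parses
    -- (shown SQL*Plus style; the real fix belongs in the application code):
    variable id number
    exec :id := 12345
    select order_total from orders where order_id = :id;

And the two parameters you already set are the usual stopgap when the application code cannot be changed:

    # init.ora
    cursor_sharing         = force
    session_cached_cursors = 100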
There are known problems with v$session_wait results, but I think those issues are only present in early 9.2. I could well be wrong, however, and you might contact support for help on this.
And use Statspack if at all possible.
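For the cross-check Jonathan suggests below, a couple of one-liners will do (I am assuming the 9i column names here; wait_time and time_waited_micro are the microsecond columns he mentions):

    select name, gets, misses, sleeps, wait_time
    from   v$latch
    where  name = 'library cache';

    select event, total_waits, time_waited, time_waited_micro
    from   v$system_event
    where  event = 'latch free';

If the sleeps in v$latch are wildly out of line with the 'latch free' waits and times, then the v$latch figures themselves are the prime suspect.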
Jonathan Lewis wrote:
>
> Your results do look odd - after all, you can only
> start accumulating sleeps after a miss, and with a
> minimum sleep of 1/100 of a second (which is what
> it used to be in v8 and below), you seem to be averaging
> more than three hours of wait time every time you
> miss a library cache latch (and that isn't allowing
> for the roughly exponential increase in the length
> of successive sleeps).
>
> I suspect an anomaly somewhere in the code where
> Oracle 9 has an option to yield immediately (without
> spinning) and use a much shorter sleep time. This type
> of strategy change in the low-level code is likely to cause
> all sorts of upsets and anomalies (or perhaps things that
> look like anomalies) to appear in the migration from 8.1
> through 9.0 to 9.2
>
> You could compare v$system_event with v$latch to
> see if the number of 'latch free' events seems reasonable
> compared to the number of sleeps in v$latch - and there
> are a couple of micro-second time columns in v$latch,
> v$latch_children, v$system_event, v$session_event
> which you could cross-check for 'reasonableness'. And
> there are a couple of 'max time' columns that you could
> check in the event views. This MIGHT give you a clue
> about whether the high sleep figures are genuine, or a bug
> in the v$latch(_children) statistics.
>
> --
> Regards
>
> Jonathan Lewis
> http://www.jlcomp.demon.co.uk
>
> Coming soon one-day tutorials:
> Cost Based Optimisation
> Trouble-shooting and Tuning
> Indexing Strategies
> (see http://www.jlcomp.demon.co.uk/tutorial.html )
>
> ____UK_______March 19th
> ____UK_______April 8th
> ____UK_______April 22nd
>
> ____USA_(FL)_May 2nd
>
> Next Seminar dates:
> (see http://www.jlcomp.demon.co.uk/seminar.html )
>
> ____USA_(CA, TX)_August
>
> The Co-operative Oracle Users' FAQ
> http://www.jlcomp.demon.co.uk/faq/ind_faq.html
>
> "dias" <ydias_at_hotmail.com> wrote in message
> news:55a68b47.0303011427.12f689bc_at_posting.google.com...
> > Hi,
> >
> > I observed the following stats on a 9.0.1 (OS is Tru64) database
> > about the "library cache" latch. The application is an industrial one
> > with only small stored PL/SQL procedures.
> >
> > The first report:
> >
> > LATCH_NAME             GETS    MISSES  HIT_RATIO      SLEEPS  SLEEPS/MISS
> > ----------------- ---------- -------- ---------- ----------- ------------
> > library cache        1021637      126          1   108618601   862052.389
> >
> > After setting session_cached_cursors = 100 and cursor_sharing = FORCE
> > (the application does not use bind variables):
> >
> > LATCH_NAME             GETS    MISSES  HIT_RATIO      SLEEPS  SLEEPS/MISS
> > ----------------- ---------- -------- ---------- ----------- ------------
> > library cache         988150       34          1    31887719   937874.088
> >
> > There are no reloads. V$sgastat reports 50 MB of free memory, and the
> > shared pool is about 100 MB. The time waited for latch free is very
> > small (the average wait is 0).
> >
> > The question is: what causes so many sleeps on this latch?
> >
> > Thanks
Received on Sun Mar 02 2003 - 12:24:02 CST