Re: Sudden occurrence of SHARED_POOL LATCH waits - DB up since 4/3/2018

From: Tim Gorman <tim.evdbt_at_gmail.com>
Date: Thu, 31 May 2018 12:17:06 -0600
Message-ID: <34a8fd79-ad87-446a-d56b-009da168e695_at_gmail.com>



Chris,

Is auto SGA management enabled?  That is, is either SGA_TARGET > 0 or MEMORY_TARGET > 0?  If so, please consider examining the log of auto SGA management in the V$SGA_RESIZE_OPS view.
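
Something along these lines (just a sketch; adjust the predicate and time window to the period when the jobs run) should show the recent grow/shrink operations on the Shared Pool and Buffer Cache:

-- sketch only: list automatic resize operations for the two components
select component, oper_type, oper_mode,
       initial_size/1024/1024 as initial_mb,
       final_size/1024/1024   as final_mb,
       start_time, end_time, status
from   v$sga_resize_ops
where  component in ('shared pool', 'DEFAULT buffer cache')
order  by start_time;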

As described in my presentation <http://evdbt.com/download/presentation-rdbms-forensics-troubleshooting-using-ash/> and white paper <http://evdbt.com/download/paper-rdbms-forensics-troubleshooting-using-ash/> on "RDBMS Forensics: Troubleshooting Using ASH", there is a situation in some applications, particularly analytic or data mart applications, where huge complex SQL or PL/SQL cursors cause rapid expansion of the Shared Pool during parse, and then equally rapid expansion of the Buffer Cache during subsequent execution and fetch.  Adjusting the size of the Shared Pool requires the Shared Pool latch, so thrashing space between the Buffer Cache and the Shared Pool is almost certain to result in that latch becoming a bottleneck, as described in the presentation.
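
To get a quick feel for how often those two components have been resized since instance startup, a rough sketch against V$SGA_DYNAMIC_COMPONENTS shows the operation counts and the last operation on each:

-- sketch only: resize activity per dynamic SGA component since startup
select component,
       current_size/1024/1024 as current_mb,
       min_size/1024/1024     as min_mb,
       max_size/1024/1024     as max_mb,
       oper_count, last_oper_type, last_oper_time
from   v$sga_dynamic_components
where  component in ('shared pool', 'DEFAULT buffer cache');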

Jonathan Lewis also discusses this phenomenon in a related posting HERE <https://jonathanlewis.wordpress.com/2007/04/16/sga-resizing/>.

Anyway, my solution for the thrashing was to determine a "floor" value for the SHARED_POOL_SIZE parameter based on the historic average size of "__shared_pool_size" as recorded in the AWR view DBA_HIST_PARAMETER, and then to set SHARED_POOL_SIZE to that "floor" value, preventing the Shared Pool from being shrunk below it.  The basic idea is to permit auto SGA management to adjust occasionally, but not to thrash.
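
As a rough sketch (you will want to restrict the snapshot range to a representative period, and round the result up to a granule boundary), the historic figures can be pulled from AWR like this:

-- sketch only: historic size of the internal "__shared_pool_size" parameter across AWR snapshots
select round(avg(to_number(value))/1024/1024) as avg_shared_pool_mb,
       round(min(to_number(value))/1024/1024) as min_shared_pool_mb,
       round(max(to_number(value))/1024/1024) as max_shared_pool_mb
from   dba_hist_parameter
where  parameter_name = '__shared_pool_size';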

Please let us know what you find; I hope this helps!

Thanks!

-Tim

On 5/31/18 11:39, Dominic Brooks wrote:
> Am I out of touch or is that an exceedingly large shared pool?
>
> Sent from my iPhone
>
> On 31 May 2018, at 18:05, Chris Taylor
> <christopherdtaylor1994_at_gmail.com
> <mailto:christopherdtaylor1994_at_gmail.com>> wrote:
>
>> version 12.1
>> Linux x86-64
>>
>> Event               Waits    Total Wait Time (sec)  Wait Avg(ms)  % DB time  Wait Class
>> latch: shared pool  402,269  103.4K                 256.95        29.5       Concurrency
>> DB CPU                       100.3K                               28.6
>> latch free          362,927  42.2K                  116.26        12.0       Other
>>
>> Background:
>> We've got a database with 4TB of RAM and a 260GB shared pool.
>> Everything was running normally until 2 nights ago when a regularly
>> scheduled job that does TONS of DDL suddenly blew up the shared pool.
>>
>> Trying to query x$ksmsp for shared pool sizes is also causing huge
>> amounts of latch waits.
>>
>> When these jobs aren't running, the db is working normally.  Starting
>> just one of the jobs brings back the shared pool latching problem
>> immediately.
>>
>> I'm thinking the shared pool is heavily fragmented, but I'm looking for
>> additional queries or views I can use to diagnose shared pool
>> problems.  We're not running into any shared pool errors as far as
>> memory allocations go.
>>
>> This query hangs and causes huge contention:
>>
>> select ksmchcls "ChnkClass",
>>        sum(ksmchsiz) "SumChunkTypeMem",
>>        max(ksmchsiz) "LargstChkofThisTyp",
>>        count(1) "NumOfChksThisType",
>>        round((sum(ksmchsiz)/tot_sp_mem.totspmem),2)*100||'%' "PctTotSPMem"
>> from   x$ksmsp,
>>        (select sum(ksmchsiz) TotSPMem from x$ksmsp) tot_sp_mem
>> group  by ksmchcls, tot_sp_mem.TotSPMem
>> order  by 5;
>>
>>
>> This query runs fine and seems to indicate there is plenty of reserve
>> space:
>>
>> select free_space, round(avg_free_size,2) as avg_free_size,
>>        free_count, max_free_size, used_space,
>>        round(avg_used_size,2) as avg_used_size, used_count, max_used_size,
>>        requests, request_misses, last_miss_size,
>>        request_failures, last_failure_size, aborted_request_threshold,
>>        aborted_requests
>> from   v$shared_pool_reserved
>> /
>>
>> FREE_SPACE                 13435398168
>> AVG_FREE_SIZE              236538.7
>> FREE_COUNT                 16023
>> MAX_FREE_SIZE              27316040
>> USED_SPACE                 319665968
>> AVG_USED_SIZE              5627.92
>> USED_COUNT                 40777
>> MAX_USED_SIZE              1048552
>> REQUESTS                   10903867
>> REQUEST_MISSES             0
>> LAST_MISS_SIZE             0
>> REQUEST_FAILURES           0
>> LAST_FAILURE_SIZE          0
>> ABORTED_REQUEST_THRESHOLD  2147483647
>> ABORTED_REQUESTS           0
>>
>>
>>
>> The Jobs in question do PARTITION EXCHANGES for many, many sites with
>> temp tables that are built as part of the job.
>>
>> The DB has been up since 04/03/2018 07:12:23; the last 2 days are the
>> first time this has been seen, and it is repeatable by restarting any
>> one of the jobs (there are 8 total).
>>
>> Thinking about flushing the shared pool (or restarting the db off hours)
>> as a test to see if it resolves the issue (like throwing a grenade
>> into an anthill resolves a problem with ants... not the best approach,
>> but it might be effective).
>>
>> Any thoughts/comments/insults?
>>
>> Thanks,
>> Chris
>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu May 31 2018 - 20:17:06 CEST
