RE: Shared pool error even it shows enough free memory

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Fri, 3 Jun 2022 14:45:52 -0400
Message-ID: <0fc501d8777a$26fc3d10$74f4b730$_at_rsiz.com>



Two things:  

  1. The history of the resize operations and the maximum number of granules can give you a good idea whether or not a static allocation is possible and optimal for you. Without tracking your history for a decent while, see Tim Gorman’s comment about the genius of automation and dynamics, etc. The history *MAY* show that your resizes and the resulting sizes are too chaotic for a static allocation.
  2. IF, on the other hand, the maximums of the resize operations in your history are within reason to up them both statically to the maximum seen, and possibly short changing buffer pool a bit, because a few heavy drinkers may bloat that without much actual elapsed time benefit, then up them both statically. In practice I have seen resize operations shift one or two granules in a net destructive oscillation.

Again, if you don’t have history that supports the choice to go static, it is probably a bad idea. On the other hand I have seen it eliminate a lot of problems when the history is moderately frequently switching a minimum number of granules. One case that is moderately common is a big batch job growing the buffer pool and tossing out still good plans at the boundary that must be shrunk and then when those plans are needed again (quite possibly to report on the big batch job) they have to toss out buffer pool.  

And folks doing new things in production without testing in non-production first can certainly upset the apple cart, and that tends to be worse for a static allocation (although not fun for anyone.)  

Good luck,  

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Lok P Sent: Friday, June 03, 2022 7:28 AM
To: Jonathan Lewis
Cc: Oracle L
Subject: Re: Shared pool error even it shows enough free memory  

Thank you Jonathan. My Apology. I missed your reply.

Actually I tried to fetch the 10053 trace post DB restart and parameter change. That simple query completed on node-1. I do see the trace file below pointing all those indexed path evaluation and also usage of the composite stats.

 <https://gist.github.com/oraclelearner/45ed12ed2f3d9c51d640f1bafc68c13c> https://gist.github.com/oraclelearner/45ed12ed2f3d9c51d640f1bafc68c13c

And I see the memory allocation for that sql in gv$sql is below. So all are in the range of ~30KB each. Not sure if this is on the extreme higher side. And regarding granularity, I think we had encountered a bug in the past in which granularity=>AUTO was not collecting/populating the subpartition stats when using INCREMENTAL stats gather option, so granularity is set as 'ALL' since then.

SHARABLE_MEM PERSISTENT_MEM RUNTIME_MEM 31686 31968 30560

And one thing I was wondering in regards to the shared_pool_reserved_size , as this area is mainly to hold the bigger objects(>400bytes) but of course not the data in them. So what object or information must be spanning so many bytes with respect to this composite partition table/index and it has to hold those for query parsing?  

On Fri, Jun 3, 2022 at 4:20 PM Lok P <loknath.73_at_gmail.com> wrote:

We increased the sga_target from ~40GB to ~60GB and also _shared_pool_reserved_pct to 15 and bounced. The failed jobs started running and getting finished. We will continue monitoring. It may be that slowly as we keep adding new partitions day by day though also purging/dropping for most of the table, but in some cases we are only getting added new partitions. And as Tim was pointing earlier, the reserved area(shared_pool_reserved_size) which was set as 800M in our case was now getting thrashed with the new bigger partitions. So now we set that by setting the _shared_pool_reserved_pct to 15. So it will always grow/shrink dynamically whenever the resize operation happens for the shared pool component and will hold the bigger objects by avoiding shared pool fragmentation.  

On Fri, Jun 3, 2022 at 2:49 PM Lok P <loknath.73_at_gmail.com> wrote:

Tried re-generating the 10053 trace for the same query , and this time seeing the queries around X$ views only in the trace.  

 <https://gist.github.com/oraclelearner/4e6ab38fcd39cdecdf0ae4a602b99671> https://gist.github.com/oraclelearner/4e6ab38fcd39cdecdf0ae4a602b99671    

On Fri, Jun 3, 2022 at 12:58 PM Lok P <loknath.73_at_gmail.com> wrote:

This query is failing for node-1 and working fine on node-2. Our application runs on node-1, so it seems node-1 memory is having issues but not node-2. So then we tried generating 10053 traces for node-1 by running that simple 'select 1...' query . It's as below. I see it just failed at the start while doing some 'group by validation'. It didn't go much before failure.

 <https://gist.github.com/oraclelearner/1583dc1ff3350fcb9c104d3d77064efc> https://gist.github.com/oraclelearner/1583dc1ff3350fcb9c104d3d77064efc

Then we did a clean stats gathering as below from another node-2 for this object and it succeeded. Then tried running the same simple query (SELECT 1 as OUT FROM tab_part WHERE 1=2; ) on node-1 that is still failing with ora-4031. Now we are thinking to bounce node-1 and set the _shared_pool_reserved_pct =15 and bounce and see the behaviour.

exec Dbms_stats.gather_table_stats(ownname=>'USER1',tabname=>'TAB_PART', degree=> 16);

Wit below table level preference:-

OWNER TABLE_NAME PREFERENCE_NAME PREFERENCE_VALUE USER1 TAB_PART GRANULARITY ALL USER1 TAB_PART INCREMENTAL TRUE USER1 TAB_PART INCREMENTAL_STALENESS USE_STALE_PERCENT USER1 TAB_PART METHOD_OPT FOR ALL COLUMNS SIZE REPEAT     On Thu, Jun 2, 2022 at 9:34 PM Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:  

>> So should we bump up the sga_target keeping other params the same or should we look for another way to fix this up?
 

Fix WHAT up ?

You have one symptom but you haven't identified the problem so you can't fix it.  

Since you seem to have a reproducible and very simple example why don't you:  

alter session set events '10053 trace name context forever;

select /*+ new_query */ * from part_tab where 1 = 2;

alter session set events '10053 trace name context off;  

and have a look at the trace file - you will find under the heading "SINGLE TABLE ACCESS PATH" that Oracle has tried to cost the query for a table scan and for each of the 26 possible indexes, and in the "BASE STATISTICAL INFORMATION" that is has loaded some information about the table and indexes - you may find that each index takes a few lines to say something like "using composite stats", you may find several thousand lines of text about partition and/or subpartition stats, and that might indicate that some of your stats collections procedures have failed and left Oracle trying to create composite stats dynamically from (sub)partition stats - and that MIGHT be the problem.  

Regards

Jonathan Lewis        

On Thu, 2 Jun 2022 at 16:50, Lok P <loknath.73_at_gmail.com> wrote:

And we are seeing the wait event "sga:allocation forcing component growth" exactly from the same point of time when these resize operations happen and the ora-4031 failure happens.  

On Thu, Jun 2, 2022 at 9:06 PM Lok P <loknath.73_at_gmail.com> wrote:

Yes Jonathan. It's true that all these failures are happening on the operation(loading/fetching data etc) related to the partitioned table/indexes. Don't see any written procedure/job which is manually fetching the explain plan but I did see a few queries with 'monitor' hint in them, not sure if that can cause this , but they have been running like that since long. This database does have big partition tables holding ~4000+ partitions; those are compressed ones. And as I checked with the app team, no recent increase in the number of partitions happened or new objects/indexes added to these partitioned tables too. And we collect stats using incremental options on these partition tables.  

I did "select * from part_tab where 1=2 ; " and it failed with below, the index showing the parameter is part of this table only. This is a composite partition table with ~4000+ sub-partitions with ~25 bitmap indexes and one b-tree index as PK, all are local indexes. So should we bump up the sga_target keeping other params the same or should we look for another way to fix this up?  

ORA-04031: unable to allocate 4096 bytes of shared memory ("shared pool","PART_TAB_IX6","pacdHds_kkpaco","kksgaAlloc: firstN")  

On Thu, Jun 2, 2022 at 8:19 PM Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:  

That looks like problems with parsing queries involving partitioned tables, possibly handling statistics on indexes.  

Since it's filling the SGA maybe you're using "explain plan", which demands shared pool memory, rather than running the query and pulling the plan from memory afterwards, which uses PGA memory for the optimisation stages.  

Have you increased the number of partitions of some object recently, or added a couple of indexes to objects with a large number of partitions; or been testing different ways of writing some queries and using explain plan to check the execution paths.  

Regards

Jonathan Lewis      

On Thu, 2 Jun 2022 at 13:01, Lok P <loknath.73_at_gmail.com> wrote:

Hello Listers. Its oracle version is 19.11. We are suddenly seeing many queries failing with Ora-04031 even if we see the "free memory" as ~5GB in v$sgastat where pool='shared pool'. We then flushed the shared pool and also increased the shared pool size to 8GB from initial 6GB, and things ran fine for a couple of hours but we again encountered the same error after a couple of hours of good run. What could be the cause?

ORA-04031: unable to allocate 4096 bytes of shared memory ("shared pool","IDX1","pacdHds_kkpaco","kksgaAlloc: firstN")  

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Jun 03 2022 - 20:45:52 CEST

Original text of this message