Re: Query Performance issue

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Fri, 1 Jan 2021 10:02:00 +0000
Message-ID: <CAGtsp8nqv7oRpo4mDjwMKAS_sdWLHyMsEvYicdqpnfVxSqOYHw_at_mail.gmail.com>



Lok,
It's important to remember that we are guessing.

We've been looking for anomalies that might help to explain the 1MB global memory bound that's leading to the extreme multi-pass execution; we spotted something that made me raise a hypothesis that would be corroborated (which is not as strong as "proved corrrect" if my prediction that had a very large number of workarea allocations showing in v$sql_workarea allocation ... and you did.

You've now reported, and provided evidence to back the report, that these workareas aren't taking up any memory, so now we need to see if we can refine the hypothesis in a way that seems reasonable and gets us down to that 1MB global memory bound, so I'll suggest the following:

v$pgastat is correctly reporting the PGA allocated (allowing for the lag due to recalculating every 3 seconds) and that is why we can see figures like 10GB allocated with 5GB as the "aggregate PGA auto target " - but my (long ago) original suggestion that the global memory bound should be 20% of this needs to allow for two other factors: a) the number of workareas that are active but have (apparently) not yet received an allocation
b) the minimum allocation that Oracle will give.

There's a hidden parameter _smm_min_size which is the minimum size that will be allocated for an auto workarea - in other words the minimum value for the global memory bound. I think that in your setup (15GB pga_aggregate_target) this will be 1MB. So the latest enhancement to my hypothesis is that with your 13,000 zero-sized workareas Oracle allows for an extra 13,000 MB of PGA allocation when re-calculating the global memory bound; since you always have several hundred connected processes (recently going up to 1,400?) that doesn't leave you with much headroom when a couple of processes want a large workarea.

Regards
Jonathan Lewis

On Fri, 1 Jan 2021 at 07:20, Lok P <loknath.73_at_gmail.com> wrote:

> Out of curiosity, when I see the physical memory in the host it's ~251GB
> out of which ~110GB is currently utilized and it's been the same since past
> ~1month and has not been seeing any such increasing trend. So it means the
> increase in the open cursor which has reached to ~13K now and is tied back
> to~6 different sessions only(may be due to effect connection pulling) for
> that sql_id, is not really putting pressure or keep on consuming actual
> physical memory from the host or saying is not going to cause memory
> starvation. So in short , keeping them open is not going to cause any
> memory shortage issue/failure as such. (Note- Also as I saw the
> actual_mem_used column value in v$sql_workarea_active for all those ~13k
> entries are zero. and even the Sum(work_area_size) which is showing as
> ~37Gb is probably not be the real memory usage here)
>
> But the above is causing the large multipass executions or large temp
> spill because of the algorithm oracle uses to decide " global memory bound
> " based on which it assigns the memory to new active work_area and as you
> mentioned, is most probably the algorithm is based on the number of
> existing active work_area(which is ~13K here ). So as per the calculation
> its 15GB(pga_aggregate_target)-5GB(total_pga_allocated)= ~10GB/13K= .~76MB
> may be rounded up to ~1MB, which is why we are seeing global memory bound
> is set as ~1MB.
>
> Hope my above understanding is correct here.
>
> Regards
> Lok
>
>
>
>
> On Thu, Dec 31, 2020 at 11:28 PM Lok P <loknath.73_at_gmail.com> wrote:
>
>> Yes Jonathan, the count(*) from v$open_cursor for all those 6-
>> session_ids is coming out to be ~15K. So it really appears to be an
>> application code not closing the cursor issue.
>>
>> If i sum(work_area_size) for that sql_id in v$sql_workarea_active, its
>> coming out to be ~37GB, but we have pga_aggregate_target set as ~15GB(out
>> of which ~5GB is total pga allocated in v$pgastat), so not sure if the set
>> global memory bound(i.e. ~1MB) can be retrieved from this one , rather it
>> seems to be derived from the number of work area i.e. ~13K here. Correct me
>> if wrong.
>>
>> Another thing i noticed the , the actual_mem_used and max_mem_used column
>> in v$sql_workarea_active are all populated as Zero for this sql_id, so
>> does this mean that in reality those are actually not consuming any
>> physical memory from the server, rather its just making the calculation of
>> "global memory bound" wrong and thus fooling optimizer to use less memory
>> endup spilling more to temp? or in other words this can't result in any
>> memory starvation like scenario?
>>
>> On Thu, Dec 31, 2020 at 6:57 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
>> wrote:
>>
>>>
>>> That looks as if it may be the source of the problem - and MAY be
>>> corroborating my latest interpretation that it's the number of active
>>> workareas rather than the total volume of space allocated that results in
>>> the change in the global memory bound.
>>>
>>> If you haven't already killed those sessions then you could
>>> select sid, count(*) from v$open_cursor group by sid -- or something
>>> a little more sophisticated
>>> to check if the hundreds of workareas are linked to cursor that still
>>> appear to be open.
>>>
>>> You could also select sum(workarea_size) for that SQL_ID from
>>> v$sql_workarea_active to see if including that value in my previous
>>> suggestion for estimating the global memory bound gets you close to the
>>> current values.
>>>
>>> If you do see 13,000 open_cursors for that statement then you've
>>> probably got an application error not closing cursors; if you don't then
>>> you may have found an Oracle bug that is losing track of that workarea.
>>>
>>>
>>> Regards
>>> Jonathan Lewis
>>>
>>>
>>>
>>>
>>> On Thu, 31 Dec 2020 at 13:03, Lok P <loknath.73_at_gmail.com> wrote:
>>>
>>>> Thank You Jonathan.
>>>>
>>>> Something strange in v$sql_workarea_active, its showing total ~13198
>>>> entries out of which ~13184 are belong to one sql query. I was not able to
>>>> search that sql_id in v$session as that sql was not running at this moment
>>>> which is clear from v$session. But then i think the only option left is to
>>>> backtrack through the SID which is there in v$sql_workarea_active
>>>> associated to that sql, i saw 6 distinct SID's and all of those sessions
>>>> are now stays INACTIVE(event sql * net message from client) with
>>>> logon_time showing as 9th Nov. And all of these entries in
>>>> v$sql_workarea_active are on operation_id- 1 and operation_type as "GROUP
>>>> BY (HASH)". Not sure if we hit any bug here, as because the query plan from
>>>> AWR below looks normal(a WITH clause query). And also the
>>>> dba_hist_active_sess_history is logging the pga_allocated against these ~6
>>>> sessions as ~10 to ~15MB too, so it got unnoticed.
>>>>
>>>> I see from dba_hist_sqlstat , this sql used to run ~10-15 times/day and
>>>> now recently running max ~100-200 times/day with avg execution time as
>>>> ~1millisec/execution. So not sure what has happened wrong with it for any
>>>> of the executions. and it can keep spawningg so much work area. But it
>>>> seems that we have to kill these sessions to bring things back to normal.
>>>> Hoping it will not appear again.
>>>>
>>>>
>>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Jan 01 2021 - 11:02:00 CET

Original text of this message