Re: High response time with top 'Other' and 'application' wait class

From: Priit Piipuu <priit.piipuu_at_gmail.com>
Date: Tue, 29 Nov 2022 21:20:42 +0100
Message-ID: <CAJYY02hdsthGbo_AcKN_y2WGoems1T0E2NnrwAHXRbCmNN4q4g_at_mail.gmail.com>



"gcs drm freeze in enter server mode" points to the instance crash.

On Tue, 29 Nov 2022, 20:01 Lok P, <loknath.73_at_gmail.com> wrote:

> This is a two-node RAC on Exadata.
> But yes, these application queries run on one node only, so I am not sure
> how they can be affected in this way. And also, why did it happen so suddenly?
>
> On Tue, 29 Nov, 2022, 11:47 pm Jon Crisler, <joncrisler_at_gmail.com> wrote:
>
>> Is this RAC or Exadata? DRM node moves can be expensive, and if your
>> interconnect is dropping packets it becomes worse. Since you are getting
>> cell metrics, it implies Exadata. I would check gcs_server_processes and
>> high_priority_processes to make sure they are set properly.
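>>
>> A quick way to check the current settings (a minimal sketch; the
>> priority-processes parameter may be hidden, in which case it is only
>> visible through the x$ views as SYS):
>>
>>   SELECT name, value, isdefault
>>   FROM   v$parameter
>>   WHERE  name = 'gcs_server_processes'
>>   OR     name LIKE '%priority_processes%';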
>>
>> Sent from my Atari 2600
>>
>> On Nov 29, 2022, at 8:42 AM, Priit Piipuu <priit.piipuu_at_gmail.com> wrote:
>>
>> Hi!
>>
>> "gcs drm freeze in enter server mode" is specific to RAC. It seems to be
>> triggered by instance leaving or joining the cluster. Brownouts during
>> dynamic remastering is expected behaviour.
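>>
>> One way to see whether remastering was active around that time (a sketch,
>> assuming the standard RAC views are exposed on this system):
>>
>>   SELECT inst_id, remaster_ops, remaster_time, quiesce_time, freeze_time
>>   FROM   gv$dynamic_remaster_stats;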
>>
>> On Tue, 29 Nov 2022, 03:56 Lok P, <loknath.73_at_gmail.com> wrote:
>>
>>> Hello,
>>> It's version 19c of the Oracle database. We got complaints from the
>>> application team about sudden slowness, and app traffic was automatically
>>> redirected to another active DB replica. While looking into the DB side,
>>> we do observe high wait events from the wait classes 'Other' and
>>> 'Application' in OEM. It lasted for 2-3 minutes, exactly as the
>>> application team is complaining. Fetching the "ASH wait chains" and
>>> "ASH top", they point to significantly high wait events like "reliable
>>> message" (~75%), "enq: KO - fast object checkpoint" and "gcs drm freeze
>>> in enter server mode", from the same wait classes 'Other' and
>>> 'Application'.
>>>
>>> Below are the ASH top and ASH wait chains from the ~5-minute issue
>>> interval, along with the SELECT query that comes out on top. Note that
>>> table CB holds just ~41 rows.
>>> https://gist.github.com/oraclelearner/44394ab8206fc7bd51041eb3d45bdf9f
>>>
>>> Also, the top query showing in "ASH top" is a SELECT, and it does have a
>>> full table scan in it (which would explain the checkpoint waits); however,
>>> this SELECT shows no observed plan change and no unusually high execution
>>> count. We are also not seeing any specific blocking session for it. One
>>> line of the "wait chains" output shows the LMON (lock monitor) process
>>> pointing to an event like "wait for msg sends to complete". I am not yet
>>> able to figure out how these are related to this issue, and am still
>>> trying to understand what could have caused such a scenario.
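>>>
>>> For what it's worth, a simple way to see who ASH recorded as the blocker
>>> for those waits (a sketch; timestamps are placeholders, and a full chain
>>> needs a recursive walk like Tanel Poder's ash_wait_chains script):
>>>
>>>   SELECT event, blocking_session, blocking_inst_id, COUNT(*) AS samples
>>>   FROM   gv$active_session_history
>>>   WHERE  sample_time BETWEEN TIMESTAMP '2022-11-28 22:20:00'
>>>                          AND TIMESTAMP '2022-11-28 22:25:00'
>>>   AND    event IN ('reliable message', 'enq: KO - fast object checkpoint')
>>>   GROUP  BY event, blocking_session, blocking_inst_id
>>>   ORDER  BY samples DESC;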
>>>
>>> This SELECT query executes hundreds of thousands of times in a ~15-minute
>>> window. So is it possible that this query (or rather its underlying table
>>> CB) was aged out of the buffer cache, so that its data was fetched from
>>> disk rather than from cache during that time, and that this caused all
>>> these sorts of issues? And considering this table is very small (~1MB in
>>> size, holding just ~41 rows), should we consider putting it in-memory by
>>> changing "inmemory_clause_default" to 'BASE_LEVEL' and "INMEMORY_SIZE" to
>>> 16GB?
>>>
>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Nov 29 2022 - 21:20:42 CET
