RE: Event : latch: ges resource hash list

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Mon, 4 Oct 2021 10:17:56 -0400
Message-ID: <2ba901d7b92a$a142e600$e3c8b200$_at_rsiz.com>



Untested, completely theoretical notion: it may actually take longer with the other instances down. Is it possible that the application will tolerate the second instance being up, but in restricted mode so that only “DBA” authority can connect?
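If that's viable, a minimal sketch of what it looks like (database and instance names below are placeholders, and srvctl options vary a little by release):

    -- From SQL*Plus on node 2, connected as SYSDBA:
    STARTUP RESTRICT;

    -- Or, to restrict an instance that is already open:
    ALTER SYSTEM ENABLE RESTRICTED SESSION;

Or via srvctl from the OS (12c-style syntax, hypothetical names):

    srvctl start instance -db MYDB -instance MYDB2 -startoption restrict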

Since I don't have the code I can only guess, but it's possible that only a memory-to-memory ping is needed for instances that are up, while probing the down instance's undo and/or redo is required when the instance is down. That might take long enough for the hash-latch waits to pile up.

But first up with a bullet is JL’s suggestion of badly configured sequences, which dovetails nicely with a vendor lacking sufficient understanding of Oracle to support multiple instances being up.  

And a question: what is the purpose of being RAC in this case? If you are thinking rapid fail-over, I'd suggest you consider changing your configuration to standby recovery, either roll-your-own or Data Guard. With the second instance normally down, I'd like your odds that a complete recovery failover to the standby is either faster than, or only negligibly slower than, RAC failover, and it eliminates all the RAC overheads for multi-instance coordination. As JL pointed out, some of the RACTAX™ applies even when only one instance is up. RAC is wonderful if you really need it, but YPDNR.

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Krishnaprasad Yadav
Sent: Monday, October 04, 2021 6:50 AM
To: Jonathan Lewis
Cc: Oracle L
Subject: Re: Event : latch: ges resource hash list  

Hi Jonathan,

Thanks for your mail. I understand the points above and will try to proceed in the direction you have suggested.

Regards,

Krishna  

On Mon, 4 Oct 2021 at 16:13, Krishnaprasad Yadav <chrishna0007_at_gmail.com> wrote:

Hi Jonathan,  

It's a 2-node RAC system; only one instance is running and the other one is down.

Regards,

Krishna  

On Mon, 4 Oct 2021 at 15:18, Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:

GES is the global enqueue service (which isn't about buffer cache), so it looks as if you are doing something that requires coordination of some locking event. (And the code path is followed regardless of how many instances are up.)  

I would take a couple of snapshots of v$enqueue_stat over a short period of time to see if any specific enqueue is being acquired very frequently; but some global enqueue gets don't get recorded in that view - so it may show nothing interesting. And I would do the same (snapshots) with v$rowcache to see if any of the dictionary cache objects are subject to a high rate of access. Either of these might give you some clue about what's going on.
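A rough sketch of the two-snapshot approach (the snapshot table names and the interval are arbitrary choices):

    -- Baseline of global enqueue activity
    CREATE TABLE enq_snap1 AS
    SELECT eq_type, total_req#, total_wait#
    FROM   v$enqueue_stat;

    -- ... wait a representative interval, e.g. 60 seconds ...

    CREATE TABLE enq_snap2 AS
    SELECT eq_type, total_req#, total_wait#
    FROM   v$enqueue_stat;

    -- Enqueue types acquired most frequently in the interval
    SELECT s2.eq_type,
           s2.total_req#  - s1.total_req#  AS req_delta,
           s2.total_wait# - s1.total_wait# AS wait_delta
    FROM   enq_snap1 s1
    JOIN   enq_snap2 s2 ON s2.eq_type = s1.eq_type
    ORDER  BY req_delta DESC;

The same pattern works for v$rowcache (diff gets and getmisses by parameter); on RAC the dlm_requests column in that view is also worth watching, since it counts global requests.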

Historic issues:  

sequences being accessed very frequently and declared with NOCACHE (or a very small CACHE) or with ORDER (a query to spot candidates is sketched after this list).

Some bugs relating to tablespace handling, undo handling, and VPD that resulted in massive overload on dc_tablespaces, dc_users, dc_objects, or dc_rollback_segments (though I can't remember whether any of them were still around in 12.2).
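For the sequence check, something along these lines flags the candidates (CACHE 1000 is only an illustration, and the owner/name in the ALTER are placeholders; size the cache to the workload):

    -- Sequences declared NOCACHE, with a tiny cache, or with ORDER
    SELECT sequence_owner, sequence_name, cache_size, order_flag
    FROM   dba_sequences
    WHERE  cache_size < 20
    OR     order_flag = 'Y'
    ORDER  BY sequence_owner, sequence_name;

    -- Possible fix for a hot sequence:
    ALTER SEQUENCE app_owner.hot_seq CACHE 1000 NOORDER;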

Regards

Jonathan Lewis    

On Mon, 4 Oct 2021 at 10:23, Krishnaprasad Yadav <chrishna0007_at_gmail.com> wrote:

Hi Experts ,  

There is a situation that is causing the event latch: ges resource hash list in the database. CRS/RDBMS is version 12.2 on Solaris.

The DB is a 2-node RAC, but due to application compatibility node 2 always remains down. However, on node 1 we see a lot of queries waiting on latch: ges resource hash list (no specific query; all of them).
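We see them with a check like this (sketch only):

    -- Sessions currently waiting on the latch, and the SQL involved
    SELECT sid, sql_id, event, seconds_in_wait
    FROM   v$session
    WHERE  event = 'latch: ges resource hash list';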

On node 2 the complete CRS stack is down, so I am not sure why this event is popping up on node 1.

In parallel, CPU on node 1 also remains high, above 80% most of the time.

Any light you can shed on this event will be helpful.

Regards,

Krishna    

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Oct 04 2021 - 16:17:56 CEST
