RE: Event : latch: ges resource hash list
Date: Mon, 4 Oct 2021 10:17:56 -0400
Message-ID: <2ba901d7b92a$a142e600$e3c8b200$_at_rsiz.com>
Untested completely theoretical notion: It may actually take longer with the other instances down. Is it possible that the application will tolerate the second instance being up but in restricted mode so that only “DBA” authority can connect?
Since I don’t have the code I can only guess, but it’s possible that only a memory-to-memory ping is needed for instances that are up, whilst for an instance that is down a probe of its undo and/or redo is required. That might take long enough for hash waits to pile up.
But first up with a bullet is JL’s suggestion of badly configured sequences, which dovetails nicely with a vendor lacking sufficient understanding of Oracle to support multiple instances being up.
And a question: What is the purpose of being RAC in this case? If you are thinking rapid fail-over, I’d suggest you consider changing your configuration to standby recovery, either roll-your-own or Data Guard. With the second instance normally down, I’d like your odds that a complete recovery failover to the standby is either faster than or negligibly slower than RAC, and it eliminates all the RAC overheads for multi-instance coordination. As JL pointed out, some of the RACTAX™ applies even when only one instance is up. RAC is wonderful if you really need it, but YPDNR.
mwf
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Krishnaprasad Yadav
Sent: Monday, October 04, 2021 6:50 AM
Hi Jonathan,
Thanks for your mail. I understand the above points and will try to drive in a similar direction to the one you have mentioned.
Regards,
Krishna
On Mon, 4 Oct 2021 at 16:13, Krishnaprasad Yadav <chrishna0007_at_gmail.com> wrote:
Hi Jonathan,
It's a 2-node RAC system; only one instance is running and the other one is down.
Regards,
Krishna
On Mon, 4 Oct 2021 at 15:18, Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:
GES is the global enqueue service (which isn't about buffer cache), so it looks as if you are doing something that requires coordination of some locking event. (And the code path is followed regardless of how many instances are up.)
I would take a couple of snapshots of v$enqueue_stat over a short period of time to see if any specific enqueue is being acquired very frequently; but some global enqueue gets don't get recorded in that view - so it may show nothing interesting. And I would do the same (snapshots) of v$rowcache to see if any of the dictionary cache objects were subject to a high rate of access. Either of these might give you some clue about what's going on.
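The snapshot comparison described above could be sketched roughly as follows, in Python. This assumes each snapshot has already been fetched into a dict of cumulative counters keyed by name (for example enqueue type to total requests from v$enqueue_stat, or dictionary cache parameter to gets from v$rowcache); the function name `snapshot_delta` and the sample figures are made up for illustration and are not from the thread.

```python
def snapshot_delta(before, after, interval_seconds):
    """Compare two snapshots of cumulative counters.

    before/after: dicts mapping a counter name to its cumulative value.
    Returns (name, delta, rate_per_second) tuples, largest delta first.
    Counters that did not increase are left out.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rows = []
    for name, end_value in after.items():
        delta = end_value - before.get(name, 0)
        if delta > 0:
            rows.append((name, delta, delta / interval_seconds))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical figures, 60 seconds apart (not real measurements).
before = {"SQ": 1000, "TX": 500, "TM": 200}
after = {"SQ": 61000, "TX": 560, "TM": 200}
top = snapshot_delta(before, after, 60)
for name, delta, rate in top:
    print(f"{name}: {delta} requests, {rate:.1f}/sec")
```

Anything that dominates the top of the list is the place to start looking; as noted, an empty or unremarkable list does not rule out enqueue activity that the view does not record.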
Historic issues:
sequences being accessed very frequently and declared with NOCACHE (or very small CACHE) or with ORDER.
Some bugs relating to tablespace handling, undo handling, VPD, that result in massive overload on dc_tablespaces, dc_users, dc_objects, dc_rollback_segments (though I can't remember if any of them were still around in 12.2).
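As a rough illustration of the sequence check in the first item, here is a small Python sketch. It assumes the rows have already been fetched from DBA_SEQUENCES into dicts with the keys SEQUENCE_NAME, CACHE_SIZE and ORDER_FLAG, and that NOCACHE is reported as a CACHE_SIZE of 0; the function name, the `min_cache` threshold of 20 and the sample sequence names are arbitrary assumptions for illustration only.

```python
def risky_sequences(rows, min_cache=20):
    """Flag sequences declared with NOCACHE, a small CACHE, or ORDER.

    rows: iterable of dicts with SEQUENCE_NAME, CACHE_SIZE, ORDER_FLAG.
    min_cache: arbitrary threshold below which a cache counts as "small".
    Returns a list of (sequence_name, [reasons]).
    """
    flagged = []
    for row in rows:
        reasons = []
        if row["CACHE_SIZE"] == 0:
            reasons.append("NOCACHE")
        elif row["CACHE_SIZE"] < min_cache:
            reasons.append("small CACHE")
        if row["ORDER_FLAG"] == "Y":
            reasons.append("ORDER")
        if reasons:
            flagged.append((row["SEQUENCE_NAME"], reasons))
    return flagged


# Hypothetical rows (sequence names invented for the example).
sample = [
    {"SEQUENCE_NAME": "ORDER_ID_SEQ", "CACHE_SIZE": 0, "ORDER_FLAG": "N"},
    {"SEQUENCE_NAME": "AUDIT_SEQ", "CACHE_SIZE": 5, "ORDER_FLAG": "Y"},
    {"SEQUENCE_NAME": "BATCH_SEQ", "CACHE_SIZE": 1000, "ORDER_FLAG": "N"},
]
flagged = risky_sequences(sample)
```

This only shows how a sequence is declared, not how often it is used, so a flagged sequence matters only if it is also being accessed very frequently.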
Regards
Jonathan Lewis
On Mon, 4 Oct 2021 at 10:23, Krishnaprasad Yadav <chrishna0007_at_gmail.com> wrote:
Hi Experts,
There is a situation causing the event "latch: ges resource hash list" in the database. CRS/RDBMS is version 12c Release 2 on Solaris.
The DB is a 2-node RAC, but due to application compatibility node 2 always remains down. However, on node 1 we see a lot of queries waiting for "latch: ges resource hash list" (no specific query, but all of them).
On node 2 the complete CRS stack is down, so I am not sure why this event is popping up on node 1.
At the same time, CPU usage on node 1 also remains high, above 80% most of the time.
Any light on this event will be helpful.
Regards,
Krishna
To: Jonathan Lewis
Cc: Oracle L
Subject: Re: Event : latch: ges resource hash list
--
http://www.freelists.org/webpage/oracle-l
Received on Mon Oct 04 2021 - 16:17:56 CEST