Re: Seeking understanding of my "gc cr multi block request" waits

From: DA Morgan <damorgan_at_psoug.org>
Date: Tue, 17 Jul 2007 18:16:40 -0700
Message-ID: <1184721397.980651@bubbleator.drizzle.com>

sybrandb_at_hccnet.nl wrote:

> On Tue, 17 Jul 2007 12:13:44 -0700, DA Morgan <damorgan_at_psoug.org>
> wrote:
>

>> Richard Piasecki wrote:
>>> Hello everyone.
>>>
>>> I am the DBA for a three-node RAC database that is suffering from an intermittent performance problem. The Oracle release is 10.2.0.1 on Red Hat Linux.
>>> Periodically, a particular job that normally runs in less than a second takes several seconds to complete, sometimes requiring as much as 20 seconds.
>>> An examination of raw trace files created during the job execution has revealed that the wait event causing the slowdown is "gc cr multi block
>>> request". This particular wait event is not well-documented by Oracle, and I have found little information on it on the internet. I am posting this
>>> message to try to get confirmation of my understanding of the event from other DBAs who may have experienced it.
>>>
>>> I can think of several possible explanations for the wait event, and I want to get everyone's opinion on the subject. Mr. Gopalakrishnan's wonderful
>>> book on RAC does not mention "gc cr multi block request". It does, however, mention the event "gc cr request" as a "Place Holder" event.
>>>
>>> 1. If "gc cr multi block request" is also a place holder event (which would seem logical, implying multi-block IO), I would think I should never see
>>> it as a major event. I would think it should be substituted by one of the "gc*2-way" or "gc*3-way" events, as the Gopalakrishnan book implies. So, I'm
>>> confused as to why I am getting so many "gc cr multi block request" waits without any "2-way" or "3-way" waits.
>>>
>>> 2. On the other hand, if the "gc cr multi block request" event is not a place holder event, does it indicate the wait time experienced by the instance
>>> while trying to get a lock request from the master of the resource? A few of these waits are over a second, according to the raw trace file. That
>>> seems like an awfully long time to just get a lock request from the master. Any idea what could cause that?
>>>
>>> 3. Could the "gc cr multi block request" event be the total time spent obtaining the block from another instance? This would make sense given the
>>> length of some of these events and the total lack of any "2-way" or "3-way" events. But, that doesn't jibe with the information in the Gopalakrishnan
>>> book or in any other resource I have read.
>>>
>>> 4. There is an Oracle bug #3951017, but it is supposedly fixed in 10gR2, and it causes a widespread slowdown, not the intermittent, specific to a
>>> single session, type of slowdown that I am experiencing. So, I doubt I have this problem.
>>>
>>>
>>> Can anyone confirm if this wait event is one of the four possibilities above or one I haven't mentioned?
>>>
>>>
>>>
>>> All these possibilities share the same set of possible solutions (except for #4), according to my understanding. Please correct me if my list of
>>> solutions is erroneous or incomplete.
>>>
>>> 1. Use application/data partitioning techniques to try to remove the inter-instance contention for the blocks in question.
>>> 2. Use jumbo frames on the interconnect (the current interconnect is configured with an MTU of 1500)
>>> 3. I think disk-IO is sometimes involved with these cache fusion operations to flush redo log buffers, so improving disk speed may help as well. We
>>> are currently on RAID-5 but plan to implement a series of RAID-1 arrays under ASM control in the near future.
>>> 4. Tweaking the number of LMS processes on the holding instance, but no CPU spikes have been noticed during these slowdowns, so I question having to
>>> do this.
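
A rough way to see why the MTU in item 2 matters: a block shipped over the interconnect that is larger than the MTU has to be split into IP fragments. The sketch below is only arithmetic, not anything taken from Oracle. It assumes a UDP interconnect, an 8 KB database block, and an IPv4 header of 20 bytes with no options, and it ignores whatever message header Oracle adds. `ip_fragments` is a hypothetical helper name.

```python
import math

IP_HEADER = 20   # assumption: IPv4 header without options
UDP_HEADER = 8   # assumption: the interconnect runs over UDP


def ip_fragments(payload_bytes: int, mtu: int) -> int:
    """Number of IP fragments needed to carry one UDP datagram."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    datagram = payload_bytes + UDP_HEADER
    return math.ceil(datagram / per_fragment)


block = 8192  # assumption: 8 KB database block
for mtu in (1500, 9000):
    print(f"MTU {mtu}: {ip_fragments(block, mtu)} fragment(s) per block")
```

Under these assumptions one block takes six fragments at an MTU of 1500 and a single packet at 9000. Losing any one fragment means the whole datagram has to be sent again, so fewer fragments per block leaves less room for that to happen.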
>>>
>>>
>>>
>>> Does anyone have any thoughts, comments, suggestions, explanations or solutions that may help me to decipher the reasons for these waits and the means
>>> by which to eliminate them? Any help would be gratefully accepted. Thanks.
>>>
>>>
>>>
>>> --- Rich
>> Run these:
>>
>> -- Current block transfer statistics
>>
>> col "AVG RECEIVE TIME (ms)" format 9999999.9
>> col inst_id format 9999
>> prompt GCS CURRENT BLOCKS
>>
>> -- The statistics are named 'gc ...' in 10g and 'global cache ...' in 9i.
>> -- Receive time is recorded in centiseconds, hence the factor of 10.
>> SELECT b1.inst_id, b2.value RECEIVED, b1.value "RECEIVE TIME",
>> ((b1.value/NULLIF(b2.value, 0))*10) "AVG RECEIVE TIME (ms)"
>> FROM gv$sysstat b1, gv$sysstat b2
>> WHERE b1.name IN ('gc current block receive time', 'global cache current block receive time')
>> AND b2.name IN ('gc current blocks received', 'global cache current blocks received')
>> AND b1.inst_id = b2.inst_id;
>>
>> -- block contention measured by using block transfer time
>>
>> col "AVG RECEIVE TIME (ms)" format 9999999.9
>> col inst_id format 9999
>> prompt GCS CR BLOCKS
>>
>> SELECT b1.inst_id, b2.value RECEIVED, b1.value "RECEIVE TIME",
>> ((b1.value/NULLIF(b2.value, 0))*10) "AVG RECEIVE TIME (ms)"
>> FROM gv$sysstat b1, gv$sysstat b2
>> WHERE b1.name IN ('gc cr block receive time', 'global cache cr block receive time')
>> AND b2.name IN ('gc cr blocks received', 'global cache cr blocks received')
>> AND b1.inst_id = b2.inst_id;
>>
>> They may point in the right direction.
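
For reading the output of the queries above: the receive-time statistics are kept in centiseconds, which is why the ratio is multiplied by 10 to get milliseconds. A minimal sketch of the same arithmetic, using made-up numbers and a hypothetical helper name `avg_receive_ms`:

```python
from typing import Optional


def avg_receive_ms(receive_time_cs: float, blocks_received: int) -> Optional[float]:
    """Average time to receive one block, in milliseconds.

    receive_time_cs is the cumulative receive time in centiseconds,
    blocks_received is the cumulative number of blocks received.
    """
    if blocks_received == 0:
        return None  # nothing received yet, so no average to report
    return receive_time_cs / blocks_received * 10  # 1 centisecond = 10 ms


# Made-up figures: 500 centiseconds spent receiving 1000 blocks.
print(avg_receive_ms(500, 1000))
```

The figures are cumulative since instance startup, so an intermittent problem can be averaged away. Taking two samples around a slow run and feeding the differences into the same calculation gives the average for that interval only.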

> 
> Just a quick question (I'm a bit weary of submitting an SR again,
> especially where Oracle has been so unhelpful). It quite often happens
> to me that *any* query (even the simplest ones) against any gv$ view just
> *hangs* forever. Tracing the session I noticed a bunch of DFS lock
> handle events and a bunch of PX Deq Reap Credit events.
> Metaclunck did come up with a bunch of blabla, but no solutions or
> even workarounds.
> Am I correct in stating Oracle will stop working on 9iR2 SRs after
> July 31?

I can't answer that one, but my experience with RAC from 9.2.0.4 to the present would make me want to upgrade every cluster I could lay my hands on to 10.2.0.1 or higher. The increase in the stability of the technology is substantial, especially in the installation with the cluster verify tool.

-- 
Daniel A. Morgan
University of Washington
damorgan_at_x.washington.edu (replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org

Received on Tue Jul 17 2007 - 20:16:40 CDT