Re: Seeking understanding of my "gc cr multi block request" waits

From: Richard Piasecki <usenet2_at_ogoent.com>
Date: 18 Jul 2007 08:51:03 -0500
Message-ID: <jp5s93lv281611bkd50qo0r74riu07gp6h@4ax.com>

On Tue, 17 Jul 2007 18:16:40 -0700, DA Morgan <damorgan_at_psoug.org> wrote:

>sybrandb_at_hccnet.nl wrote:
>> On Tue, 17 Jul 2007 12:13:44 -0700, DA Morgan <damorgan_at_psoug.org>
>> wrote:
>>
>>> Richard Piasecki wrote:
>>>> Hello everyone.
>>>>
>>>> I am the DBA for a three-node RAC database that is suffering from an intermittent performance problem. The Oracle release is 10.2.0.1 on RedHat linux.
>>>> Periodically, a particular job that normally runs in less than a second takes several seconds to complete, sometimes requiring as much as 20 seconds.
>>>> An examination of raw trace files created during the job execution has revealed that the wait event causing the slowdown is "gc cr multi block
>>>> request". This particular wait event is not well-documented by Oracle, and I have found little information on it on the internet. I am posting this
>>>> message to try to get confirmation of my understanding of the event from other DBAs who may have experienced it.
>>>>
>>>> I can think of several possible causes for the wait event, and I want to get everyone's opinion on the subject. Mr. Gopalakrishnan's wonderful
>>>> book on RAC does not mention "gc cr multi block request". It does, however, mention the event "gc cr request" as a "Place Holder" event.
>>>>
>>>> 1. If "gc cr multi block request" is also a place holder event (which would seem logical, implying multi-block IO), I would think I should never see
>>>> it as a major event. I would think it should be substituted by one of the "gc*2-way" or "gc*3-way" events, as the Gopalakrishnan book implies. So, I'm
>>>> confused as to why I am getting so many "gc cr multi block request" waits without any "2-way" or "3-way" waits.
>>>>
>>>> 2. On the other hand, if the "gc cr multi block request" event is not a place holder event, does it indicate the wait time experienced by the instance
>>>> while trying to get a lock request from the master of the resource? A few of these waits are over a second, according to the raw trace file. That
>>>> seems like an awfully long time to just get a lock request from the master. Any idea what could cause that?
>>>>
>>>> 3. Could the "gc cr multi block request" event be the total time spent obtaining the block from another instance? This would make sense given the
>>>> length of some of these events and the total lack of any "2-way" or "3-way" events. But, that doesn't jibe with the information in the Gopalakrishnan
>>>> book or in any other resource I have read.
>>>>
>>>> 4. There is an Oracle bug #3951017, but it is supposedly fixed in 10gR2, and it causes a widespread slowdown, not the intermittent, specific to a
>>>> single session, type of slowdown that I am experiencing. So, I doubt I have this problem.
>>>>
>>>>
>>>> Can anyone confirm if this wait event is one of the four possibilities above or one I haven't mentioned?
>>>>
>>>>
>>>>
>>>> All these possibilities share the same set of possible solutions (except for #4), according to my understanding. Please correct me if my list of
>>>> solutions is erroneous or incomplete.
>>>>
>>>> 1. Use application/data partitioning techniques to try to remove the inter-instance contention for the blocks in question.
>>>> 2. Use jumbo frames on the interconnect (the current interconnect is configured with an MTU of 1500)
>>>> 3. I think disk-IO is sometimes involved with these cache fusion operations to flush redo log buffers, so improving disk speed may help as well. We
>>>> are currently on RAID-5 but plan to implement a series of RAID-1 arrays under ASM control in the near future.
>>>> 4. Tweaking the number of LMS processes on the holding instance, but no CPU spikes have been noticed during these slowdowns, so I question having to
>>>> do this.
>>>>
>>>>
>>>>
>>>> Does anyone have any thoughts, comments, suggestions, explanations or solutions that may help me to decipher the reasons for these waits and the means
>>>> by which to eliminate them? Any help would be gratefully accepted. Thanks.
>>>>
>>>>
>>>>
>>>> --- Rich
>>> Run these:
>>>
>>> -- Current block transfer statistics
>>>
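>>> -- Receive time is in centiseconds, hence the *10 to get ms.
>>> -- On 10g these statistics are named 'gc current block receive time'
>>> -- and 'gc current blocks received'; the names below are the 9i ones.
>>> col "AVG RECEIVE TIME (ms)" format 9999999.9
>>> col inst_id format 9999
>>> prompt GCS CURRENT BLOCKS
>>>
>>> SELECT b1.inst_id, b2.value RECEIVED, b1.value "RECEIVE TIME",
>>> ((b1.value/b2.value)*10) "AVG RECEIVE TIME (ms)"
>>> FROM gv$sysstat b1, gv$sysstat b2
>>> WHERE b1.name = 'global cache current block receive time'
>>> AND b2.name = 'global cache current blocks received'
>>> AND b1.inst_id = b2.inst_id;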
>>>
>>> -- block contention measured by using block transfer time
>>>
>>> -- On 10g these statistics are named 'gc cr block receive time'
>>> -- and 'gc cr blocks received'; the names below are the 9i ones.
>>> col "AVG RECEIVE TIME (ms)" format 9999999.9
>>> col inst_id format 9999
>>>
>>> SELECT b1.inst_id, b2.value RECEIVED, b1.value "RECEIVE TIME",
>>> ((b1.value/b2.value)*10) "AVG RECEIVE TIME (ms)"
>>> FROM gv$sysstat b1, gv$sysstat b2
>>> WHERE b1.name = 'global cache cr block receive time'
>>> AND b2.name = 'global cache cr blocks received'
>>> AND b1.inst_id = b2.inst_id;
>>>
>>> They may point in the right direction.
>>
>> Just a quick question (I'm a bit wary of submitting an SR again,
>> especially where Oracle has been so unhelpful). It quite often happens
>> to me that *any* query (even the most simple ones) from any gv$ view just
>> *hangs* forever. Tracing the session I noticed a bunch of DFS lock
>> handle events and a bunch of PX Deq Reap Credit events.
>> Metaclunck did come up with a bunch of blabla, but no solutions or
>> even workarounds.
>> Am I correct in stating Oracle will stop working on 9iR2 SRs after
>> July 31?
>
>I can't answer that one but my experience with RAC from 9.2.0.4 to
>the present would make me want to upgrade every cluster I could lay
>my hands on to 10.2.0.1 or higher. The increase in the stability of
>the technology is substantial, especially the installation with the
>cluster verify tool.

I thank you for the replies, gentlemen.

Daniel, I'll have to run the queries you posted only for the problematic session. Running them for the whole system won't give me much information, I'm afraid.
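
As a rough sketch of what a session-level version might look like, the same ratio can be taken from gv$sesstat joined to gv$statname, with gv$session_event giving the totals for the wait event itself. The bind variables :inst_id and :sid are placeholders I am assuming for the slow session, and the statistic names are the 10g ones ('gc ...' rather than 'global cache ...'). The counters are cumulative since logon and vanish when the session disconnects, so they would have to be sampled before and after the slow job while the session is still connected.

```sql
-- Sketch only: :inst_id and :sid are hypothetical bind variables
-- identifying the problematic session.

-- Average CR block receive time for one session.
-- Receive time is in centiseconds, hence the *10 to get ms.
SELECT ss.inst_id,
       ss.sid,
       MAX(DECODE(sn.name, 'gc cr blocks received', ss.value))    AS cr_blocks_received,
       MAX(DECODE(sn.name, 'gc cr block receive time', ss.value)) AS cr_receive_time_cs,
       ROUND(MAX(DECODE(sn.name, 'gc cr block receive time', ss.value)) * 10
             / NULLIF(MAX(DECODE(sn.name, 'gc cr blocks received', ss.value)), 0),
             1)                                                   AS avg_cr_receive_ms
FROM   gv$sesstat ss, gv$statname sn
WHERE  ss.inst_id    = sn.inst_id
AND    ss.statistic# = sn.statistic#
AND    sn.name IN ('gc cr blocks received', 'gc cr block receive time')
AND    ss.inst_id    = :inst_id
AND    ss.sid        = :sid
GROUP  BY ss.inst_id, ss.sid;

-- Cumulative waits on the event in question for the same session.
-- TIME_WAITED and MAX_WAIT are in centiseconds.
SELECT inst_id, sid, event, total_waits, time_waited, max_wait
FROM   gv$session_event
WHERE  event   = 'gc cr multi block request'
AND    inst_id = :inst_id
AND    sid     = :sid;
```

The NULLIF guard returns NULL instead of raising a divide-by-zero error when the session has not yet received any CR blocks.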

Sybrand, we're also using UDP over 1 Gb ethernet. Also, I know you are on 9i, but could your problem be related to bug #3951017? The good news for your situation is that Oracle has waived the first year of the extended support fee, so you won't be hit by that 10% penalty when 9i moves to extended support on August 1. And, doesn't Oracle still provide the same level of support with extended as they do with premier, just at an increased price? If that's the case, they should still work on SRs. But, don't quote me.

Rich

Received on Wed Jul 18 2007 - 08:51:03 CDT