Re: Seeking understanding of my "gc cr multi block request" waits

From: <sybrandb_at_hccnet.nl>
Date: Tue, 17 Jul 2007 20:35:25 +0200
Message-ID: <8n2q9397cuc9q5rfg84hoqff4ovdrnuemt@4ax.com>

On 17 Jul 2007 11:05:06 -0500, Richard Piasecki <usenet2_at_ogoent.com> wrote:

>
>Hello everyone.
>
>I am the DBA for a three-node RAC database that is suffering from an intermittent performance problem. The Oracle release is 10.2.0.1 on Red Hat Linux.
>Periodically, a particular job that normally runs in less than a second takes several seconds to complete, sometimes requiring as much as 20 seconds.
>An examination of raw trace files created during the job execution has revealed that the wait event causing the slowdown is "gc cr multi block
>request". This particular wait event is not well-documented by Oracle, and I have found little information on it on the internet. I am posting this
>message to try to get confirmation of my understanding of the event from other DBAs who may have experienced it.
>
>I can think of several possible causes for the wait event, and I want to get everyone's opinion on the subject. Mr. Gopalakrishnan's wonderful
>book on RAC does not mention "gc cr multi block request". It does, however, mention the event "gc cr request" as a "Place Holder" event.
>
>1. If "gc cr multi block request" is also a place holder event (which would seem logical, implying multi-block IO), I would think I should never see
>it as a major event. I would think it should be substituted by one of the "gc*2-way" or "gc*3-way" events, as the Gopalakrishnan book implies. So, I'm
>confused as to why I am getting so many "gc cr multi block request" waits without any "2-way" or "3-way" waits.
>
>2. On the other hand, if the "gc cr multi block request" event is not a place holder event, does it indicate the wait time experienced by the instance
>while trying to get a lock request from the master of the resource? A few of these waits are over a second, according to the raw trace file. That
>seems like an awfully long time to just get a lock request from the master. Any idea what could cause that?
>
>3. Could the "gc cr multi block request" event be the total time spent obtaining the block from another instance? This would make sense given the
>length of some of these events and the total lack of any "2-way" or "3-way" events. But, that doesn't jibe with the information in the Gopalakrishnan
>book or in any other resource I have read.
>
>4. There is an Oracle bug #3951017, but it is supposedly fixed in 10gR2, and it causes a widespread slowdown, not the intermittent,
>single-session type of slowdown that I am experiencing. So, I doubt I have this problem.
>
>
>Can anyone confirm if this wait event is one of the four possibilities above or one I haven't mentioned?
>
>
>
>All these possibilities share the same set of possible solutions (except for #4), according to my understanding. Please correct me if my list of
>solutions is erroneous or incomplete.
>
>1. Use application/data partitioning techniques to try to remove the inter-instance contention for the blocks in question.
>2. Use jumbo frames on the interconnect (the current interconnect is configured with an MTU of 1500)
>3. I think disk I/O is sometimes involved in these cache fusion operations to flush redo log buffers, so improving disk speed may help as well. We
>are currently on RAID-5 but plan to implement a series of RAID-1 arrays under ASM control in the near future.
>4. Tweaking the number of LMS processes on the holding instance, but no CPU spikes have been noticed during these slowdowns, so I question having to
>do this.
>
>
>
>Does anyone have any thoughts, comments, suggestions, explanations or solutions that may help me to decipher the reasons for these waits and the means
>by which to eliminate them? Any help would be gratefully accepted. Thanks.
>
>
>
>--- Rich

I'm suffering from more or less the same problem on 9.2.0.8 on AIX. In my case the app does not scale at all and performs many unneeded full table scans (FTS). As I understand 'Cache Fusion', the buffer caches of all instances essentially operate as one cache.
As soon as one instance requests a block and doesn't find it in its own cache, it will request it from the other instances using IPC. What are you using as the IPC mechanism? In my case it is UDP over 1 Gb Ethernet.
Jumbo frames are off in my case, which I regret, as enabling them would increase the MTU to 9k.
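
To put a rough number on the MTU point, here is a minimal sketch of how many IP fragments a single UDP message carrying one database block would need at an MTU of 1500 versus 9000. The 8 KB block size is an assumption (neither post states it), the function name is made up for illustration, and any extra header bytes Oracle adds to the message are ignored. It only applies the standard IPv4 (20-byte header) and UDP (8-byte header) arithmetic.

```python
import math

IPV4_HEADER = 20  # bytes, IPv4 header without options
UDP_HEADER = 8    # bytes

def fragments_needed(payload_bytes: int, mtu: int) -> int:
    """Number of IPv4 fragments needed to carry one UDP datagram.

    payload_bytes is the application data in the datagram (here: an
    assumed 8 KB block, which is NOT taken from the thread).
    """
    ip_payload = UDP_HEADER + payload_bytes   # what IP has to carry
    room = mtu - IPV4_HEADER                  # IP payload bytes per frame
    if ip_payload <= room:
        return 1                              # fits in one frame, no fragmentation
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (room // 8) * 8
    return math.ceil(ip_payload / per_fragment)

if __name__ == "__main__":
    block = 8192  # assumed block size in bytes
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {fragments_needed(block, mtu)} fragment(s) per {block}-byte block")
```

Under those assumptions an 8 KB block is split into 6 fragments at MTU 1500 and travels in a single frame at MTU 9000. With fragmentation, losing any one fragment means the whole datagram has to be sent again, which is one plausible reason why jumbo frames are commonly suggested for the interconnect.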

I have an SR open for this one, but it has now been open for more than a month, and Oracle has done next to nothing.

-- 
Sybrand Bakker
Senior Oracle DBA

Received on Tue Jul 17 2007 - 13:35:25 CDT