Re: Seeking understanding of my "gc cr multi block request" waits

From: hpuxrac <johnbhurley_at_sbcglobal.net>
Date: Wed, 18 Jul 2007 12:39:20 -0700
Message-ID: <1184787560.320555.4520@g12g2000prg.googlegroups.com>

On Jul 17, 12:05 pm, Richard Piasecki <usen..._at_ogoent.com> wrote:
> Hello everyone.
>
> I am the DBA for a three-node RAC database that is suffering from an intermittent performance problem. The Oracle release is 10.2.0.1 on Red Hat Linux.
> Periodically, a particular job that normally runs in less than a second takes several seconds to complete, sometimes requiring as much as 20 seconds.
> An examination of raw trace files created during the job execution has revealed that the wait event causing the slowdown is "gc cr multi block
> request". This particular wait event is not well-documented by Oracle, and I have found little information on it on the internet. I am posting this
> message to try to get confirmation of my understanding of the event from other DBAs who may have experienced it.
>
> I can think of several possible causes of the wait event, and I want to get everyone's opinion on the subject. Mr. Gopalakrishnan's wonderful
> book on RAC does not mention "gc cr multi block request". It does, however, mention the event "gc cr request" as a "Place Holder" event.
>
> 1. If "gc cr multi block request" is also a place holder event (which would seem logical, implying multi-block IO), I would think I should never see
> it as a major event. I would think it should be substituted by one of the "gc*2-way" or "gc*3-way" events, as the Gopalakrishnan book implies. So, I'm
> confused as to why I am getting so many "gc cr multi block request" waits without any "2-way" or "3-way" waits.
>
> 2. On the other hand, if the "gc cr multi block request" event is not a place holder event, does it indicate the wait time experienced by the instance
> while trying to get a lock request from the master of the resource? A few of these waits are over a second, according to the raw trace file. That
> seems like an awfully long time to just get a lock request from the master. Any idea what could cause that?
>
> 3. Could the "gc cr multi block request" event be the total time spent obtaining the block from another instance? This would make sense given the
> length of some of these events and the total lack of any "2-way" or "3-way" events. But that doesn't jibe with the information in the Gopalakrishnan
> book or in any other resource I have read.
>
> 4. There is an Oracle bug #3951017, but it is supposedly fixed in 10gR2, and it causes a widespread slowdown, not the intermittent slowdown specific
> to a single session that I am experiencing. So, I doubt I have this problem.
>
> Can anyone confirm if this wait event is one of the four possibilities above or one I haven't mentioned?
>
> All these possibilities share the same set of possible solutions (except for #4), according to my understanding. Please correct me if my list of
> solutions is erroneous or incomplete.
>
> 1. Use application/data partitioning techniques to try to remove the inter-instance contention for the blocks in question.
> 2. Use jumbo frames on the interconnect (the current interconnect is configured with an MTU of 1500)
> 3. I think disk-IO is sometimes involved with these cache fusion operations to flush redo log buffers, so improving disk speed may help as well. We
> are currently on RAID-5 but plan to implement a series of RAID-1 arrays under ASM control in the near future.
> 4. Tweaking the number of LMS processes on the holding instance, but no CPU spikes have been noticed during these slowdowns, so I question having to
> do this.
>
> Does anyone have any thoughts, comments, suggestions, explanations or solutions that may help me to decipher the reasons for these waits and the means
> by which to eliminate them? Any help would be gratefully accepted. Thanks.
>
> --- Rich

Have you submitted a service request to Oracle? As you work it through, you should be able to get some relevant help (eventually).

RAC and grid have been getting way too much marketing hype from Oracle. It is very similar to the flameout that we saw earlier with Oracle Parallel Server marketing.

The last 2 years at Oracle Open World there have been a lot more people talking openly about having to partition their RAC databases so that workloads are partitioned by application to specific nodes. Have you read the "you probably don't need rac" stuff by Moans Nogood?

Have you put your trace files through a resource profiler such as orasrp? Sometimes looking at just the lines in them may not give the correct overall response time profile needed for following Cary Millsap's Method R approach.
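
If you just want a rough first pass before running a real profiler, something like the following Python sketch can total up the wait lines in a raw trace. It assumes the 10.2-style line format (WAIT #n: nam='...' ela= <microseconds> ...), and the function name profile_waits is only made up for illustration. It counts WAIT lines only and ignores the CPU and elapsed figures on the PARSE/EXEC/FETCH lines, so it is not a full response time profile.

```python
import re
from collections import defaultdict

# Assumed 10.2-style trace line, e.g.:
#   WAIT #3: nam='gc cr multi block request' ela= 1234 file#=5 block#=99 ...
# ela is taken to be in microseconds.
WAIT_RE = re.compile(r"^WAIT #\d+: nam='(?P<name>[^']+)' ela=\s*(?P<ela>\d+)")


def profile_waits(lines):
    """Return (event, count, total_us, max_us) tuples, largest total first."""
    totals = defaultdict(lambda: [0, 0, 0])  # count, total_us, max_us
    for line in lines:
        match = WAIT_RE.match(line)
        if match is None:
            continue  # not a WAIT line (PARSE, EXEC, FETCH, and so on)
        ela = int(match.group("ela"))
        entry = totals[match.group("name")]
        entry[0] += 1
        entry[1] += ela
        entry[2] = max(entry[2], ela)
    rows = [(name, c, total, worst) for name, (c, total, worst) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

You would call it with an open trace file, for example profile_waits(open(path)) where path is whatever your trace file is called, and look at whether a handful of very long "gc cr multi block request" waits account for most of the time or whether it is many short ones.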

You might want to think seriously about attending one of the new Hotsos courses, which are supposed to have a serious focus on RAC-related events (as I understand it), or negotiate to bring in one of the Hotsos guys for a day or two.

Overall, the possibilities you list "appear to be" fairly reasonable to me.

Received on Wed Jul 18 2007 - 14:39:20 CDT