Re: Seeking understanding of my "gc cr multi block request" waits

From: DA Morgan <damorgan_at_psoug.org>
Date: Thu, 19 Jul 2007 09:36:32 -0700
Message-ID: <1184862991.764587@bubbleator.drizzle.com>

sybrandb wrote:

> On Jul 18, 3:16 am, DA Morgan <damor..._at_psoug.org> wrote:
>> sybra..._at_hccnet.nl wrote:
>>> On Tue, 17 Jul 2007 12:13:44 -0700, DA Morgan <damor..._at_psoug.org>
>>> wrote:
>>>> Richard Piasecki wrote:

>>>>> Hello everyone.
>>>>> I am the DBA for a three-node RAC database that is suffering from an intermittent performance problem. The Oracle release is 10.2.0.1 on Red Hat Linux.
>>>>> Periodically, a particular job that normally runs in less than a second takes several seconds to complete, sometimes requiring as much as 20 seconds.
>>>>> An examination of raw trace files created during the job execution has revealed that the wait event causing the slowdown is "gc cr multi block
>>>>> request". This particular wait event is not well-documented by Oracle, and I have found little information on it on the internet. I am posting this
>>>>> message to try to get confirmation of my understanding of the event from other DBAs who may have experienced it.
>>>>> I can think of several possible causes of the wait event, and I want to get everyone's opinion on the subject. Mr. Gopalakrishnan's wonderful
>>>>> book on RAC does not mention "gc cr multi block request". It does, however, mention the event "gc cr request" as a "Place Holder" event.
>>>>> 1. If "gc cr multi block request" is also a place holder event (which would seem logical, implying multi-block IO), I would think I should never see
>>>>> it as a major event. I would think it should be substituted by one of the "gc*2-way" or "gc*3-way" events, as the Gopalakrishnan book implies. So, I'm
>>>>> confused as to why I am getting so many "gc cr multi block request" waits without any "2-way" or "3-way" waits.
>>>>> 2. On the other hand, if the "gc cr multi block request" event is not a place holder event, does it indicate the wait time experienced by the instance
>>>>> while trying to get a lock request from the master of the resource? A few of these waits are over a second, according to the raw trace file. That
>>>>> seems like an awfully long time to just get a lock request from the master. Any idea what could cause that?
>>>>> 3. Could the "gc cr multi block request" event be the total time spent obtaining the block from another instance? This would make sense given the
>>>>> length of some of these events and the total lack of any "2-way" or "3-way" events. But, that doesn't jibe with the information in the Gopalakrishnan
>>>>> book or in any other resource I have read.
>>>>> 4. There is an Oracle bug #3951017, but it is supposedly fixed in 10gR2, and it causes a widespread slowdown, not the intermittent,
>>>>> single-session slowdown that I am experiencing. So, I doubt I have this problem.
>>>>> Can anyone confirm if this wait event is one of the four possibilities above or one I haven't mentioned?
>>>>> All these possibilities share the same set of possible solutions (except for #4), according to my understanding. Please correct me if my list of
>>>>> solutions is erroneous or incomplete.
>>>>> 1. Use application/data partitioning techniques to try to remove the inter-instance contention for the blocks in question.
>>>>> 2. Use jumbo frames on the interconnect (the current interconnect is configured with an MTU of 1500)
>>>>> 3. I think disk-IO is sometimes involved with these cache fusion operations to flush redo log buffers, so improving disk speed may help as well. We
>>>>> are currently on RAID-5 but plan to implement a series of RAID-1 arrays under ASM control in the near future.
>>>>> 4. Tweaking the number of LMS processes on the holding instance, but no CPU spikes have been noticed during these slowdowns, so I question having to
>>>>> do this.
>>>>> Does anyone have any thoughts, comments, suggestions, explanations or solutions that may help me to decipher the reasons for these waits and the means
>>>>> by which to eliminate them? Any help would be gratefully accepted. Thanks.
>>>>> --- Rich

>>>> Run these:
>>>> -- Current block transfer statistics
>>>> col "AVG RECEIVE TIME (ms)" format 9999999.9
>>>> col inst_id format 9999
>>>> prompt GCS CURRENT BLOCKS
>>>> SELECT b1.inst_id, b2.value RECEIVED, b1.value "RECEIVE TIME",
>>>> ((b1.value/b2.value)*10) "AVG RECEIVE TIME (ms)"
>>>> FROM gv$sysstat b1, gv$sysstat b2
>>>> WHERE b1.name = 'global cache current block receive time'
>>>> AND b2.name = 'global cache current blocks received'
>>>> AND b1.inst_id = b2.inst_id;
>>>> -- block contention measured by using block transfer time
>>>> col "AVG RECEIVE TIME (ms)" format 9999999.9
>>>> col inst_id format 9999
>>>> SELECT b1.inst_id, b2.value RECEIVED, b1.value "RECEIVE TIME",
>>>> ((b1.value/b2.value)*10) "AVG RECEIVE TIME (ms)"
>>>> FROM gv$sysstat b1, gv$sysstat b2
>>>> WHERE b1.name = 'global cache cr block receive time'
>>>> AND b2.name = 'global cache cr blocks received'
>>>> AND b1.inst_id = b2.inst_id;
>>>> They may point in the right direction.
>>> Just a quick question (I'm a bit weary of submitting an SR again,
>>> especially where Oracle has been so unhelpful). It quite often happens
>>> to me that *any* query (even the most simple ones) from any gv$ view just
>>> *hangs* forever. Tracing the session I noticed a bunch of DFS lock
>>> handle events and a bunch of PX Deq Reap Credit events.
>>> Metaclunck did come up with a bunch of blabla, but no solutions or
>>> even workarounds.
>>> Am I correct in stating Oracle will stop working on 9iR2 SRs after
>>> July 31?
>> I can't answer that one but my experience with RAC from 9.2.0.4 to
>> the present would make me want to upgrade every cluster I could lay
>> my hands on to 10.2.0.1 or higher. The increase in stability of
>> the technology is substantial, especially the installation with the
>> cluster verify tool.
>> --
>> Daniel A. Morgan
>> University of Washington
>> damor..._at_x.washington.edu (replace x with u to respond)
>> Puget Sound Oracle Users Group
>> www.psoug.org
> 
> Not possible. The app isn't certified against 10g, and uses a mix of
> RBO and CBO.
> 
> --
> Sybrand Bakker
> Senior Oracle DBA

Would they let you go to 10g if you set optimizer_level to 1?

-- 
Daniel A. Morgan
University of Washington
damorgan_at_x.washington.edu (replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org

Received on Thu Jul 19 2007 - 11:36:32 CDT