Re: monitor cluster interconnect performance
Date: Sat, 9 Aug 2008 09:31:34 -0700 (PDT)
Message-ID: <3381ea0f-3037-4a93-b033-5b663b85b37c@e53g2000hsa.googlegroups.com>
On Aug 8, 3:11 pm, sybra..._at_hccnet.nl wrote:
> On Fri, 8 Aug 2008 11:22:52 -0700 (PDT), mike <mike.shan..._at_gmail.com>
> wrote:
>
> >Hi,
> >We have a 10g service that is load balanced between 2 RAC nodes. We
> >noticed that query runs about 3x slower if the service is load
> >balanced on both nodes vs running the service on a single node. We did
> >verify that all traffic runs on the private network. On OEM, the GC
> >block access latency was within 12 ms and the GC block transfer rate
> >was within 30ms.
>
> >Are these numbers considered high? What other queries or utilities can
> >I use to monitor the performance of the interconnect?
>
> >Thanks.
>
> Figures look pretty high.
> Is your network configured correctly? (1G network, MTU set to 9k)
> Did you run racdiag.sql (available on Metalink)?
> Did you look at explain plans? Is there any impact of Cache Fusion?
> Do you have access to the Oracle Wait Interface Book? If not, buy it.
> It contains a section on RAC.
> Likely Cache Fusion is killing you.
> If your queries are unscalable, Cache Fusion will only make things
> worse, especially when your Interconnects have not been setup
> correctly.
> In 9i RAC 1 multiblock request resulted in n global cache cr requests
> across the interlink. This has been fixed in 10gR2, but as you only
> post a marketing label instead of a version, I can't see whether you
> are still suffering from this 'feature'.
>
> Hth
>
> --
> Sybrand Bakker
> Senior Oracle DBA
As Sybrand and Daniel have stated, the numbers appear high, but with such limited information it is difficult to make definite judgements.
I suggest you first verify that you have set up the RAC traffic to use the private interface and not to be sent out over the front-end network.
Then, if you have a Performance Pack license, you can look in the AWR for your gc-related statistics. If you do not, just run statspack over a problem time period and take a look at your wait events. If you see gc waits (RAC), you can research the reported values to help determine whether the problem is RAC related. Then you have to figure out whether it is a hardware performance issue or an application design/load issue.
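As a rough illustration of how the reported gc values can be turned into a latency figure, here is a minimal Python sketch. It assumes you have already pulled the 'gc cr block receive time' and 'gc cr blocks received' statistics at the start and end of the problem period, and that the time statistic is reported in centiseconds (hence the factor of 10 to get milliseconds). The function names and the sample numbers are made up for the example; they are not taken from the original poster's system.

```python
def avg_gc_receive_ms(receive_time_cs, blocks_received):
    """Average time to receive one global cache block, in milliseconds.

    receive_time_cs is assumed to be in centiseconds, so multiply by 10.
    Returns None when no blocks were received (nothing to average).
    """
    if blocks_received <= 0:
        return None
    return receive_time_cs * 10.0 / blocks_received


def interval_avg_ms(begin, end):
    """Average receive time over a snapshot interval.

    begin and end are (receive_time_cs, blocks_received) pairs of
    cumulative statistic values taken at the two snapshots.
    """
    delta_time = end[0] - begin[0]
    delta_blocks = end[1] - begin[1]
    return avg_gc_receive_ms(delta_time, delta_blocks)


if __name__ == "__main__":
    # Hypothetical cumulative values at two snapshots.
    begin_snapshot = (500, 1000)
    end_snapshot = (1700, 2000)
    print(interval_avg_ms(begin_snapshot, end_snapshot))
```

Working from the difference between two snapshots, rather than from the cumulative totals since instance startup, keeps the average tied to the period in which the query was actually slow.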
Designing your application so that all (or at least most) updates to specific tables come from one instance can in some cases make a world of difference in performance. It is still true that one non-RAC instance on an N-CPU machine with the same total memory can probably outperform 2 or 4 machines with the same total number of CPUs and memory running RAC where the update activity is not being directed to particular instances. Separating DML activity for performance purposes may be as simple as running all batch jobs against a single instance, so that you do not have two batch jobs heavily updating the same table running concurrently on two different instances and causing a jump in gc activity. It all depends on your application.
HTH -- Mark D Powell -- Received on Sat Aug 09 2008 - 11:31:34 CDT