Re: RAC Cluster Waits

From: Mladen Gogala <mgogala.SPAM_ME.NOT_at_verizon.net>
Date: Thu, 07 Jun 2007 03:59:42 GMT
Message-ID: <pan.2007.06.07.03.59.42@verizon.net>

On Tue, 05 Jun 2007 16:14:04 -0700, silverback wrote:

> Hi,
>
> I have a 10.2 RAC with 3 nodes. It is an OLTP system.
>
> We are seeing what may be "excessive" cluster waits.
>
> I need to know how to judge what is ACTUALLY excessive, and how to tune
> the system. Please direct me to a detailed or complete refrence
> describing the parts of "Cluster" and how to tune them. I see many lists
> of wait events on the web, but no description nor how to tune.

One doesn't tune the system. Application is what needs tuning. You can tune an application for response time or for resource consumption, but you cannot tune the system. There is an old joke which says that a database without users or data has never had any problems. Problems come from users, data or the combination.

>
> Why do I think it may be excessive?
>
> This was a CPU intensive system in the 9i, non-RAC environment, I
> expected the CPU to be a sizable percentage of waits/activity.
>
> Our system runs fine except during our bi-weekly crunch. The waits are
> heavily CPU biased, but the total waits are low. Cluster is an almost
> invisible line on the GRID graph. Our ASH reports look fine.

Usually, waits are not CPU biased. Waits are waits. At a risk of being scolded for telling corny jokes, all computers wait at the same speed.

http://www.cafepress.com/lovegeek.46858304

>
> During the crunch time, the users report long delays for report requests
> and row updates.
> Ash reports the Top DB Objects as follows: Index - "gc current block
> 2-way" activity 16.3% event 16% Index - "db file sequential read"
> activity 5.22% event 5% Table - "db file sequential read" activity
> 2.58% event 2.5% Table - "db file sequential read" activity 1.3%
> event .77% Index - "db file sequential read" activity 1.2% event
> .98%

OK. There are some things that you need to know. First, what is a current block? In short, that is the only version of the block that one can modify.Any block the database can have gazillion of consistent read versions (CR) and previous image versions but it can have only one current version. It's like the Highlander: there can be only one. Each resource (and blocks are resources, among other things) has a particular instance as a master. If somebody wants to acquire the current version of a block, it must tell that to the master of the block. If another instance holds the current version of the block, it, too, needs to know about the attempt. Thus, "gc current - two ways". In other words, your waits pretty much signify two things:

Normal operation of your RAC system
Sadly inadequate application system written without having RAC in mind.

The second point needs an explanation. In RAC systems global locks are much more expensive then a single instance local locks. Global locks include calls to the network while the local locks are done, well, locally, using IPC. Global locks can take centiseconds to acquire, while the local locks take microseconds. Unfortunately, every transaction needs to lock the underlying row(s) and, in order to do that, it has to obtain the current version of the block(s) containing them. Now, if a table is updated from all nodes simultaneously, you will essentially be playing ping-pong with global locks. Each lock will take centiseconds to acquire. The consequences can be an unmitigated disaster. That happens when someone puts an application system centered around few tables that are very intensively updated/inserted into by many concurrent user on a RAC system, possibly with Linux, and expects it to "scale". That sort of thing gets to be particularly interesting if the underlying tables are created without sufficient number of free lists, in a non-ASSM tablespace. If blocks are densely packed with rows then all the users will constantly engage in acquiring and locking blocks. The only solution is known as "functional partitioning". In short, in a functionally partitioned system, related resources are accessed from the same node. If two users modify the same table, they connect to the same node. Usually, assigning one of the nodes as a batch node and moving all batch jobs that node is a good start. It's usually easier to do then it sounds. It usually takes some trickery with the tnsnames.ora and a careful study of who uses what.

I mentioned Linux because Linux usually runs on PC-like equipment which doesn't have sophisticated bus architectures adjusted for heavy SMP, like SUN, HP or IBM minicomputers. In addition to that, SCSI layer is an emulation on Linux, which means that an additional interrupt is needed for a SCSI-like calls (even when there is no actual SCSI involved, like with the DVD burners). Network adapters (private & public) are on the same bus as FC/AL disk adapters, memory and the CPUs. That, of course, introduces additional waits and increases the number of waits. Linux RAC running on Dell machines, in the other words, is not quite the same as a proper mini computer running HP-UX or AIX. When buying RAC, one should also consider expenses of a top-notch DBA and a top-notch system administrator. Frequently, people will discover that a large mini-computer with a good storage and proper support is cheaper then Linux RAC, with all the downtime needed in the much longer introductory phase and with the needed expensive professionals.

>
> When I monitor the system via GRID, I see that CPU is about 60% of the
> activity. Cluster is about 25%. We see "gc current block 2-way" a good
> bit.
>
> One of our Top SQL Statements is a select using 37% of activity. It has
> event percents = 20% "CPU and Wait for CPU" and 16% "gc current block
> 2-way".
>
> Thanks in advance

When a process waits, it's not using CPU. Conversely, when a process is using CPU, it's not waiting. It's running. Your db sequential reads signify intense use of indexes. Indexes can be, and frequently are, badly misused, especially for reporting purposes. Nested loop and index merge (AND EQUAL) can be rather gruesome. Also, using optimizer_index parameters can sometimes result in an "index full table scan". That is NOT an official access method, that is a situation in which, because single block reads are so much cheaper then the multi-block reads, all rows in a table are read through an index, key by key. The only way is to look into the statement and see how can you improve upon it.

-- 
http://www.mladen-gogala.com

Received on Wed Jun 06 2007 - 22:59:42 CDT