multiple applications in RAC
Date: Tue, 2 Jun 2009 12:05:55 -0400
From: Mark W. Farnham [mailto:mwf_at_rsiz.com]
Sent: Tuesday, June 02, 2009 10:29 AM
Subject: FW: RAC application mix
snipped to fit
From: Mark W. Farnham [mailto:mwf_at_rsiz.com]
Sent: Tuesday, June 02, 2009 10:07 AM
Without reading in on the business realities of your situation, advising which of many available paths to pursue is unlikely to give you your best option.
Probably the first item on the list should be to determine whether internodal traffic is the actual performance problem. It is entirely possible, from what you have related to us, that the sheer load from the additional applications is simply too much for the servers and you are waiting on CPU or I/O bandwidth. Since you have stressed the internodal aspects, you probably have some reason to suspect the "RAC tax" is your problem. To decide what to fix, though, you need to know the relative amount of delay caused by internodal traffic compared to the total time to execute. Another way to state that is: "If you reduced internodal costs to zero, would the resulting performance be adequate?" The answer gives you a good idea whether you can get a good enough result just by reducing internodal delays as much as is reasonably possible. Another important question is: "If you reduced the internodal costs to zero, would the application users even notice?" If the answer to the second question is no, you probably shouldn't work on the internodal costs at all, while if the answer to the first question is yes, you can make a good case that it is wasted effort to measure the other components of the transaction times until you've at least harvested the low-hanging fruit for reducing internodal costs.
(Of course, remember that the first question is a bit of a trick question in some cases. If internodal delays are your worst bottleneck to completing a transaction, and the next bottleneck currently contributes little time to transactions but has only a little headroom left, operating near its current service limit, you may find you have a new bottleneck to address after you fix the first one.)
That may be frustrating, but it is still always best to address the worst bottleneck first, because that delivers steady incremental improvement in performance, while improving the service time of something that is not your worst bottleneck can actually degrade overall performance by exacerbating the behavior of the worst bottleneck.
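One quick way to get a first-order answer to that question is to compare global cache ("gc%") wait time against total non-idle wait time, per instance. A minimal sketch against the standard GV$SYSTEM_EVENT view (the counters are cumulative since instance startup, so an AWR or Statspack interval comparison is better; treat this query as illustrative):

```sql
-- Rough per-instance view of how much waiting is global cache
-- (interconnect) traffic versus all non-idle waiting.
SELECT inst_id,
       ROUND(SUM(CASE WHEN event LIKE 'gc%'
                      THEN time_waited_micro END) / 1e6) AS gc_wait_sec,
       ROUND(SUM(time_waited_micro) / 1e6)               AS total_wait_sec
FROM   gv$system_event
WHERE  wait_class <> 'Idle'
GROUP  BY inst_id;
```

If gc_wait_sec is a small fraction of total_wait_sec, reducing internodal costs to zero cannot buy you much.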
Now, I'll proceed as if you've answered the first question as "yes," and that no other bottlenecks are even close seconds.
Understanding the stated purpose of using RAC at the site should be an early item on the list. If the purpose is mere marketing clutter guiding the purchase, your best solution might be to decouple the RAC. If the purpose is a reasoned policy, then that is probably not an option. Let's assume the latter and proceed.
Knowing the relative loads of the roughly five applications (one original and about four more), both at peak and on average, is helpful, as is knowing the relative peak/valley load schedules of the various applications. A key question is whether, at every point in your processing schedule, the load can be divided such that some arrangement of the applications on the nodes keeps all nodes operating in the "sweet spot" (that is, below the knee in the load/service-time hockey-stick graph). If, say, applications 1 and 2 on node A and applications 3, 4, and 5 on node B would always work, then you should be able to largely dismiss the internodal overhead by using services to direct each application to a particular node at all times except in failure modes. (And in a two-node RAC you don't have internodal traffic on the interconnect in failure mode. At higher node counts, planning which applications should fail over where might be interesting.)
Often establishing "application affinity" with service names, or whatever mechanism works for a particular vendor-supplied application, is sufficient. That is similar to the suggestion of your "logging" vendor.
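As a concrete sketch of service-based affinity (the database and service names here are made up, and srvctl option spellings vary by version, so check yours), you might define one service per application with a single preferred instance and the other instance available only for failover:

```
# APP1 normally runs only on instance MYDB1; MYDB2 is failover-only.
srvctl add service -d MYDB -s APP1_SVC -r MYDB1 -a MYDB2
srvctl start service -d MYDB -s APP1_SVC
```

Clients for that application then connect with SERVICE_NAME=APP1_SVC rather than the database name, so their sessions land on one instance except during failure modes.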
Now if you have one or more applications that truly need the horsepower of more than one node from time to time (or all the time), then we've come to your original question. (Well, actually, if that one particular application "fits" on one node, just route it to one node. In fact, if you don't have time to answer all the questions leading up to this point, that is a pretty easy and very reversible test, and it might be the cheapest way to evaluate whether putting the one particularly ill-behaved application on one node is a solution.)
But your original question I take to be: How do we maximize the throughput of several thousand users updating a single table from multiple nodes? (I'm not at all sure why you then ask about INITRANS.)
One big divide at this point is whether you are using ASSM or freelists. If you are using freelists, you need to make sure you have at least as many freelist groups as you plan to run nodes. ASSM handles this dynamically, directing fresh inserts either to a block already cached on the current instance or to a block not yet in the buffer cache of any instance. Freelist groups do the same thing more or less statically, so if you are really sensitive to unreachable free space, that might be a problem for you if fresh inserts come mostly from one node, since the space allocated to the "other" freelist group probably won't get used unless you're running out of space overall.
So to recap: if you're using ASSM, there is nothing to tune; if you're using freelists, make sure you have enough freelist groups.
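For the freelists case, FREELIST GROUPS is set in the STORAGE clause at table creation time. A sketch (the table name and numbers are illustrative, and this assumes the segment lives in a manual-segment-space-management tablespace, since ASSM ignores these settings):

```sql
-- One freelist group per planned RAC node, so each instance draws
-- insert-target blocks from its own group rather than contending
-- across the interconnect.
CREATE TABLE app_log (
  log_id  NUMBER,
  msg     VARCHAR2(4000)
)
STORAGE (FREELISTS 4 FREELIST GROUPS 2);
```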
Now that will get you to a pretty good place relative to the possibilities regarding fresh inserts. But you asked about updates. If it is very likely that the update is to a row inserted in the current session, then all the rows in that block are probably for sessions in the current instance (i.e., the instance running on your current node), so you still probably do not have a problem.
But if you get node A for a session, insert a fresh logging row, update that row, and so on, then log out, and when you come back you get node B, then you might have a problem, because other users might have inserted rows in the same block as you, and they are either still on A or got A again when they logged back in. The only general solution I am aware of to that particular problem is to minimize rows per block in the logging table to 1. I call that hermitization. In brief: rename the logging table; make a matching table containing one row; minimize rows per block; copy in the rest of the rows; build the indexes.
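Those steps map to something like the following sketch (table, column, and index names are made up; MINIMIZE RECORDS_PER_BLOCK scans the table and caps future blocks at its current maximum rows per block, which is 1 here, at the cost of considerable wasted space):

```sql
ALTER TABLE app_log RENAME TO app_log_old;

-- Seed the new table with a single row so the max rows/block is 1.
CREATE TABLE app_log AS
  SELECT * FROM app_log_old WHERE ROWNUM = 1;

-- Lock in one row per block for all future inserts.
ALTER TABLE app_log MINIMIZE RECORDS_PER_BLOCK;

-- Copy in the remaining rows, then rebuild the indexes.
INSERT INTO app_log
  SELECT * FROM app_log_old
  WHERE  log_id NOT IN (SELECT log_id FROM app_log);
COMMIT;

CREATE INDEX app_log_ix1 ON app_log (log_id);
```

With one row per block, two sessions on different instances can never contend for the same logging block.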
mwf
Received on Tue Jun 02 2009 - 11:05:55 CDT