Re: CPU Usage % on 2-Node RAC Run versus Single Node RAC Run .... Benchmarking

From: Greg Rahn <greg_at_structureddata.org>
Date: Tue, 25 Sep 2007 12:32:48 -0700
Message-ID: <a9c093440709251232u49a256fei179e94aafd28bba0@mail.gmail.com>

Theory:

The workload probably has nearly 100% of the data in cache and thus is CPU bound - little to no IO is taking place. The 2 node RAC config probably has 50% of the data in each cache. The "additional" CPU in the sum of both nodes is due to the remote buffer get calls (extra function calls are not free). Again, this is a symptom of an in-memory database and probably would not be the case in a real-world scenario. If physical IO were taking place, the numbers would be closer. Why? Physical IO is an order of magnitude slower than remote buffer gets and several orders of magnitude slower than local gets. The physical IO times would dominate the overall transaction time simply because of scale.

For demonstration, let's play with some numbers. Let's first declare some constants:

- local buffer get takes 1 microsecond  (0.000001 seconds)
- remote buffer get takes 1 millisecond (0.001 seconds)
- physical IO takes 10 milliseconds     (0.01 seconds)

Let's say our workload has to do 1,000,000 buffer gets.

If 100% are local buffer gets:
1,000,000 gets * 0.000001 = 1 second

If 50% are local buffer gets and 50% are remote buffer gets:
(500,000 * 0.000001) + (500,000 * 0.001) = 0.5 + 500 = 500.5 seconds

Let's also consider the case where a remote buffer get takes 0.0001 seconds:
(500,000 * 0.000001) + (500,000 * 0.0001) = 0.5 + 50 = 50.5 seconds

Depending on the remote buffer get times, this in-memory transaction could get 50-500x slower if 50% of its buffer gets are remote gets.
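The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the toy model, not a measurement: the per-call costs are the constants declared earlier, and the function name total_seconds and its parameters are made up for this example.

```python
# Per-call costs in seconds, taken from the constants declared above.
LOCAL_GET = 0.000001   # 1 microsecond
REMOTE_GET = 0.001     # 1 millisecond
PHYSICAL_IO = 0.01     # 10 milliseconds


def total_seconds(gets, local=0.0, remote=0.0, physical=0.0,
                  remote_cost=REMOTE_GET):
    """Total time for `gets` buffer gets, split by fraction across call types.

    Hypothetical helper: the fractions must sum to 1.
    """
    if abs(local + remote + physical - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    per_get = local * LOCAL_GET + remote * remote_cost + physical * PHYSICAL_IO
    return gets * per_get


all_local = total_seconds(1_000_000, local=1.0)
half_remote = total_seconds(1_000_000, local=0.5, remote=0.5)
half_remote_fast = total_seconds(1_000_000, local=0.5, remote=0.5,
                                 remote_cost=0.0001)

print(f"100% local:               {all_local:.2f} s")
print(f"50/50, 1 ms remote get:   {half_remote:.2f} s")
print(f"50/50, 0.1 ms remote get: {half_remote_fast:.2f} s")
```

Changing remote_cost is the quickest way to see how sensitive the total is to the interconnect latency you assume.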

Are remote buffer gets a bad thing? Let's see.

Let's introduce some physical IO now. Let's say 95% of the data is in local memory and 5% requires physical IO:
((0.95 * 1,000,000) * 0.000001) + ((0.05 * 1,000,000) * 0.01) = 0.95 + 500 = 500.95 seconds

If we compare the 95% local, 5% physical case with the 50/50 local/remote (1 millisecond) case, we see that they take approximately the same time (500.95 seconds vs. 500.5 seconds). With the given constants, if 100% of the data is spread across the RAC cluster's caches, it would be (slightly) faster to do the remote buffer gets than to have 5% physical IO with 95% local buffer gets.

Of course, there are an unlimited number of use cases here. One could also have a mix of local gets, remote gets and physical IO, and access times could vary slightly, but I hope that the numbers help paint the picture.

Bottom line: the slowest call will dominate the overall transaction time when there are one to several orders of magnitude differences between the call durations.
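To make the bottom line concrete, here is a small Python sketch that breaks the total time down by call type for the 95% local / 5% physical IO mix. It reuses the constants from this post; the names COST and time_breakdown are made up for the example, and the model is the same toy arithmetic, not a measurement.

```python
# Per-call costs in seconds, taken from the constants in this post.
COST = {"local": 0.000001, "remote": 0.001, "physical": 0.01}


def time_breakdown(gets, mix):
    """Return (total seconds, {call type: share of total time}).

    Hypothetical helper: `mix` maps a call type to its fraction of the gets.
    """
    seconds = {kind: gets * frac * COST[kind] for kind, frac in mix.items()}
    total = sum(seconds.values())
    shares = {kind: s / total for kind, s in seconds.items()}
    return total, shares


total, shares = time_breakdown(1_000_000, {"local": 0.95, "physical": 0.05})

for kind, share in shares.items():
    print(f"{kind:<8} {share:8.2%} of total time")
print(f"total    {total:.2f} s")
```

Even though only 5% of the gets go to disk in this mix, they account for nearly all of the elapsed time, which is the dominance effect described above.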

On 9/24/07, VIVEK_SHARMA <VIVEK_SHARMA_at_infosys.com> wrote:
> CASE 1 - When Executing a FULL Set of Transactions on Node 1, with the 2nd Node's RAC instance in SHUTDOWN Condition
>
> CPU Usage of Node 1 = 20 %
> CASE 2 - When Executing approx Half the above Number of Transactions on Node 1, & the Other Half on Node 2 (by setting LOAD_BALANCE = yes in tnsnames.ora)
>
> CPU Usage of Node 1 = 18 %
>
> CPU Usage of Node 2 = 19 %

-- 
Regards,

Greg Rahn
http://structureddata.org
--
http://www.freelists.org/webpage/oracle-l

Received on Tue Sep 25 2007 - 14:32:48 CDT