Re: Memory operations on Sun/Oracle M class servers vs T class servers

From: Tanel Poder <tanel_at_tanelpoder.com>
Date: Tue, 16 Dec 2014 22:49:22 -0400
Message-ID: <CAMHX9JLrQ9kfYrxB+O8bQgDRYfgWLdOU=oNghz2m0SY5yDVLmg_at_mail.gmail.com>



The CPU die has limited "real estate" for stuff.

Each CPU core takes space (and cache too). The more complex and sophisticated the CPU core microarchitecture is, the more space it takes.

The design tradeoff between the classic CPU architecture and T-series CMT architecture was to reduce the CPU core complexity so you'd be able to put more cores on the chip.

Less sophisticated microarchitecture means there's less (or no) prefetching, pipelining, out-of-order execution, branch prediction and instruction-level parallelism going on inside the CPU, so you just spend more CPU cycles per instruction (stalling on memory access and other stuff) when executing code. That's why the single-threaded performance sucked. This was somewhat compensated for by building many sets of registers (virtual CPU threads) into the same core, so when one thread was waiting (stalled), another thread's instructions could be scheduled on the same core's execution units. With lots of threads you can get decent performance out of the modern T-series CPUs.

Note that vmstat and other OS-kernel-level tools are useless for CPU capacity planning on these platforms, because how much work a single thread gets done depends on how busy the core itself is. Corestat is the utility for measuring the actual core "busyness" on Solaris.
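To make the stall-hiding argument concrete, here is a toy cycle-level model in Python. It is an illustration under simplifying assumptions, not a description of any real T-series pipeline: one in-order core issues at most one instruction per cycle, picking round-robin among hardware threads that aren't stalled, and each thread issues `compute_cycles` instructions before waiting `stall_cycles` for memory. The function name and parameter values are made up for the example.

```python
def simulate_core(n_threads, compute_cycles, stall_cycles, total_cycles):
    """Toy model of one in-order core with n_threads hardware threads.

    Each cycle the core issues at most one instruction, picking round-robin
    among threads that are not stalled. Every thread issues `compute_cycles`
    instructions, then waits `stall_cycles` (e.g. a cache miss to memory),
    then repeats. Returns the number of instructions issued per thread.
    """
    remaining = [compute_cycles] * n_threads  # instructions left before next stall
    ready_at = [0] * n_threads                # first cycle the thread may issue again
    issued = [0] * n_threads
    next_thread = 0                           # round-robin starting point
    for cycle in range(total_cycles):
        for i in range(n_threads):
            t = (next_thread + i) % n_threads
            if ready_at[t] <= cycle:
                issued[t] += 1
                remaining[t] -= 1
                if remaining[t] == 0:
                    # Thread stalls now and can issue again after stall_cycles.
                    ready_at[t] = cycle + 1 + stall_cycles
                    remaining[t] = compute_cycles
                next_thread = (t + 1) % n_threads
                break
        # If no thread was ready, the core simply idles this cycle.
    return issued


if __name__ == "__main__":
    TOTAL = 4000
    for n in (1, 4, 8):
        issued = simulate_core(n, compute_cycles=10, stall_cycles=30, total_cycles=TOTAL)
        core_util = sum(issued) / TOTAL     # fraction of cycles the core issued work
        per_thread_ipc = issued[0] / TOTAL  # instructions per cycle for one thread
        print(f"threads={n}: core utilization={core_util:.2f}, "
              f"per-thread IPC={per_thread_ipc:.3f}")
```

With these arbitrary numbers, core utilization rises as threads are added while each individual thread completes fewer instructions. That per-thread slowdown is why a thread that looks fully "busy" to the OS can still be getting relatively little done.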

Of course, the issue may be somewhere else entirely (à la parameter and environment differences, bugs, etc., or the number of DIMM slots you have actually filled in the server; more is better :)

The Solaris tools cputrack and cpustat (available on SPARC since Solaris 8!) or DTrace's CPC counters (Solaris 11) let you drill down into the wonderful world of CPU performance counters, which help break down where your CPU cycles get used or wasted. That's especially relevant in these in-memory days.
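Once you have raw counts, the basic arithmetic is simple. Below is a small Python sketch that turns one interval's cycle, instruction and stall counts into CPI, IPC and a stalled-cycle fraction. The counter values are invented, and the "stall cycles" counter is a hypothetical stand-in: actual event names differ between SPARC generations (cpustat -h prints the list for your processor), so you would map them yourself.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Counter deltas for one measurement interval (values are illustrative)."""
    cycles: int        # CPU cycles elapsed in the interval
    instructions: int  # instructions completed in the interval
    stall_cycles: int  # hypothetical event: cycles spent stalled


def summarize(sample: CounterSample) -> dict:
    """Derive CPI, IPC and the stalled fraction of cycles from raw counts."""
    if sample.cycles <= 0 or sample.instructions <= 0:
        raise ValueError("need positive cycle and instruction counts")
    return {
        "cpi": sample.cycles / sample.instructions,             # cycles per instruction
        "ipc": sample.instructions / sample.cycles,             # instructions per cycle
        "stall_fraction": sample.stall_cycles / sample.cycles,  # share of cycles spent waiting
    }


if __name__ == "__main__":
    # Made-up numbers standing in for values read off cpustat/cputrack output.
    s = CounterSample(cycles=1_000_000, instructions=250_000, stall_cycles=600_000)
    print(summarize(s))  # → {'cpi': 4.0, 'ipc': 0.25, 'stall_fraction': 0.6}
```

A high CPI combined with a large stalled fraction would point at memory waits rather than raw compute, which is the kind of breakdown these counters make possible.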

Tanel.

On Tue, Dec 16, 2014 at 6:47 PM, Mark Burgess <mark_at_burgess-consulting.com.au> wrote:
>
> Finn,
>
> I have been seeking answers to the same types of questions for a customer
> site for the past 3 years on T4 and T3 hardware. Others will be able to
> offer a far more scientific explanation as to why, but in a nutshell the T
> series platform seems to be good at doing lots of things concurrently as
> opposed to doing one thing particularly fast. The classic response time vs.
> throughput discussion. The types of problems you describe below are exactly
> the types of issues I have encountered: single-threaded processes,
> i.e. SQL/PLSQL, take longer to run than you would expect.
> Unfortunately I have not been in a position to perform a like-for-like
> comparison against another platform to put some science behind
> the analysis. I have used parallel query selectively to resolve
> single-threaded performance issues; however, I do not see this as a viable
> approach to working around all the performance constraints on this platform.
>
> I have been looking to set up SLOB to compare T4, X4-2L and Exa X4-2
> timings; however, this is still on the to-do list. I believe this is the
> only way to get a comparative measure of the T series against other
> platforms.
>
> Regards,
>
> Mark

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Dec 17 2014 - 03:49:22 CET
