RE: Oracle Performance on Sunfire T2000

From: Tanel Poder
Date: Sat, 14 Feb 2009 16:56:47 -0700
Message-ID: <7679881BFD67476384202BE0FC0FB058_at_porgand>

Hi Martin,  

It's about how long latches are held. On a T2, latches are held longer because it takes more CPU cycles to complete the work that needs to be done under protection of a latch. This is so even if there is enough CPU capacity available in the system (no waiting for CPU needed).

Yes, the CMT processors scale better, in other words the RELATIVE performance drops less if you go from 1 to 128 parallel threads. But this is RELATIVE performance, not the real performance (the number of instructions executed or the number of business transactions completed).  

You may need a few hundred parallel threads to get X transactions per minute on a T2, but you may only need 16 parallel threads on an Opteron/Xeon. That's why the CMTs aren't advertised as performance monsters, but as giving you scalability (rather than raw performance) and a good power/cooling footprint...
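To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The per-thread rates and the target are made up purely for illustration (they are not measurements from any T2 or Opteron/Xeon box), and the function assumes perfectly linear scaling, which real systems won't give you.

```python
import math


def threads_needed(target_tpm, per_thread_tpm):
    """Threads required to reach target_tpm, assuming perfectly linear scaling."""
    return math.ceil(target_tpm / per_thread_tpm)


# Hypothetical figures, chosen only to illustrate the point above.
TARGET_TPM = 16_000       # the "X transactions per minute" we want
FAST_THREAD_TPM = 1_000   # assumed per-thread rate on an Opteron/Xeon core
SLOW_THREAD_TPM = 80      # assumed per-thread rate on a CMT hardware thread

print("fast threads needed:", threads_needed(TARGET_TPM, FAST_THREAD_TPM))
print("slow threads needed:", threads_needed(TARGET_TPM, SLOW_THREAD_TPM))
```

With these assumed rates the same target needs 16 fast threads or 200 slow ones, so the slow platform only reaches the target if the workload can actually be spread over that many concurrent sessions.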

Btw you can measure roughly how long latches are held using my LatchProfX script (written in plain SQL :) or with DTrace by tracing pid$target:oracle:kslgetl:entry and return probes.  

One more factor is the cache coherency architecture and whether all your threads are running on a single CPU core (or chip) and whether the L2/L3 cache is per core or shared by all cores in a socket. If the cache line which holds the latch structure is currently owned/cached by a different CPU (different socket), then the latch getter needs to snoop the other CPU's cache to see what the latch value is right now. On some architectures the snooping is done by sending a request to the memory controller, which goes through the memory bus at the memory bus base clock rate (which is slower than the CPU clock), but on some (like AMD Opteron) the snooping is done at the HyperTransport clock rate, which is faster.


From: Martin Berger
Sent: 13 February 2009 13:34
Subject: Re: Oracle Performance on Sunfire T2000

It's just about the run queue (I guess). If the run queue on your 4 fast CPUs is 'long', you will be happy if any of the 128 'slow' threads processes the task and releases the latch.
Of course, if you do not utilize the 4 CPUs to their limits, you will not need 128 threads at all.

But I'm still just speaking in pure theory; in summer I will have my new T2+s and will have to prove it. Until then, it's pure theory.



Martin Berger

There's one more catch with slow single-thread execution combined with high parallelism in Oracle. If you migrate from 4 fast CPUs to 128 slow threads, you will have much heavier latch contention on busy latches. Doing whatever work under protection of a latch will probably take longer, thus the latch is held for longer. And instead of 3-4 concurrent threads trying to get the latch at the same time you'll potentially have a few hundred...
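A very crude way to see why the number of concurrent getters matters is sketched below in Python. It is a toy probability model, not how Oracle latching works internally: it assumes each thread independently holds the latch for a fixed fraction of its time (the hypothetical `hold_fraction` parameter) and asks how likely it is that at least one of the other threads holds the latch when you try to get it.

```python
def p_latch_busy(n_threads, hold_fraction):
    """Chance that at least one of the other n_threads - 1 threads holds the latch.

    Toy model: every thread independently holds the latch for
    hold_fraction of its time. Real latch behaviour (spinning,
    sleeping, bursty access) is not modelled.
    """
    return 1.0 - (1.0 - hold_fraction) ** (n_threads - 1)


# hold_fraction = 0.01 is an assumed value, used only for illustration.
for n in (4, 16, 128):
    print(n, "threads ->", round(p_latch_busy(n, 0.01), 3))
```

In this model the chance of finding the latch busy rises sharply with the thread count even when the hold fraction stays the same. If the work under the latch also takes longer, `hold_fraction` goes up too and the effect compounds.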

Glenn Fawcett has quite a few useful blog entries about Oracle performance on Sun CMT processors  


From: Matthew Zito
Sent: 12 February 2009 19:02
Subject: RE: Oracle Performance on Sunfire T2000

We have a couple of t1000s, and while our workload is a little odd (we're an automation company, so all our several hundred databases do is get installed, patched, upgraded, uninstalled, etc.), anything involving data dictionary activities (running catupgd.sql, etc. - high-CPU, single-threaded activities) is slower on the t1000s than on our ancient v210s.

Supposedly the t1000/2000 are perfect for J2EE apps - lots of threads, not a lot of heavy lifting, where parallelization of execution is the most critical piece.



Matthew Zito
Chief Scientist
GridApp Systems
P: 646-452-4090

-- Received on Sat Feb 14 2009 - 17:56:47 CST
