Re: cache buffer chains/where in code

From: Martin Berger <martin.a.berger_at_gmail.com>
Date: Sat, 28 Nov 2009 20:21:58 +0100
Message-Id: <35996DA0-8B34-418A-9EB9-3E2D525FBC72_at_gmail.com>


It's very interesting to follow this thread, as there seem to be two discussions going on:
Christo wants to know why this particular latch (and only this one) causes problems under heavy load (perhaps not even all that heavy a system load; that isn't clear to me yet);
while Greg warns us to consider the peculiarities of a CMT system with respect to CPU run queues in an OLTP workload. To hunt Christo's hot latch, I'd follow Tanel's suggestion (LatchprofX) and otherwise fairly ordinary tuning methods. I fully agree with Greg's considerations; I'd just like to see solid numbers (response times in relation to each of the possible 'processors') - that might let us pin Greg's 65% down to concrete figures (processes, transactions, whatever). If I'm completely off track, please tell me; otherwise I'll follow both conversations with great interest!
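
To make that concrete with some rough numbers: the sketch below is not from the thread, just an illustrative M/M/c queueing model assuming 64 CPU threads and 2 ms of CPU per call (both assumed figures), showing how mean response time inflates as utilization climbs past the ~65% mark Greg mentions.

```python
# Rough M/M/c sketch: how mean response time grows with CPU utilization on a
# box with 64 hardware threads. Back-of-the-envelope only; the 2 ms service
# time is an assumed figure for a short OLTP call, not a number from the thread.
import math

def erlang_c(c, a):
    """Probability an arriving job must queue (Erlang C): c servers, offered load a."""
    inv = sum(a**k / math.factorial(k) for k in range(c))
    top = a**c / (math.factorial(c) * (1 - a / c))
    return top / (inv + top)

def mean_response(c, util, service_time):
    """Mean response time of an M/M/c queue at the given per-server utilization."""
    a = util * c                                # offered load in Erlangs
    mu = 1.0 / service_time                     # service rate per server
    wq = erlang_c(c, a) / (c * mu - a * mu)     # mean time spent waiting in queue
    return service_time + wq

if __name__ == "__main__":
    c, s = 64, 0.002                            # 64 CPU threads, 2 ms CPU per call (assumed)
    for util in (0.50, 0.65, 0.80, 0.90, 0.95, 0.99):
        print(f"util {util:.0%}: mean response {mean_response(c, util, s) * 1000:6.2f} ms")
```

The absolute numbers depend entirely on the assumed service time and on arrivals being well behaved; the point is the shape of the curve - waiting is negligible in the 60-70% range and blows up as utilization approaches saturation, and any burstiness or shared-resource stall (Greg's "blip") moves that blow-up earlier.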

On 28.11.2009, at 18:05, Greg Rahn wrote:

> Given that config, I'd say that system has over 4X the number of db
> connections it probably should have (and needs to work well) - I'd
> back it down to 64 as a starting point and make sure the connection
> pool does not grow. Set the initial and max connections to the same
> number. One might think that you need more sessions to keep the CPUs
> busy (and you may need more than 1 per CPU thread), but the reality
> is this: with a high number of sessions, the queue is longer for
> everything. The chance of getting scheduled when a session needs to
> goes down, and under a fairly steady, medium-to-high load, any "blip"
> will cause a massive queue for a resource. Consider what happens when
> calls are taking milliseconds and, for a split second, some session
> holds a shared resource - it may take the system tens of minutes to
> recover from that backlog. This is why most high-throughput OLTP
> systems only want to run at a max of 65% (or so) CPU utilization with
> very short run queues - so that if there is any slowdown, there is
> enough resource headroom to recover. Otherwise the system will likely
> be in an unrecoverable flat spin at Mach 5.
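
A small illustration of the fixed-size pool Greg describes (initial = max, no growth). The thread doesn't say which connection pool the application actually uses; python-oracledb below is purely an assumed stand-in, and the connect details are placeholders.

```python
# Sketch only: the thread does not say which connection pool the app uses.
# python-oracledb is assumed here for illustration; the idea is Greg's -
# a fixed-size pool (initial == max) so it cannot grow under load.
import oracledb

pool = oracledb.create_pool(
    user="app_user",            # placeholder credentials and DSN
    password="app_password",
    dsn="dbhost/service_name",
    min=64,                     # all 64 sessions created up front
    max=64,                     # ... and the pool can never grow past 64
)

def run(sql, binds=None):
    # Borrow a session, execute, and hand it straight back to the pool.
    with pool.acquire() as connection:
        with connection.cursor() as cursor:
            cursor.execute(sql, binds or {})
            return cursor.fetchall()
```

The same principle applies to whatever pool is actually in use: pin initial and maximum to the same value so a slowdown cannot trigger a connection storm on an already saturated box.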
>
> On Sat, Nov 28, 2009 at 12:13 AM, Christo Kutrovsky
> <kutrovsky.oracle_at_gmail.com> wrote:
>> Greg,
>>
>> It's a single UltraSparc T2 CPU, which is 8 cores with 8 threads
>> each. Note that each core has 2 integer pipelines, so you could
>> assume 16 CPUs and 64 threads.
>>
>> There are many things that are wrong with this setup, and reducing
>> the number of connections is something I am considering. However,
>> it's not that simple. Imagine that instead of CPU, those sessions
>> were doing IO: you want a relatively deep IO queue to allow the
>> RAID array to deliver.
>>
>> One thing that puzzles me: given that the suspicion is that a deep
>> CPU run queue is the problem, why is only one very specific latch
>> causing it? There are several different types of queries running at
>> the same time - why is only one specific query causing latch
>> contention, and not the others?
>
> --
> Regards,
> Greg Rahn
> http://structureddata.org
> --
> http://www.freelists.org/webpage/oracle-l
>
>
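
On Christo's question about why a single latch stands out: the thread's suggestion is Tanel Poder's LatchprofX, which samples latch holders. As a complementary, more pedestrian check (not something proposed in the thread), ranking the "cache buffers chains" child latches by sleeps shows whether the contention concentrates on a handful of children - i.e. a few hot blocks touched by one query - rather than being spread evenly, as a pure run-queue explanation would suggest.

```python
# Not from the thread: a plain v$latch_children check, run as a privileged
# user, to see whether "cache buffers chains" sleeps pile up on a few child
# latches. Connection details are placeholders.
import oracledb

TOP_CHILD_LATCHES = """
SELECT *
  FROM (SELECT addr, child#, gets, misses, sleeps
          FROM v$latch_children
         WHERE name = 'cache buffers chains'
         ORDER BY sleeps DESC)
 WHERE ROWNUM <= 10
"""

with oracledb.connect(user="system", password="placeholder",
                      dsn="dbhost/service_name") as connection:
    with connection.cursor() as cursor:
        cursor.execute(TOP_CHILD_LATCHES)
        for addr, child, gets, misses, sleeps in cursor:
            print(f"addr={addr} child#={child} gets={gets} "
                  f"misses={misses} sleeps={sleeps}")
```

If one or two children dominate, the usual next step is to map their addresses to the buffers they protect; if the sleeps are spread across thousands of children, that points more toward the scheduling/CPU-queue explanation.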

--
http://www.freelists.org/webpage/oracle-l
Received on Sat Nov 28 2009 - 13:21:58 CST
