Re: CPU Usage in running process on HP-UX 11.31

From: Stefan Knecht <knecht.stefan_at_gmail.com>
Date: Tue, 8 Sep 2009 10:48:21 +0200
Message-ID: <486b2b610909080148p39fbbf51u3c2b2f5d30ce523e_at_mail.gmail.com>



Ciao Martin

On Tue, Sep 8, 2009 at 10:17 AM, Martin Berger <martin.a.berger_at_gmail.com> wrote:

> Hi Stefan,
> so I'm pretty sure you know these 2 ML notes:
> Subject: Multi-Threaded Server (MTS) Trace Events, Doc ID: 106624.1
> Subject: MTS: How to Debug a Hanging Multi-Threaded Server Connection, Doc ID: 106621.1
>
>

Yep, I'm aware of those. Unfortunately they don't help in this case.

> If a tusc trace did not give any hints, the next step would be attaching a
> debugger to the process. But this is of really limited use; even if you know
> which CPU instructions it's executing all the time, or which loops are spinning, you
> can hardly find the reason without the source code and debug hooks prepared
> in the binary.
>

Yes, the problem is that even tusc slows down the process quite massively. And since we only experience the issue in production, we can't really use any even more intrusive tools. I was merely hoping HP-UX provides some new hooks in the release 3 kernel that are not yet widely known and that allow for measuring those things (analogous to e.g. dtrace on Sun).
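
To illustrate why a system-call trace alone doesn't answer the CPU question: the elapsed time of a call and the CPU time it consumes are different things. Below is a minimal Python sketch, generic Unix and not HP-UX specific, that compares wall-clock time with user and system CPU time as reported by getrusage(). The function names (measure, wait_idle, spin) are made up for illustration and have nothing to do with Oracle or tusc.

```python
import resource
import time


def measure(fn):
    """Run fn() and return the wall-clock, user-CPU and system-CPU seconds it used."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    start = time.perf_counter()
    fn()
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "wall": wall,
        "user": after.ru_utime - before.ru_utime,
        "sys": after.ru_stime - before.ru_stime,
    }


def wait_idle():
    # Blocks in the kernel without burning CPU, like a process waiting on I/O.
    time.sleep(0.2)


def spin():
    # Burns roughly 50 ms of CPU time (user plus system), like a tight loop.
    deadline = time.process_time() + 0.05
    while time.process_time() < deadline:
        pass


if __name__ == "__main__":
    for fn in (wait_idle, spin):
        m = measure(fn)
        print(f"{fn.__name__}: wall={m['wall']:.3f}s "
              f"user={m['user']:.3f}s sys={m['sys']:.3f}s")
```

The sleeping call shows a long elapsed time with next to no CPU, the spinning one the opposite. A trace that only records which calls were made, or how long they took on the wall clock, can't tell these two cases apart.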

> Just some tries to get a lucky hit:
> As the users have not changed, has anything else changed?
>

Nope, not that I'm aware of, nor did the apps guys claim any changes

> OS-patched?
>

Nope

> network subsystem? (new switches, new network card, new drivers, new team?)
>

Nope

> io-subsystem?
>

Some new disks were added to ASM. However, we don't experience any storage issues.

> clients?
>

Nope

However, for completeness' sake: we've had an issue where the dispatchers lost network packets under very high load. This was fixed by a patch from Oracle, as it virtually killed the database. Now we're running "stable" but using up WAY too much CPU.

> (my questions are heading towards 'maybe it's not worker<->dispatcher but
> client<->dispatcher which causes the higher load?')
>

Possibly. But running the same "client" and the same workload with dedicated servers, we don't experience any issues with CPU usage. If anything, I'd expect the shared servers to use up CPU (as they're doing the majority of the work), not the dispatchers, whose sole task is to direct clients to a free shared server and send the results back to them. Also, 99% of all queries are small, returning only a few rows.
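
For reference, the dispatcher "busy state" of under 10% mentioned in the original post is normally worked out as busy time over total time. A minimal Python sketch of that arithmetic, assuming the figure comes from the cumulative BUSY and IDLE columns of V$DISPATCHER (the thread doesn't say how it was measured). The function name and the sample rows are hypothetical.

```python
def busy_pct(busy, idle):
    """Percentage of time a dispatcher was busy, given cumulative busy and idle time."""
    total = busy + idle
    if total == 0:
        return 0.0  # no time accumulated yet, e.g. the dispatcher just started
    return 100.0 * busy / total


# Hypothetical (name, busy, idle) rows; the numbers are made up for illustration.
rows = [("D000", 4200, 95800), ("D001", 800, 99200)]

if __name__ == "__main__":
    for name, busy, idle in rows:
        print(f"{name}: {busy_pct(busy, idle):.1f}% busy")
```

Note that this ratio only says how much of the time a dispatcher was handling requests, not how much CPU it burned while doing so, which is why a low busy rate and high CPU consumption can coexist.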

Thanks for the feedback though :-)

> Kind regards,
> Martin
>
> On 08.09.2009 at 10:07, Stefan Knecht wrote:
>
> Hi Martin
>
> Yes, we've traced them for hours and submitted all that to Oracle as well.
> Apparently it didn't show anything.
>
> Problem is that you can't tell how much time is really spent on the CPU
> based on that.
>
> Stefan
>
> =========================
>
> Stefan P Knecht
> CEO & Founder
> s_at_10046.ch
>
> 10046 Consulting GmbH
> Schwarzackerstrasse 29
> CH-8304 Wallisellen
> Switzerland
>
> Phone +41-(0)8400-10046
> Cell +41 (0) 79 571 36 27
> info_at_10046.ch
> http://www.10046.ch
>
> =========================
>
>
> On Tue, Sep 8, 2009 at 10:01 AM, Martin Berger <martin.a.berger_at_gmail.com> wrote:
>
>> Hi Stefan,
>> have you ever tried to use tusc to trace any of these processes?
>> Even though it will not tell you what the process is doing on the CPU, only
>> which system calls it fires, this can often give a hint.
>>
>> hth
>> Martin
>>
>>
>> On 08.09.2009 at 09:36, Stefan Knecht wrote:
>>
>> Hi folks
>>
>> We have an issue on a database box where we have approx. 1500 users
>> connected to a database set up with shared servers (due to memory shortage).
>> Now we have a situation where the dispatchers (there are 8 of them, more than
>> enough for 1500 users; their busy state is permanently below 10%)
>> consume a LOT of CPU. They're always the top consumers. (Load average has
>> gone up to a permanent 3, with CPU usage almost constantly at 90% and above,
>> compared to a comfy 50% average before.) Users have not changed.
>>
>> SR has been going with Oracle for quite some time. Everything has been
>> looked at and confirmed to be set up properly. Now their "senior analysts"
>> don't know what else to do, and are filing a bug.
>>
>> Now I'd like to know:
>>
>> - Do you know of a tool for HP-UX that will show me CPU usage in a running
>> process per SYSTEM CALL (our unix guys don't)?
>> - Is there a certain event to enable for dispatchers that will yield
>> useful trace data to diagnose such an issue?
>> - Anything else that pops into your mind?
>>
>> Cheers
>>
>> Stefan
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Sep 08 2009 - 03:48:21 CDT
