Oracle FAQ Your Portal to the Oracle Knowledge Grid

Home -> Community -> Mailing Lists -> Oracle-L -> Re: *Measuring sql performance (elapsed time and scalability) by number of logical reads

Re: *Measuring sql performance (elapsed time and scalability) by number of logical reads

From: Tanel Põder <>
Date: Tue, 9 May 2006 23:34:22 +0800
Message-ID: <09ad01c6737e$12830dd0$3f121eac@porgand>

> I think that instrumenting oracle kernel code path is as useful as OS
> call code path instrumenting is.

Yep, and Oracle has slowly started doing it in 10g with the v$sys_time_model, v$sess_time_model views and their underlying infrastructure.

Alternatively binary profiling could be used for getting a profile in which kernel functions most of the time is spent.

Also, with modern CPUs, it's not enough just to distinguish between wait time and CPU time, because in reality our process might not be crunching numbers at all while we think it is.

Few examples would be when a process is in CPU runqueue, actually starving for CPU resources. Others would be servicing virtual memory traps/soft page faults, which are done in kernel mode but accounted to user process time as far as I know. That's not the end of the list unfortunately, lots of the time our CPUs are just waiting for the main memory anyway.

So in order to really be able to know where the time is going, some tools are required which are able to set/read CPU performance statistic counters as well. Solaris is quite advanced in this, they have cpustat, trapstat, lockstat utilities and dtrace which all can fiddle with CPU statistic counters to some extent.

On Linux, you've got to hack something together yourself with RDMSR, RDPMC instructions and their relatives or use 3rd party tools like Intel's VTune to get an execution profile with CPU stall statistics of a running binary.

If anybody bothered to read until the end - there is little we can do to fix such memory latency related problems, but nevertheless, there are some. First, using Oracle's buffer cache and direct IO instead of OS buffer cache to reduce system calls. Also using large memory pages and SGA locking into memory where available, to reduce the number of VM mapping entries, sharing them among processes and perhaps reducing number of levels in OS kernels VM mapping index structure - thus improving overall CPU cache efficiency.

Kevin, did I get the last part right? ;)


Received on Tue May 09 2006 - 10:34:22 CDT

Original text of this message