Re: AWR Sample Report
Date: Fri, 21 Nov 2008 07:52:36 -0800 (PST)
Message-ID: <0943823c-b3cc-4913-a170-826e6561ec81@x8g2000yqk.googlegroups.com>
On Nov 21, 9:28 am, raja <dextersu..._at_gmail.com> wrote:
> Hi,
>
> I will try to get a AWR Report which has an interval of 10-15 minutes
> interval.
> Currently i have only this report and i am trying to analyze it.
>
> In AWR Report, SQL Statistics, there are many parts like :
> SQL ordered by Elapsed Time
> SQL ordered by CPU Time
> SQL ordered by Gets
> SQL ordered by Reads
> SQL ordered by Executions
> SQL ordered by Parse Calls
> SQL ordered by Sharable Memory
> SQL ordered by Version Count
> Which i should look into first ? why ?
>
> With Regards,
> Raja.
On first examination, I see:
Buffer Hit %: 99.89
One might say, that is a near perfect buffer cache hit ratio, move on to the next report. But I have an example which shows a 100% buffer cache hit ratio and the performance of the database
instance was terrible, so maybe we should permanently ignore this statistic.
Watch for line wrapping (word wrapping) in the following:
Just a couple statistics to frame the rest of the report: Elapsed: 359.46 (mins) (21,567.6 seconds) DB Time: 1,723.72 (mins) {DB Time includes CPU time and wait events for all sessions}
One approach to looking at the AWR/Statspack reports might be to start
with the CPU usage and the wait events, and see if any further
investigation is needed.
NUM_CPUS 14
NUM_CPU_CORES 7
Total CPU Time Possible: 14 * 359.46 * 60 seconds = 301,946.4 seconds
- note that it is unlikely that processes will jump from one CPU to
another CPU, although if parallel processing is in
use, more than one CPU might be used to retrieve the data for SQL statements. It is quite possible that one CPU might become nearly 100% busy, yet the average CPU busy calculation might
might indicate that the CPUs are only 7% busy.
CPU time: 56,931 seconds
Percentage of total possible CPU Time that was consumed by Oracle: 56,931/301,946 = 18.9%
The above is not conclusive, digging deeper...
What is happening at the system level:
OS Statistics
Statistic Total AVG_BUSY_TIME 773,431/100= 7,734.31s (total divided by # CPUs) AVG_IDLE_TIME 1,382,735/100= 13,827.35s AVG_IOWAIT_TIME 198,722/100= 1,987.22s AVG_SYS_TIME 112,958/100= 1,129.58s AVG_USER_TIME 659,771/100= 6,597.71sOS_CPU_WAIT_TIME 11,375,200/100=113,752.00s
OS_CPU_WAIT_TIME is defined as: Total number of seconds that processes have been in a ready state, waiting to be selected by the operating system scheduler to run. This seems to indicate
that there was significant waiting for CPU time slices (competition for the same CPU), even though the average CPU utilization at the operating system level is 35.9%
[7,734.31/(7,734.31+13,827.35)]. Several processes are apparently fighting for time on the same CPU.
You could look at the Time Model Statistics, but they may not be telling much other than there may be a lot of context switches, and more than half of the total SQL execution time was spent
on the CPUs (rather than waiting for disk or other wait events):
Statistic Name Time (s) % of DB Time sql execute elapsed time 102,885.27 99.48 DB CPU 56,931.50 55.05 Java execution elapsed time 18,173.85 17.57 PL/SQL execution elapsed time 1,146.88 1.11 ... DB time 103,423.35 background elapsed time 9,817.43 background cpu time 1,169.73
The hierarchy of the above Time Model Statistics should be read like this:
1) background elapsed time 9,817.43 2) background cpu time 1,169.73 (of the 9,817.43) 1) DB time 103,423.35 2) DB CPU 56,931.50 (of the 103,423.35) 2) connection management call elapsed time 2) sequence load elapsed time
2) sql execute elapsed time
2) parse time elapsed
3) hard parse elapsed time 4) hard parse (sharing criteria) elapsed time 5) hard parse (bind mismatch) elapsed time 3) failed parse elapsed time 4) failed parse (out of shared memory) elapsed time 2) PL/SQL execution elapsed time 1,146.88 2) inbound PL/SQL rpc elapsed time 2) PL/SQL compilation elapsed time 2) Java execution elapsed time 18,173.85 (of the 103,423.35)
Let's switch our attention to the wait events, but first a couple statistics to provide a frame of reference:
SQL*Net roundtrips to/from client 4,813,090 SQL*Net roundtrips to/from dblink 43,646 user calls 4,836,293 user commits 24,259 user rollbacks 20 Event Waits %Time -outs Total Wait Time (s) Avg wait (ms) Waits /txn db file sequential read 4,036,055 0.00 13,310 3 166.24 db file parallel write 567,009 0.00 6,515 11 23.35 db file scattered read 934,341 0.00 1,383 1 38.48 log file parallel write 38,621 0.00 1,019 26 1.59 Backup: sbtwrite2 51,438 0.00 896 17 2.12 Log archive I/O 15,175 0.00 658 43 0.63 log file sync 17,706 0.10 387 22 0.73 direct path write temp 98,515 0.00 313 3 4.06 SQL*Net more data to client 4,642,420 0.00 154 0 191.21 enq: TX - row lock contention 1,475 3.19 138 94 0.06 control file parallel write 10,484 0.00 107 10 0.43 log file sequential read 12,839 0.00 87 7 0.53 read by other session 40,059 0.00 79 2 1.65 direct path read 272,647 0.00 54 0 11.23 log buffer space 181 0.00 46 256 0.01 enq: WL - contention 16 87.50 44 2742 0.00 latch: cache buffers chains 363,571 0.00 37 0 14.97 control file single write 448 0.00 28 63 0.02 os thread startup 424 0.00 23 55 0.02 RMAN backup & recovery I/O 6,130 0.00 18 3 0.25 control file sequential read 955,614 0.00 9 0 39.36 log file switch completion 75 0.00 7 89 0.00 SQL*Net more data from dblink 127,162 0.00 6 0 5.24 cursor: pin S wait on X 365 100.00 4 10 0.02 SQL*Net message to client 4,819,165 0.00 4 0 198.49
...
According to the number of waits for the SQL*Net events, on average, just about every request for data from the server requires two sends as the data returned is too large to fit into the SDU
(defaults to 2,000 bytes) - this is based on comparing the value of "SQL*Net message to client" to the value of "SQL*Net more data to client". This indicates that network latency, if
significant, could have a significant impact on apparent database performance.
There are timeouts reported for some of the wait events - this may be a critical indicator. If the timeout for a paricular wait event happens to be 3 seconds, and 88% of the waits for an
event happen to result in time outs, that means on average 88% of the time the event happens, it will last 3 seconds in duration. However, the wait count increases by 1 every time a time-out
occurs, thus it might be a single session waiting non-stop for 42 seconds (16 * 87.50 * 3 seconds) = 42 seconds.
"db file sequential read" indicates that Oracle is reading one block from disk - this usually indicates that Oracle is reading from an index, but might also happen rarely during a tablescan
(if the other blocks needed are already in the buffer cache), or when reading the undo blocks to provide a consistent read. "db file scattered read" indicates that Oracle is reading more
than one block at a time from disk - this usually indicates that Oracle is reading from a table during a full table scan.
"db file parallel write" indicates waits related to the DBWR process, and may be related to the number of DBWR processes or DBWR I/O slaves. May also indicate slow or high contention devices. I believe that I previously commented on the value of the parameter db_writer_processes, and suggested that you take a look at an article on Kevin Closson's website related to what happens when that parameter is set to a high value - that article talks about the impact on CPU queuing, which seems to be a problem in this AWR report.
"log file parallel write" indicates Writing redo records to the redo log files from the log buffer. This usually indicates slow devices or contention where the online redo logs are located.
Take a look at the definitions for the other wait events.
Use the CPU utilization and wait events to determine what other sections of the AWR report need to be examined. Post the results of your findings.
Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.
Received on Fri Nov 21 2008 - 09:52:36 CST