Re: AWR Sample Report

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: Fri, 21 Nov 2008 07:52:36 -0800 (PST)
Message-ID: <0943823c-b3cc-4913-a170-826e6561ec81@x8g2000yqk.googlegroups.com>


On Nov 21, 9:28 am, raja <dextersu..._at_gmail.com> wrote:
> Hi,
>
> I will try to get a AWR Report which has an interval of 10-15 minutes
> interval.
> Currently i have only this report and i am trying to analyze it.
>
> In AWR Report, SQL Statistics, there are many parts like :
> SQL ordered by Elapsed Time
> SQL ordered by CPU Time
> SQL ordered by Gets
> SQL ordered by Reads
> SQL ordered by Executions
> SQL ordered by Parse Calls
> SQL ordered by Sharable Memory
> SQL ordered by Version Count
> Which i should look into first ? why ?
>
> With Regards,
> Raja.

On first examination, I see:
 Buffer Hit %: 99.89

One might say, that is a near perfect buffer cache hit ratio, move on to the next report. But I have an example which shows a 100% buffer cache hit ratio and the performance of the database

instance was terrible, so maybe we should permanently ignore this statistic.

Watch for line wrapping (word wrapping) in the following:

Just a couple statistics to frame the rest of the report: Elapsed: 359.46 (mins) (21,567.6 seconds) DB Time: 1,723.72 (mins) {DB Time includes CPU time and wait events for all sessions}

One approach to looking at the AWR/Statspack reports might be to start with the CPU usage and the wait events, and see if any further investigation is needed.
NUM_CPUS 14
NUM_CPU_CORES 7 Total CPU Time Possible: 14 * 359.46 * 60 seconds = 301,946.4 seconds - note that it is unlikely that processes will jump from one CPU to another CPU, although if parallel processing is in

use, more than one CPU might be used to retrieve the data for SQL statements. It is quite possible that one CPU might become nearly 100% busy, yet the average CPU busy calculation might

might indicate that the CPUs are only 7% busy.

CPU time: 56,931 seconds

Percentage of total possible CPU Time that was consumed by Oracle: 56,931/301,946 = 18.9%

The above is not conclusive, digging deeper...

What is happening at the system level:
OS Statistics

Statistic             Total
AVG_BUSY_TIME       773,431/100=  7,734.31s (total divided by # CPUs)
AVG_IDLE_TIME     1,382,735/100= 13,827.35s
AVG_IOWAIT_TIME     198,722/100=  1,987.22s
AVG_SYS_TIME        112,958/100=  1,129.58s
AVG_USER_TIME       659,771/100=  6,597.71s
OS_CPU_WAIT_TIME 11,375,200/100=113,752.00s

OS_CPU_WAIT_TIME is defined as: Total number of seconds that processes have been in a ready state, waiting to be selected by the operating system scheduler to run. This seems to indicate

that there was significant waiting for CPU time slices (competition for the same CPU), even though the average CPU utilization at the operating system level is 35.9%

[7,734.31/(7,734.31+13,827.35)]. Several processes are apparently fighting for time on the same CPU.

You could look at the Time Model Statistics, but they may not be telling much other than there may be a lot of context switches, and more than half of the total SQL execution time was spent

on the CPUs (rather than waiting for disk or other wait events):

Statistic Name                          Time (s)        % of DB Time
sql execute elapsed time                102,885.27       99.48
DB CPU                                   56,931.50       55.05
Java execution elapsed time              18,173.85       17.57
PL/SQL execution elapsed time             1,146.88        1.11
...
DB time                                 103,423.35
background elapsed time                   9,817.43
background cpu time                       1,169.73

The hierarchy of the above Time Model Statistics should be read like this:

1) background elapsed time                9,817.43
      2) background cpu time              1,169.73 (of the 9,817.43)
1) DB time                              103,423.35
    2) DB CPU                            56,931.50 (of the 103,423.35)
    2) connection management call elapsed time
    2) sequence load elapsed time

    2) sql execute elapsed time
    2) parse time elapsed
          3) hard parse elapsed time
                4) hard parse (sharing criteria) elapsed time
                    5) hard parse (bind mismatch) elapsed time
          3) failed parse elapsed time
                4) failed parse (out of shared memory) elapsed time
    2) PL/SQL execution elapsed time      1,146.88
    2) inbound PL/SQL rpc elapsed time
    2) PL/SQL compilation elapsed time
    2) Java execution elapsed time       18,173.85 (of the 103,423.35)

Let's switch our attention to the wait events, but first a couple statistics to provide a frame of reference:

SQL*Net roundtrips to/from client 4,813,090
SQL*Net roundtrips to/from dblink    43,646
user calls                        4,836,293
user commits                         24,259
user rollbacks                           20

Event                             Waits      %Time -outs  Total Wait
Time (s) Avg wait (ms)  Waits /txn
db file sequential read           4,036,055     0.00
13,310               3          166.24
db file parallel write              567,009     0.00
6,515              11           23.35
db file scattered read              934,341     0.00
1,383               1           38.48
log file parallel write              38,621     0.00
1,019              26            1.59
Backup: sbtwrite2                    51,438     0.00
896              17            2.12
Log archive I/O                      15,175     0.00
658              43            0.63
log file sync                        17,706     0.10
387              22            0.73
direct path write temp               98,515     0.00
313               3            4.06
SQL*Net more data to client       4,642,420     0.00
154               0          191.21
enq: TX - row lock contention         1,475     3.19
138              94            0.06
control file parallel write          10,484     0.00
107              10            0.43
log file sequential read             12,839     0.00
87               7            0.53
read by other session                40,059     0.00
79               2            1.65
direct path read                    272,647     0.00
54               0           11.23
log buffer space                        181     0.00
46             256            0.01
enq: WL - contention                     16    87.50
44            2742            0.00
latch: cache buffers chains         363,571     0.00
37               0           14.97
control file single write               448     0.00
28              63            0.02
os thread startup                       424     0.00
23              55            0.02
RMAN backup & recovery I/O            6,130     0.00
18               3            0.25
control file sequential read        955,614     0.00
9               0           39.36
log file switch completion               75     0.00
7              89            0.00
SQL*Net more data from dblink       127,162     0.00
6               0            5.24
cursor: pin S wait on X                 365   100.00
4              10            0.02
SQL*Net message to client         4,819,165     0.00
4               0          198.49

...

According to the number of waits for the SQL*Net events, on average, just about every request for data from the server requires two sends as the data returned is too large to fit into the SDU

(defaults to 2,000 bytes) - this is based on comparing the value of "SQL*Net message to client" to the value of "SQL*Net more data to client". This indicates that network latency, if

significant, could have a significant impact on apparent database performance.

There are timeouts reported for some of the wait events - this may be a critical indicator. If the timeout for a paricular wait event happens to be 3 seconds, and 88% of the waits for an

event happen to result in time outs, that means on average 88% of the time the event happens, it will last 3 seconds in duration. However, the wait count increases by 1 every time a time-out

occurs, thus it might be a single session waiting non-stop for 42 seconds (16 * 87.50 * 3 seconds) = 42 seconds.

"db file sequential read" indicates that Oracle is reading one block from disk - this usually indicates that Oracle is reading from an index, but might also happen rarely during a tablescan

(if the other blocks needed are already in the buffer cache), or when reading the undo blocks to provide a consistent read. "db file scattered read" indicates that Oracle is reading more

than one block at a time from disk - this usually indicates that Oracle is reading from a table during a full table scan.

"db file parallel write" indicates waits related to the DBWR process, and may be related to the number of DBWR processes or DBWR I/O slaves. May also indicate slow or high contention devices. I believe that I previously commented on the value of the parameter db_writer_processes, and suggested that you take a look at an article on Kevin Closson's website related to what happens when that parameter is set to a high value - that article talks about the impact on CPU queuing, which seems to be a problem in this AWR report.

"log file parallel write" indicates Writing redo records to the redo log files from the log buffer. This usually indicates slow devices or contention where the online redo logs are located.

Take a look at the definitions for the other wait events.

Use the CPU utilization and wait events to determine what other sections of the AWR report need to be examined. Post the results of your findings.

Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc. Received on Fri Nov 21 2008 - 09:52:36 CST

Original text of this message