Re: AWR Sample Report

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: Tue, 23 Dec 2008 20:09:38 -0800 (PST)
Message-ID: <53e0eb29-0ab3-4941-a5be-f8f11bbfa519@w1g2000prk.googlegroups.com>


Comments inline:

On Dec 23, 11:21 am, raja <dextersu..._at_gmail.com> wrote:
> In general the wait list has been split into 4 parts :
> Part 1 -> 2am to 3am ; Part 2 -> 7am to 8am ; Part 3 -> 2pm to 3pm ;
> Part 4 -> 7pm to 8pm
>
> First let us Neglect following the idle events from the wait list :
> jobq slave wait
> PX Deq Credit: need buffer
> PX Deq: Execute Reply
> PX Deq: Execution Msg
> PX Deq: Msg Fragment
> PX Deq: Table Q Normal
> PX Idle Wait
> single-task message

You might need to decide whether or not these "idle" events should be ignored, especially those related to parallel query. For the 7 AM to 8 AM time period the "PX Deq: Execution Msg" wait event is the top wait event, with an average of 9.52 processes in this wait event. You might take a look at this and similar links: http://www.orafaq.com/maillist/oracle-l/2002/05/18/1618.htm

The third highest wait event with an average of 5.72 processes in this wait event during this time interval is "PX Deq Credit: send blkd" - read all replies in this thread to see if there is a reason not to ignore this event:
http://forums.oracle.com/forums/message.jspa?messageID=2521467

You might want to check a couple of the other "idle" wait events here: http://oracledoug.com/px8.html

Another good link:
http://jonathanlewis.wordpress.com/2006/12/28/parallel-execution/

> In general - about the application analysis - the following 2
> observations are :
> a. if we see the no.of wait events, it looks like they are at : Part
> 2 -> 7am to 8am, which should be the peak hour of the transaction.
> b. the jobs, backup and recovery, archiving should be run when it is
> in less peak hour : Part 1 -> 2am to 3am,
> since there are the following events :
> Backup: sbtwrite2 - ( Administrative - Wait class )
> Log archive I/O
> RMAN backup & recovery I/O.

Yes, it looks like the backups were started around 7 PM and were possibly still running at 7 AM, but I may not be fully understanding the "Backup: sbtwrite2" wait event - are the archived redo logs written to tape as they are created?

> With respect to wait events, the observations are as follows :
> c. The following wait events are due to parallel process - it is like
> running parallel queries, using parallel hints.
> PX Deq Credit: send blkd
> PX qref latch
> solution to solve this problem is (from web) : increasing the value of
> parallel_execution_message_size parameter.
> But I feel that reducing the usage of parallel hints will reduce this
> problem. but this would either cause not the efficient usage of the
> CPU or not finishing the query execution faster !!!

This link suggests that the change to the parameter might make matters worse:
http://oracledoug.com/px8.html

> d. read by other session :
> In previous versions this wait was classified under the "buffer busy
> waits" event. However, in Oracle 10.1 and higher this wait time is now
> broken out into the "read by other session" wait event. Excessive
> waits for this event are typically due to several processes repeatedly
> reading the same blocks, e.g. many sessions scanning the same index or
> performing full table scans on the same table.
> solution to solve this problem is (from web) :
> ->buffer busy waits / read by other session: Increase DB_CACHE_SIZE
> (DB_BLOCK_BUFFERS prior to 9i)/ Analyze contention from SYS.V$BH
> But going through the link :http://www.confio.com/English/Tips/Read_By_Other_Session.php,
> i feel that tuning the queries would be the option to reduce this
> wait. Have to check for hot blocks/objects too ?

"read by other session" is related to buffer busy waits, but it is a very specific type of buffer busy wait in older versions of Oracle, it appears that you need to be a little careful trying to rely too much on some websites. From the Oracle documentation: http://download.oracle.com/docs/cd/B28359_01/server.111/b28320/waitevents003.htm#insertedID107 "This event occurs when a session requests a buffer that is currently being read into the buffer cache by another session. Prior to release 10.1, waits for this event were grouped with the other reasons for waiting for buffers under the 'buffer busy wait' event"

My notes on this wait event are the following: "This event indicates that multiple processes are attempting to read a block from disk concurrently, while one session is reading the block from disk, the other sessions wait on this event. Determine if there are too many active files on the same file system exhausting IO bandwidth. If parallel execution (PX) is being used, determine if the IO subsystem is saturated by having too many slaves in use. This event likely indicates disk contention."

Parallel query is a bit unique, as it requires dirty (previously changed) blocks in the buffer cache to be written to the data files before the parallel operation may start. The parallel slaves then bypass the buffer cache when executing, so increasing the DB_CACHE_SIZE may make the problem worse as more dirty buffers may need to be flushed to disk before the parallel operation begins. There are certainly better descriptions of how parallel query is a bit unique, but I was unable to quickly locate a reference to provide.

> e.
> In Oracle, we see two types of data block access - db file sequential
> read and db file scattered read :
> ->db file sequential read A single-block read (i.e., index fetch by
> ROWID).
> A large number of waits here could indicate poor joining orders of
> tables, or unselective indexing. It is normal for this number to be
> large for a high-transaction, well-tuned system, but it can indicate
> problems in some circumstances.
> They can also show up as direct path read/write waits.

Mostly true, but could also be reported during a full tablescan if adjacent table blocks are already in the buffer cache, or for reading the undo tablespace to provide a consistent read. I am fairly certain that this wait event is not related to direct path read/write waits.

> ->db file scattered read A multiblock read (a full-table scan, OPQ,
> sorting)
> A large number here indicates that your table may have missing or
> suppressed indexes. Although it may be more efficient in your
> situation to perform a full table scan than an index scan, check to
> ensure that full table scans are necessary when you see these waits.
> Try to cache small tables to avoid reading them in over and over
> again
> solution to solve this problem is (from web) : increase the RAM size
> of the db_cache_size to reduce disk I/O, tune the SQL to reduce disk I/
> O, or invest in a faster disk I/O subsystem.

Note that these waits may also appear during index fast full scans. I am not so certain that db file scattered read would be reported for Oracle Parallel Query ("direct path read" is more likely) or for sorting ("direct path read temp" is more likely). Due to the caching algorithm used by Oracle when performing a full table scan (blocks read are placed on the LRU end), unless the table has been assigned to the KEEP buffer pool, increasing the size of the buffer cache (or the keep buffer cache) probably will not help much, unless the table is small (number of blocks below the high water mark) in comparison to the size of the buffer cache.

> ->log file sequential read - indicates that the process is waiting for
> blocks to be read from the online redo log into memory. This primarily
> occurs at instance startup and when the ARCH process archives filled
> online redo logs.
>
> ->db file parallel write - occurs when the process, typically DBWR,
> has issued multiple I/O requests in parallel to write dirty blocks
> from the buffer cache to disk, and is waiting for all requests to
> complete.
> ( dont know much - have to check regarding DBWR )
> Looks like this is mostly due to i/o issue.

> The direct path read, direct path write wait events are also
> associated with the data block access (db file sequential read and db
> file scattered read ) - dont know about these wait events, have to
> check regarding these.

Probably associated with parallel query.

> Info : Starting in Oracle 10g release2, Oracle recommends not setting
> the db_file_multiblock_read_count parameter.
> solution to solve these problem are (from web) :
> ->db file sequential read: Tune SQL to do less I/O. Make sure all
> objects are analyzed. Redistribute I/O across disks.

If on Oracle 10.2.0.1 or above you allow Oracle to auto-tune the db_file_multiblock_read_count parameter, the parameter will likely be set to achieve a 1MB multiblock read size (a value of 128 will be set for a database with an 8KB block size). This could have no effect or it could have a significant effect on performance.

> So, in general to reduce the following waits, we have to tune the sql
> queries (checking the indexes and joins) may give improvement in
> performance :
> db file sequential read,
> db file scattered read,
> log file sequential read,
> db file parallel write,
> direct path read, and
> direct path write

I think that you may remove "log file sequential read" from the above list, and possibly "db file parallel write".

> f. The following 3 wait events comes together, since they are related
> to LGWR/redo log ( dont know much - have to check regarding LGWR/redo
> log )
> ->log file sequential read - LGWR background process waits for this
> event while it is copying redo records from the memory Log Buffer
> cache to the current redo group's member logfiles on disk.

I am not sure that the above is correct. See: http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/waitevents003.htm "Waiting for the read from this logfile to return. This is used to read redo records from the log file."

> ->log file sync - User session waits on this wait event while waiting
> for LGWR to post after commit write of dirty blocks

The wait for the log file sync is very small compared to the other events, maybe it should be ignored for now.

> ->log file parallel write - Waiting for the writes of redo records to
> the redo log files

The wait for log file parallel write is fairly small, maybe it should be ignored for now.

In the 7 AM to 8 AM time interval, do not forget the more than 7,000,000 waits for "SQL*Net break/reset to client" - this could be very important from a performance standpoint.

> Conclusion from the above wait event list given by you :
> More waits are seen (db file sequential read, db file scattered
> read ), which are due to i/o's.
> To reduce these wait events, have to tune the sql ( check index and
> joins).
>
> Thanks. I have learnt something about the wait events.
>
> Please check whether the above observations are correct ?
>
> Also, Please provide your suggestions too...
>
> With Regards,
> Raja.

You are trying very hard to understand the contents of an AWR/ Statspack report - you are using the right steps to understand the problems. I am not an expert on parallel queries, so I may have made a couple small mistakes in the above information.

Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc. Received on Tue Dec 23 2008 - 22:09:38 CST

Original text of this message