Skip navigation.

A Wider View: Experimentations and ruminations on Oracle performance analysis

Syndicate content
Oracle Database performance tuning and analysis experimentation and ruminations.Craig Shallahamer, President/Founder, OraPubhttp://www.blogger.com/profile/04109635337570098781noreply@blogger.comBlogger78125
Updated: 10 hours 42 min ago

What Is Oracle Elapsed Time And Wall Time With A Parallelism Twist

Mon, 2014-08-18 09:14

What Is Oracle Elapsed Time And Wall Time With A Parallelism Twist
In this post I'm focusing on Oracle Database SQL elapsed time, adding parallelism into the mix and then revisiting wall time. What initially seems simple can take some very interesting twists!

If you are into tuning Oracle Database systems, you care about time. And if you care about time, then you need to understand the most important time parameters: what they are, their differences, how they relate to each other and how to use them in your performance tuning work.

A couple weeks ago I wrote about Oracle DB Time, non-idle wait time, and server process CPU consumption (DB CPU) time. If you have not read that posting, HERE is the link. It must be a good read because it quickly become my most viewed post ever! In this posting, the focus is SQL elapsed time, parallelism, and again wall time. Enjoy!

Quick Review
In my previous related post, I covered non-idle wait time, DB CPU, and DB Time. Here is a very quick summary of each.

Non-Idle Wait Time occurs when an Oracle process is not consuming CPU, the session pauses (i.e., waits) and Oracle considers the wait time important for performance tuning. An example of a non-idle wait event is direct path read temp. An example of an idle wait event is SQL*Net message from client or pmon timer.

DB CPU is Oracle server/foreground/shadow process CPU consumption. This is not include Oracle background process CPU consumption.

DB Time is DB CPU plus Non-Idle Wait Time. Remember that DB Time does not include background process CPU consumption and Oracle Corporation determines which wait events are considered idle.

Elapsed Time
Elapsed Time (ET) is all DB Time related to a defined task. A "defined task" could be a SQL statement, group of SQL statements, pl/sql procedure, batch job, etc. It is whatever makes sense in your tuning situation.

The elapsed time for a SQL_ID can be found in v$sql. But be careful because this elapsed time is related to "all" the SQL_ID executions. Thankfully, there is an "executions" column in v$sql.
Elapsed time is displayed in a number of areas within an Oracle Database AWR and Statspack report. Looking at the above screen shot, the "top" elapsed time SQL has an elapsed time of 268561 seconds. This means that over the AWR report's snapshot interval, for all this SQL's executions, its total DB Time is 268561 seconds. Said another way, if we were to add up all this SQL's DB CPU and non-idle wait time for all its executions within the snapshot interval, the value should be 268561.

There is a lot of great information provided in the AWR and Statspack SQL reports. For example, because the elapsed time and the CPU time (DB CPU) is shown above, we can calculate the non-idle wait time for the "top" elapsed time SQL ID.

non idle wait time = elapsed time - cpu time
268465 = 268561 - 96

For the "top" elapsed SQL, its elapsed time 268561 and it's DB CPU is 96 therefore its non-idle wait time is 268465. Wow! This statement has tons of associated wait time compared to CPU consumption time.

But it gets even better! Because the total Elapsed Time and the total number Executions over the snapshot interval is displayed, we can determine the average elapsed time!

average elapsed time = total elapsed time / executions
746.03 = 268561 / 36

Do not be deceived! The average elapsed time is unlikely what the user is experiencing. Two possibility examples for this deception are skewed elapsed times and parallelism.

For most DBAs this is unexpected. It also causes performance perception problems yet solutions are available to understand what's really going on.

I've spent so much time researching this topic and seen it increase my consulting value, I've posted a number of blog entries on this subject. Plus I created an OraPub Online Institute seminar focused specifically on this subject. It's called Using Skewed Performance Data To Your Advantage. Check it out. I'm really proud how it turned out. I also have a couple of OSM scripts dedicated to this topic, sqlelget[11].sql.

Revisiting Wall Time With A Parallelism Twist
Now it's time to put this all together.

DB CPU is the Oracle server process CPU consumption.

Non-Idle Wait Time (NIWT) is the time when an Oracle process can not consume CPU and must pause and we care about this time.

DB Time is the Oracle server process CPU consumption and all non-idle wait time.

Elapsed Time (ET) is the sum (i.e., all) DB Time related to a task, such as a SQL_ID.

Wall Time is what we hope the user experiences. I'll assume there is no time gap between Oracle and the user, therefore the wall time will equal the user's experience.

Effective Parallelism is the effective number of Oracle parallel slaves or some other form of parallelism, such as designed-in application parallelism. (For simplicity, I'm only going to mention Oracle parallel query.) If Oracle parallel query is not involved, then the effective parallelism is one. If two parallel query slaves are involved, then the effective parallelism will be a little less than 2

Parallelism can reduce wall time because we can simultaneously "burn time" in multiple places. For example, 60 seconds of elapsed time with a process running serially, results in a wall time of 60 seconds. But if we have two parallel query slaves, while the elapsed time (i.e., all the DB Time) is still 60 seconds (plus some overhead time), the wall time will be around 30 seconds (plus some overhead time).

The math is really simple...that is until you factor in scalability (i.e., the overhead), which I won't. If you're interested, read the last chapter of my book, Forecasting Oracle Performance.

Let's simplify this by using some mathematical notation.

DB Time = DB CPU + NIWT

Elapsed Time = Sum of DB Time

Wall Time = Elapsed Time / Effective Parallelism

Pretty straightforward, eh? Below is a short video clip summarizing this from the OraPub Online Institute seminar, Tuning Oracle Using an AWR Report (based on an Oracle Time Based Analysis). (To be released September 2014.) If you can't see the video, click HERE watch it on YouTube.



Test You Knowledge
True or False? If the total elapsed time is 60 seconds and parallel query is not involved, the total wall time will also be 60 seconds. True

True or False? If the elapsed time per execution is 60 seconds and the wall time is 30 seconds, then parallel query is involved. True

True or False? Bonus question yet very important to understand: If the elapsed time per execution is 60 seconds and two PQ slaves are involved, then the wall time will be 30 seconds.

The last question is false because there is overhead when parallelizing. Parallelism is not free. Because of this, the wall time will hopefully drop to perhaps 35 seconds. That 5 seconds is the parallelization overhead.

Coming Up Next: Video Proof!
While the above may seem correct, I ran some SQL statements and captured the relevant time statistics. There is quite a bit of detail and I ran two different tests, so I'll post that in a week or two.

Thanks for reading,

Craig.
https://resources.orapub.com/OraPub_Online_Training_About_Oracle_Database_Tuning_s/100.htmYou can watch the seminar introductions for free on YouTube!If you enjoy my blog, subscribing will ensure you get a short-concise email about a new posting. Look for the form on this page.

P.S. If you want me to respond to a comment or you have a question, please feel free to email me directly at craig@orapub .com.




Categories: DBA Blogs

Watch Oracle DB Session Activity With The Real-Time Session Sampler

Mon, 2014-08-11 16:38
Watch Oracle DB Session Activity With My Real-Time Session Sampler
Watching session activity is a great way to diagnose and learn about Oracle Database tuning. There are many approaches to this. I wanted something simple, useful, modifiable, no Oracle licensing
issues and that I could give away. The result is what I call the Oracle Real-Time Session Sampler (OSM: rss.sql).

The tool is simple to use.  Based on a number filtering command line inputs, it repeatedly samples active Oracle sessions and writes the output to a file in /tmp. You can do a "tail -f" on the file to watch session activity in real time!

The rss.sql tool is included in the OraPub System Monitor (OSM) toolkit (v13j), which can be downloaded HERE.

If you simply want to watch a video demo, watch below or click HERE.


The Back-Story
Over the past two months I have been creating my next OraPub Online Institute seminar about how to tune Oracle with an AWR/Statspack report using a quantitative time based approach. Yeah... I know the title is long. Technically I could have used Oracle's Active Session History view (v$active_session_history) but I didn't want anyone to worry about ASH licensing issues. And ASH is not available with Oracle Standard Edition.

The Real-Time Session Sampler is used in a few places in the online seminar where I teach about Oracle session CPU consumption and wait time. I needed something visual that would obviously convey the point I wanted to make. The Real-Time Session Sampler worked perfectly for this.

What It Does
Based on a number of command line inputs, rss.sql repeatedly samples active Oracle sessions and writes the output to file in /tmp. The script contains no dml statements. You can do a "tail -f" on the output file to see session activity in real time. You can look at all sessions, a single session, sessions that are consuming CPU or waiting or both, etc. You can even change the sample rate. For example, once every 5.0 seconds or once every 0.25 seconds! It's very flexible and it's fascinating to watch.

Here is an example of some real output.



How To Use RSS.SQL
The tool is run within SQL*Plus and the output is written to the file /tmp/rss_sql.txt. You need two windows: one to sample the sessions and other other to look at the output file. Here are the script parameter options:

rss.sql  low_sid  high_sid  low_serial  high_serial  session_state  wait_event_partial|%  sample_delay

low_sid is the low Oracle session id.
high_sid is the high Oracle session id.
low_serial is the low Oracle session's serial number.
high_serial is the high Oracle session's serial number.
session_state is the current state of the session at the moment of sampling: "cpu", "wait" or for both "%".
wait_event_partial is when the session is waiting, select the session only with this wait event. Always set this to "%" unless you want to tighten the filtering.
sample_delay is the delay between samples, in seconds.

Examples You May Want To Try
By looking at the below examples, you'll quickly grasp that this tool can be used in a variety of situations.

Situation: I want to sample a single session (sid:10 serial:50) once every five seconds.

SQL>@rss.sql  10 10 50 50 % % 5.0

Situation: I want to essentially stream a single session's (sid:10 serial:50) activity.

SQL>@rss.sql 10 10 50 50 % % 0.125

Situation: I want to see what sessions are waiting for an row level lock while sampling once every second.

SQL>@rss.sql 0 99999 0 99999 wait enq%tx%row% 1.0

Situation: I want to see which sessions are consuming CPU, while sampling once every half second.

SQL>@rss.sql 0 99999 0 99999 cpu % 0.50

Be Responsible... It's Not OraPub's Fault!
Have fun and explore...but watch out! Any time you are sample repeatedly, you run the risk of impacting the system under observation. You can reduce this risk by sampling less often (perhaps once every 5 seconds), by limiting the sessions you want to sample (not 0 to 99999) and by only select sessions in either a "cpu" or "wait" state.

A smart lower impact strategy would be to initially keep a broader selection criteria but sample less often; perhaps once every 15 seconds. Once you know what you want to look for, tighten the selection criteria and sample more frequently. If you have identified a specific session of interest, then you stream the activity (if appropriate) every half second or perhaps every quarter second.

All the best in your Oracle Database tuning work,

Craig.
https://resources.orapub.com/OraPub_Online_Training_About_Oracle_Database_Tuning_s/100.htmYou can watch the seminar introductions for free on YouTube!If you enjoy my blog, subscribing will ensure you get a short-concise email about a new posting. Look for the form on this page.

P.S. If you want me to respond to a comment or you have a question, please feel free to email me directly at craig@orapub .com.






Categories: DBA Blogs

What Is Oracle DB Time, DB CPU, Wall Time and Non-Idle Wait Time

Tue, 2014-08-05 05:30
What Is Oracle DB Time, DB CPU, Wall Time and Non-Idle Wait Time
If you are into tuning Oracle Database systems, you care about time. And if you care about time, then you need to understand the most important time parameters: what they are, their differences, how they relate to each other and how to use them in your performance tuning work.

The key Oracle Database time parameters are elapsed time, database time (DB Time), non-idle wait time and server process CPU consumption (DB CPU) time.

This first post is pretty basic, yet core fundamental stuff. So in the following two posts I'll introduce elapsed time, add parallelism into the mix and revisit wall time. What initially seems simple can some take very interesting twists!

You probably know that I am all about quantitative Oracle performance analysis. I research, write, teach, and speak about it. I even have an OraPub Online Institute seminar about how to tune your Oracle Database systems from a standard AWR or Statspack report using an Oracle Time Based Analysis (OTBA) framework.

So let's get started!

Wall Time & Run Time
I'll start with Wall Time because that is close (hopefully) to what a user experiences. In fact, if there is no time gap between the Oracle Database and the user, then we can do a little math and figure out what the users are, on average, experiencing. I'll get back to wall time in the next post, where I include elapsed time and parallelism into the equation.

DB CPU
DB CPU is Oracle server/foreground/shadow process CPU consumption. Each Oracle server process gathers its own CPU consumption using the time and/or getrusage C function system call. So unless there is a major screw-up by either the operating system or the Oracle kernel developers, the time will be good... very good. The name DB CPU is taken from the actual statistic name, which is found in both v$sess_time_model and v$sys_time_model.

If you look at any AWR or Statspack report in the "Time Model" section, you will see DB CPU. The value shown will be all server process CPU consumption within the reporting snapshot interval, converted to seconds. (The raw statistic is stored in microseconds.)

Below is an example Time Model Statistics screen shot from a standard AWR report. I've highlighted DB CPU.



If you run one of my OraPub System Monitor (OSM) time related tools like ttpctx.sql or rtpctx.sql you see a CPU time statistic. That contains both the DB CPU (i.e., server process) and "background process cpu" statistics. Here's an example.

SQL> @ttpctx.sql
Remember: This report must be run twice so both the initial and
final values are available. If no output, press ENTER about 11 times.

Database: prod35 31-JUL-14 12:09pm
Report: ttpctx.sql OSM by OraPub, Inc. Page 1
Total Time Activity (142 sec interval)

Avg Time Time Wait
Time Component % TT % WT Waited (ms) (sec) Count(k)
------------------------------------- ------- ------- ----------- ----------- --------
CPU consumption: Oracle SP + BG procs 95.95 0.00 0.000 347.212 0
PX Deq: Slave Session Stats 1.45 35.74 0.113 5.240 47
library cache: mutex X 0.58 14.26 0.136 2.090 15
PX Deq: Slave Join Frag 0.43 10.57 0.067 1.550 23
PX Deq: Signal ACK EXT 0.29 7.16 0.045 1.050 23
control file parallel write 0.28 7.03 20.600 1.030 0
PX qref latch 0.27 6.75 0.012 0.990 85
latch free 0.20 4.91 0.090 0.720 8
log file parallel write 0.16 4.02 12.826 0.590 0

Non-Idle Wait Time
When an Oracle process can not consume CPU, it will pause. As an Oracle DBA, we know this as wait time. Sometimes a process waits and it's not a performance problem, so we call this Idle Wait Time. Oracle background processes typically have lots of idle wait time. However, when a user is waiting for sometime to complete and way down deep their Oracle server process is waiting to get perhaps a lock or latch, this is Non-Idle Wait Time. Obviously, when tuning Oracle we care a lot about non-idle wait time.

Below is a simple query showing wait event classifications. In this system there are 119 Idle wait events, so all the rest would be classified as non-idle wait events.

Oracle uses a variety of methods to determine wait time. I have a number of postings and educational content available about this. You'll see them if you do an OraPub or blog search for "time".

When working with non-idle wait time, remember the 80/20 rule. Most of the wait time we care about will be contained with in the largest ("top") two to four wait events. Don't waste YOUR time focusing on the 20%.

Here's an example. In the screen shot below, while not shown the total wait time is 1966 seconds.
If you add up the displayed "top" four wait events, their combined wait time is 1857. This is about 95% of all the non-idle wait time. This is a good example demonstrating that most of the wait time is found in the top two to four events.

My OSM toolkit has many wait time related tools. Most start with "sw" for "session wait" but the both ttpctx.sql or rtpctx.sql will contain the non-idle wait time and also CPU consumption. This is a good time to transition into DB Time.

DB Time
DB Time is a time model statistic that is the sum of Oracle process CPU consumption and non-idle wait time. When optimizing Oracle systems we typically focus on reducing "time", though many times database work is also part of the equation. This "time" is essentially DB Time, though sometimes I take control over what I consider idle wait time.

The name DB Time comes from the actual statistic name in both v$sess_time_model and v$sys_time_model.

If you look at any AWR or Statspack report in the "Time Model" section, you will see DB Time.
The DB time value is technically all server process CPU consumption plus the non-idle wait time within the reporting snapshot interval, converted to seconds. (The raw statistic is stored in microseconds.) Surprisingly, Oracle does not include "background cpu time" in the DB Time statistic. There are both good and not so good reasons the background CPU time is not include, but that's a topic for another posting.

A Little Math
We have enough detail to relate DB Time, DB CPU and non-idle wait time together... using a little math.

DB Time = DB CPU + non_idle_wait_time

And of course,

non_idle_wait_time = DB Time - DB CPU

This is important, because there is no single statistic that shows all the non-idle wait time. This must be derived. Shown above is one way to derive the non-idle wait time. Take a look at the AWR report snippet below.

In the Non-Idle Wait Time section above, I stated that the total non-idle wait time was 1966 seconds. I derived this from the Time Model screen shown above. I simply did:

non_idle_wait_time = DB Time - DB CPU
1966.16 = 4032.03 - 2065.87

Coming Up Next
I wanted to keep this post short, which means I left out the more interesting topics. So in the next post I'll merge into the picture elapsed time along with parallelism and revisit wall time. Then in the third post (that's my guess at this point), I'll actually demonstrate this in two different systems.

Thanks for reading,

Craig.

https://resources.orapub.com/OraPub_Online_Training_About_Oracle_Database_Tuning_s/100.htmYou can watch the seminar introductions for free on YouTube!If you enjoy my blog, subscribing will ensure you get a short-concise email about a new posting. Look for the form on this page.

P.S. If you want me to respond to a comment or you have a question, please feel free to email me directly at craig@orapub .com.






Categories: DBA Blogs

Four Options For Oracle DBA Tuning Training

Mon, 2014-07-28 09:00
Four Options For Oracle DBA Tuning Training
Oracle DBAs are constantly solving problems... mysteries. That requires a constant knowledge increase. I received more personal emails from my Oracle DBA Training Options Are Changing posting than ever before. Many of these were from frustrated, angry, and "stuck" DBAs. But in some way, almost all asked the question, "What should I do?"

In response to the "What should I do?" question, I came up with four types of Oracle DBA performance tuning training that are available today. Here they are:

Instructor Led Training (ILT) 
Instructor Led Training (ILT) is the best because you have a personal connection with the teacher. I can't speak for other companies, but I strive to connect with every student and every student knows they can personally email or call me...even years after the training. In fact, I practically beg them to do what we do in class on their production systems and send me the results so I can continue helping them. To me being a great teacher is more than being a great communicator. It's about connection. ILT makes connecting with students easy.

Content Aggregators
Content Aggregators are the folks who pull together free content from various sources, organize and display it. Oh yeah... and they profit from it. Sometimes the content value is high, sometimes not. I tend to think of content aggregators like patent trolls, yet many times they can be a great resource. The problem is you're not dealing with the creator of the content. However, the creator of the content actually knows the subject matter. You can somtimes contact them...as I encourage my students and readers to do.

Content Creators
Content Creators are the folks who create content based on their experiences. We receive that content through their blogs, videos, conference presentations and sometimes through their training. I am a content creator but with an original, almost child-like curiosity, performance research twist. Content creators rarely directly profit from their posted content, but somehow try to transform it into a revenue stream. I can personally attest, it can be a risky financial strategy...but it's personally very rewarding. Since I love do research, it's easy and enjoyable to post my findings so others may benefit.

Online Training (OLT)
Online Training (OLT) is something I have put off for years. The online Oracle training I have seen is mostly complete and total crap. The content is usually technically low and mechanical. The production quality is something a six year old can do on their PC. The teaching quality is ridiculous and the experience puts you to sleep. I do not ever want to be associated with that kind of crowd.

I was determined to do something different. It had to be the highest quality. I have invested thousands of dollars in time, labor, and equipment to make online video training
Craig teaching in an OraPub Online Institute Seminarwork. Based on the encouraging feedback I receive it's working!

This totally caught me by surprise. I have discovered that I can do things through special effects and a highly organized delivery that is impossible to do in a classroom. (Just watch my seminar introductions on YouTube and you'll quickly see what I mean.) This makes the content rich and highly compressed. One hour of OraPub Online Institute training is easily equivalent to two to four hours of classroom training. Easily. I have also strive to keep the price super low, the production at a professional level and ensure the video can be streamed anywhere in the world and on any device. Online training is an option, but you have to search for it.

Summary
So there you have it. Because of economics and the devaluation of DBAs as human beings coupled with new technologies, the Oracle DBA still has at least four main sources of training and knowledge expansion. Don't give up learning!

Some of you reading may be surprised that I'm writing about this topic because it will hurt my traditional instructor led training (public or on-site) classes. I don't think so. If people can attend my classes in person, they will. Otherwise, I hope they will register for an OraPub Online Institute seminar. Or, at least subscribe to my blog (see upper left of page).

All the best in your quest to do great work,

Craig.
You can watch seminar introductions for free on YouTube!
If you enjoy my blog, subscribing will ensure you get a short-concise email about a new posting. Look for the form on this page.

P.S. If you want me to respond to a comment or you have a question, please feel free to email me directly at craig@orapub .com.

Categories: DBA Blogs

Oracle DBA Training Options Are Changing

Tue, 2014-07-15 07:43
Oracle DBA Training Options Are Changing
Training options for Oracle Database DBAs are changing. Generally, I don't think they are for the better. Companies don't value Oracle Database Administrators like they used to. And, it shows in the lack of their professional development investment.

When I travel a long way from home, I tend to get very reflective about life, death and beyond. On my way home from teaching an onsite two-day Oracle performance tuning seminar coupled with a one-day predictive analysis (forecasting) class in Ghana (yes, Ghana as in AFRICA) I started thinking about how fortunate the Ghana DBAs I taught are. Clearly their management is willing to invest in their DBAs' future. This is very, very rare.

Today most Oracle DBAs receive what I call, "Training By Google." You know, blog posts, YouTube videos and various syntax websites. While these are all valuable (I am a content creator myself with my blog posts and videos), they are no substitute for instructor led training. Not even close! So what is happening that is forcing Oracle DBAs to change their training habits?

So Why The Change? Three Reasons
1. Training Budget. Over the past five years I have been disappointed (more like disturbed) that most companies do not provide the training DBAs need. They just won't do it. IT managers (not typically DBA managers) believe their staff can get by with "Training By Google." It's stupid and foolish. It tells DBAs they are worthless and leaves them unprepared to perform at their best. And, of course, that ends up hurting the company they work for. Stupid and foolish.

Are we then surprised with the results from poor performing systems, down production systems, massive security breaches, and DBAs hopping from one company to another?

2. Travel Budget. A nasty tactic many companies use is to provide a minimal training budget but without a travel budget. If you want specialized and advanced training, you'll probably have to travel to get to it. Maybe not hundreds or thousands of miles, but probably more than you want to commute each day.

Essentially the company is splitting the training cost with the DBA and ensuring the DBA really, really wants the training. OK, I can respect that. But, I think a company that does not truly provide training for its employees (human beings that spend a significant portion of their lives doing whatever it takes to get the job done) is cruel and frankly immoral.

3. More Training Options. The good news for Oracle DBAs is there is more information and training options available today than ever before. When the orapub.com website began in 1995, doing a "tail -f" on the web log was a lesson in world geography. It was amazing watching line after line stream by as DBAs from all over the world were looking for Oracle performance materials through the web. Now there is much more available. Training options for Oracle DBAs now include traditional instructor led training (ILT), web sites from content aggregators (people who pull together content for us), content creators (like myself), and online training. I'm very excited about online training and have made a significant investment in OraPub's Online Institute.

Summary
So there you have it. Because of economics, the devaluation of DBAs as human beings and the increase in training options, the Oracle DBA training landscape is changing. If you believe this, the next question is, "What is good content?" That will be the subject of my next posting!

Enjoy your work and thanks for reading!

Craig.

https://resources.orapub.com/OraPub_Online_Seminars_About_Oracle_Database_Performance_s/100.htm
You can watch seminar introductions (like above) for free on YouTube!
If you enjoy my blog, I suspect you'll get a lot out of my online or in-class training. For details go to www.orapub.com. I also offer on-site training and consulting services.

P.S. If you want me to respond to a comment or you have a question, please feel free to email me directly at craig@orapub .com. Another option is to send an email to OraPub's general email address, which is currently info@ orapub. com.

Categories: DBA Blogs

How The Oracle Database Determines Wait Time When It's Not Set

Mon, 2014-06-30 06:41
Have you ever wondered how the Oracle Database 12c (and earlier versions) determines the wait time when it has absolutely no control over how long the wait will take? If so, then read on!

The Back Story
Using wait time is part of an Oracle Time Based Analysis (OTBA). While Oracle process CPU consumption is a big part of the analysis, the other category is non-idle wait time.

You can see the two categories of time clearly "in action" with my Real-Time Session Sampler script, rss.sql. It's part of my OraPub System Monitor (OSM) toolkit, that can be downloaded for free. Here's an example of the output:


I need to  explain a little more about CPU time and then Oracle wait time.

An Oracle processes wants to burn CPU. Without consuming CPU an Oracle process, can well...not process work! Oracle keeps track of the CPU consumption time. It's the actual CPU time consumed, for example, 500 ms or 3 seconds.

When an Oracle process can not burn CPU, the process, in Oracle terms, it must wait. For example, when an Oracle process waits, it's like it yells out details about why it's waiting. We call this yell a wait event. Each wait event has a name. And the name provides clues about why the process can't burn CPU and is therefore waiting.

There Are Different Reasons Why An Oracle Process Waits
There are three broad categories Oracle must time, when a process is not consuming CPU:
  1. When the wait time is predetermined and not interruptible. Perhaps when a latch can not be acquired through repeated attempts so the process takes a break (i.e., sleeps) for a fix period of time, say 10ms.
  2. When the wait time is predetermined but the process can be woken by another process. Perhaps a log writer is in the middle of its three second sleep and then a server process commits. The log writer will be signaled to wake and do some work.
  3. When Oracle has no idea how long the wait may last. Perhaps a process submitted a block to the IO subsystem for a synchronous read. Oracle has no idea how long this may take. This situation is what I'm focusing on in this posting and I show in the video below.
Exploring When Oracle Has No Idea How Long The Wait Will Take
Let's say when I work it's like an Oracle process consuming CPU. And if I have to stop working to drive to a meeting, it's like an Oracle process waiting. And in this situation,  I really don't know how long the wait will take. It's out of my control. There could be an accident along the way (think: table level lock)! An Oracle process can experience this same kind of thing.

If I'm an Oracle server process and I discover a block I need is not in the buffer cache, I'm going to need to make a call to the OS for that block. When I make the synchronous IO call, I really do not know how long it will take and I have to wait, that is, the Oracle process must wait.

From a DBA perspective, when I perform an Oracle Time Based Analysis (OTBA) I need know how long the IO, that is, the wait took. How does Oracle figure this out? It's pretty simple actually. It's basiclly like this:
  • Get the current time, start time
  • Make the synchronous IO call and wait until IO received
  • Get the current time, end time
  • Calculate the delta, end time - start time
The "delta" is the wait time for the single block read. If the single block read is a synchronous read and then placed into the buffer cache, Oracle will tag or name the wait time calling it a "db file sequential read".

Actually Watching An Oracle Process Figuring Out the Wait Time
Some say that seeing is believing. For sure it helps one to learn quickly. If you want to see with your own eyes an Oracle server process determine wait time for a multiple block synchronous read (event name is, db file scattered read), watch the below video.



Pretty cool, eh? By the way, the processing of timing processes and events has a special name, called instrumentation. As we can see, Oracle has instrumented its kernel code.

Enjoy your work and thanks for reading!

Craig.
https://resources.orapub.com/OraPub_Online_Seminars_About_Oracle_Database_Performance_s/100.htmYou can watch seminar introductions (like above) for free on YouTube!
If you enjoy my blog, I suspect you'll get a lot out of my online or in-class training. For details go to www.orapub.com. I also offer on-site training and consulting services.

P.S. If you want me to respond to a comment or you have a question, please feel free to email me directly at craig@orapub .com. Another option is to send an email to OraPub's general email address, which is currently info@ orapub. com.










Categories: DBA Blogs

The LGWR Three Second Rule. Really?

Sun, 2014-06-08 08:39
Does the Oracle Database 12c Log Writer Really Sleep For Three Seconds?I have learned the part of Oracle Database performance tuning is checking if what you have been taught is actually true. What I'm writing about today has been done before in Oracle Database 10g and 11g, but I wanted to document this using 12c.

When I was first learning about the Oracle Database process architecture, the instructor said there are a number of reasons the log writer (LGWR) background process will wake up and do "something." In addition to this, if the LGWR was not signaled to wake up, every three seconds it would wake up by itself and check if there was something it needed to do.

Is this "three second rule" really true? That's what this posting is all about.

If you don't want to read the details below, I created a short video for you to watch.


Here is what you just saw in static text. First let's get the LGWR process PID. Here's one way to do this.
$ ps -eaf | grep lgwr
oracle 41977 1 0 May16 ? 00:00:50 ora_lgwr_prod35
oracle 46294 46110 0 07:39 pts/0 00:00:00 grep lgwr
oracle 60264 1 0 Jan13 ? 00:14:14 ora_lgwr_prod30
My Oracle Database 12c instance is prod35, so the process I'm interested in 41977.

Note: Since this is a 12c instance, there are also two additional log writers, lg00 and lg01. But that's a topic for another posting!

At this point, for simplicity sake it's important there be little instance activity. It will make it much simpler to find what we are looking for... the three second sleep
To watch the LGWR process, I'm going to use the Linux strace command. To easily see the system call timing, I'm going to use the -r option. I'm also going to redirect the output, using the -o option to the out.txt file. I'll let the below strace run for about 15 seconds. OK... here we go:
$ strace -rp 41977 -o out.txt
Process 41977 attached - interrupt to quit
^CProcess 41977 detached
To break out of the strace, I simply did a control-c and the result is what you see above. Let's look at the first three lines of the out.txt file so I can tell you what we will be looking for.
$ head -3 out.txt
0.000000 clock_gettime(CLOCK_MONOTONIC, {19940373, 720653971}) = 0
0.000186 gettimeofday({1401979552, 853214}, NULL) = 0
0.000207 gettimeofday({1401979552, 853444}, NULL) = 0
With the -r option, the first column (far left) shows how long the system call took the for call directly above. For example, in the snipped above, the first clock_gettime call took 0.000186 seconds. The first gettimeofday call took 0.000207 seconds. Got it? Read on!
You may wonder why the system call time is the next line down? It actually makes since, because strace displays information as it becomes available. When the call starts obviously it does not know the call duration, so it just displays the call itself. When the call completes and therefore has the time, it shows return details (e.g., "= 0"), line feeds, prints the call time, the next call is displayed, and repeat. OK, back to our objective.

What we are looking for is a three second sleep. In other words, we are looking for a call that takes three seconds. And since a call's duration shown in the first column of the next line down, we want to look for values in the first column that start with 3.0.

Below is an example of what I found in the out.txt file.
...
0.000033 clock_gettime(CLOCK_MONOTONIC, {19940376, 746821505}) = 0
0.000033 semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) = -1 EAGAIN
                   (Resource temporarily unavailable)
3.000436 clock_gettime(CLOCK_MONOTONIC, {19940379, 747351774}) = 0
0.000188 gettimeofday({1401979558, 879922}, NULL) = 0
0.000171 clock_gettime(CLOCK_MONOTONIC, {19940379, 747708976}) = 0
...
Looking at the above output snippet, notice the time of 3.000436. Then look at the call directly above it. It's the semtimedop call. This call can be used to put a process to sleep, but the process can be woken up. In other words, the LGWR can set an alarm for three seconds, go to sleep, and if it is not disturbed, it will wake up in three seconds. Shown above is an example of the LGWR setting the alarm for three seconds, not being woken up until the alarm fired. This is an example of the three second rule!
If you do what I did and look at your output file, you'll see lots of these "three seconds." If you do this on a pretty-much idle Oracle Database 12c instance's LG01 process (one of the two default LGWR children), it may contain mostly semtimedop calls, each with a three second sleep.

As you might have guessed, the sleep time is part of the system call. Look closely at the semtimedop call below:
semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0})
Notice the "{3, 0}" near the far right? The defines the maximum delay, that is, the sleep. The 3 is the seconds and the 0 is the number of nanoseconds. If you're interested in other sleep options and the actual code to make this work, check out this posting.

Oracle could have used the select system call (not "select * from...") but that would have been a bad idea because then the process could not be woken before the three seconds limit. This topic touches very closely on Oracle Time Based Analysis and more specifically about Oracle's wait interface. I really get into this, including sample C code in my online video seminar entitled, Introduction To Oracle Time Based Analysis.

So there you go! The Oracle kernel code developers set the default sleep time to three seconds along with the ability for the process to be awoken, should the need arise.

Give It A Try
Check this out for yourself on a test or QA system. Once you see it for yourself, you'll never forget about it!

But it gets even better! If you really want to dig into this, do the same but on your database writer (DBWR). Does the DBWR have a three second rule? Does Oracle use the same system call? I'm not going to tell you the answer. You'll have to figure that out for yourself!

Always A Three Second Delay?
You may have noticed that sometimes the process does not sleep the entire three seconds and at other times Oracle sets the sleep time less than three seconds. What up with that? That will be my next posting!

Thanks for reading,

Craig.
https://resources.orapub.com/OraPub_Online_Seminars_About_Oracle_Database_Performance_s/100.htmYou can watch seminar introductions (like above) for free on YouTube!
If you enjoy my blog, I suspect you'll get a lot out of my online or in-class training. For details go to www.orapub.com. I also offer on-site training and consulting services.

P.S. If you want me to respond to a comment or you have a question, please feel free to email me directly at craig@orapub .com. Another option is to send an email to OraPub's general email address, which is currently info@ orapub. com.
Categories: DBA Blogs

It's All About CPU But There Is NO CPU Bottleneck! What?

Thu, 2014-05-29 16:00
It's All About CPU But There Is NO CPU Bottleneck...
Diagnosing Oracle Database performance is like solving a puzzle. But what I really enjoy is coming up with performance solutions that are anti-intuitive for most people. You know, the ones when you can see people stop talking and actually think, "Why would Craig say that!?" In this posting, I delve into one of these situations.

The Situation
Just over a month ago I received an email from a former Australian Oracle Performance Firefighting student about their "CPU bottleneck." Based on an one-hour AWR report's Time Model Statistics section (which is based on v$sys_time_model view), the  DB CPU was 95% of DB Time, leaving 5% for Oracle wait time. The top CPU consuming SQL statements were easily identified and work had begun with tuning them. By "tuning" I mean a small group of DBAs were working to reduce their CPU consumption. But something didn't feel right to him, so he emailed me about it.

My first question was, "Is there an OS CPU bottleneck?" That is, is the host (or virtual machine if you like) out of CPU resources? I was wondering if the average CPU utilization was 80% or higher. Why was I asking this? Read on...

DB Time Math
Over an interval of time, DB Time is all the Oracle server process CPU time (i.e., DB CPU) plus all the non-idle wait time.

Note: Oracle does not include background process CPU in DB Time. Why they don't include background process CPU time is an entirely different subject. And of course, we are trusting Oracle that what they consider "non-idle" wait time should actually be "non-idle" wait time. Again, that's an entirely different subject.

My former student knew the DB Time math and therefore understood that since DB CPU was pretty close to DB Time, there was little non-idle wait time. It was "all about CPU." While I still wanted to know the wait event situation because it provides performance hindering clues and forces the DBA to review their work and think a little more.

OS CPU Utilization Math
In my Utilization On Steroids online video seminar, I demonstrate utilization math by shrinking myself down to the size of a water glass and also the size of a printer cartridge. (Like the motto says, "What Online Training Should Be!") In a nutshell, utilization can be viewed/understood as requirements divided by capacity. If a 500ml glass contains 250ml of water, the glass is 50% utilized that is, 50% full.

As I present in my online seminar, there are two simple methods to figure OS CPU utilization using only the v$osstat view. One of these methods as based on the BUSY_TIME and the IDLE_TIME statistics from the Operating System Statistics section of any AWR and Statspack report. The BUSY_TIME is the instance CPU consumption, that is, the CPU requirements. As I carefully explain in my seminar and in my classes, the capacity is actually equal to the BUSY_TIME plus the IDLE_TIME. That is:

Utilization = Requirements / Capacity

Utilization =  BUSY_TIME / ( BUSY_TIME + IDLE_TIME )

This is true unless the OS is updating OS structures incorrectly (e.g., /proc) or Oracle is not gathering the data from the OS correctly or frequently enough. And yes, I talk about how to check this in my seminar.

My former student did the utilization math. The average utilization was around 65%. Since this is the only Oracle instance doing any real work on the box, is was not a surprise to him that vmstat also indicated the OS CPU utilization was 65%.

Why I Asked About A CPU Bottleneck?
Suppose there is plenty of available CPU resources AND the SQL users are complaining about are doing nothing but buffer gets (i.e., logical reads). In other words, from a resource consumption perspective, it's all about CPU consumption. Is this possible? The answer is, YES!

If there was a single server process running this SQL statement, parallel query operations were not occurring and there are four CPU cores, what would the CPU utilization be? Probably around 25% (plus overhead between 5% to 15%). That is because on average only one CPU core was processing the SQL statement!

If this SQL statement took "too long to run" adding more CPU power (specifically more cores) would not help.

As strange at it may seem, one performance improving solution is to use the available CPU! Read on.

Use More CPU! Now!
Suppose the average CPU utilization of a 10 CORE box (or VM if you prefer) was running at an average of 50% utilization. This means that on average 5 of the 10 cores are busy. Now suppose a SQL statement takes 4 minutes to process without parallel query or very-cool (but perhaps costly) application design. How long would the SQL statement take to run if it ran in four parallel streams?

It depends. We need to ensure there is available CPU power. Assuming each parallel stream consumes one CPU core, the box has 10 cores with 5 being available (remember the average utilization is 50%), that leaves with us 5 "extra" cores. For our SQL statement to run in 4 parallel streams, we need three more CPU cores... and we have them!

If there is no parallelism overhead (ha! don't count on that!), the SQL statement would run in around 1 minutes, 8 CPU cores would be active (5+3), and the average CPU utilization would be around 80% (8 cores required / 10 cores of capacity).

So by creatively using available CPU resources, we were able to consume 4 minutes of CPU in a single minute! ...a beautiful example of parallelism. And of course, the users will be delighted and I may get a free pizza.

Be Anti-Intuitive
When a SQL statement, batch process or an Oracle process is constrained by CPU, yet there is plenty of CPU power available, the problem is probably a serialization issue. (Another possibility is some type of locking or blocking problem.) The solution can be found by parallelizing, thereby using the available CPU resources which will reduce the wall/clock/run time of the process.

If the above paragraph makes sense, then you understand what I'm talking about. If it does not make sense, stop, breath, and re-read the previous paragraph or paragraphs. It will be worth your time.

This next paragraph is really important to understand:

It is important to understand we have NOT reduced the total SQL statement's CPU requirements. We have NOT increased the total SQL statement's CPU requirements. We have NOT tuned the SQL statement in the traditional sense by reducing the buffer gets (logical reads, logical IOs). What we have done is simply altered the situation by consuming the necessary CPU requirements within a shorter duration. (In our example, the four minutes of CPU time was consumed within a single minute.) We did this taking a serial process and parallelizing it to use more of the available CPU resources.

Expand Our View Into The OS
Without understanding the OS situation we can easily misdiagnose the problem. And we can miss creative and powerful solutions.

Most DBAs immediately jump to tuning the SQL and tuning Oracle. While those are both valid technical solutions, there are others... like parallelizing to reduce run time while consuming more resources (in a shorter period of time).

Some DBAs but more likely OS Administrator but for sure hardware vendors may insist on getting more CPU power. While faster CPU will help somewhat, adding more of the same CPUs cores will likely do absolutely nothing...unless we can increase parallelism to use them.

Being able to determine the OS bottleneck is so important, I devote an entire chapter in my Oracle Performance Firefighting book on the subject. When I was learning to optimize Oracle systems, learning how to find the OS bottleneck significantly increased my value in the Oracle DBA market. Later when I ruminated on serialization and parallelization, I turned situations upside down by placing a larger (yet shorter-term) load on the system as a valid performance improving solution.

Summary
If performance is not what it needs to be AND there are available OS resources, consider using them to improve performance. It's a creative and rarely used solution.

Thanks for reading!

Craig.

If you enjoy my blog, I suspect you'll get a lot out of my online or in-class training. Go to www.orapub.com. I also offer on-site training and consulting services.

P.S. If you want me to respond to a comment or you have a question, please feel free to email me directly at craig@orapub .com. Another option is to send an email to OraPub's general email address, which is currently info@ orapub. com.


https://resources.orapub.com/OraPub_Online_Seminars_About_Oracle_Database_Performance_s/100.htmYou can watch seminar introductions (like above) for free on YouTube!
Categories: DBA Blogs