DBA Blogs

How To Approach Different Oracle Database Performance Problems

Jump Start Your Oracle Database Tuning Effort

Every Oracle Database Administrator will tell you no two performance problems are the same. But a seasoned Oracle DBA recognizes there are similarities...patterns. Fast problem pattern recognition allows us to minimize diagnosis time, so we can focus on developing amazing solutions.

I tend to group Oracle performance problems into four patterns. Quickly exploring these four patterns is what this article is all about.

You Can Not Possibly List Every Problem And Solution
When I teach, some Oracle Database Administrators want me to outline every conceivable problem along with the solution. Not only is the thought of this exhausting, it's not possible. Even my Stori product uses pattern matching. One of the keys to becoming a fantastic performance analyst is the ability to quickly look at a problem and then decide which diagnosis approach is best. For example, if you don't know the problem SQL (assuming there is one), tracing is not likely to be your best approach.

The Four Oracle Database Performance Patterns
Here are the four performance patterns I tend to group problems into.

The SQL Is Known
Many times there is a well known SQL statement that is responsible for the poor performance. While I will always do a quick Oracle Time Based Analysis (see below) and verify the accused SQL, I will directly attack this problem by tuning with SQL-specific diagnostic and tuning tools.

But... I will also ask a senior application user if the users are using the application correctly. Sometimes new application users try to use a new application like their old application. It's like trying to drive a car by moving your feet as if you were riding a bicycle... not going to work, and it's dangerous!

Business Process Specific

I find that when the business is seriously affected by application performance issues, that's when the "limited budget" is suddenly not so limited. When managers and their businesses are affected, they want action.

When I'm approached to help solve a problem, I always ask how the business is being affected. If I keep hearing about a specific business process or application module I know two things.

First, there are many SQL statements involved. Second, the problem is bounded by a business process or application. This is when I start the diagnostic process with an Oracle Time Based Analysis approach, which will result in multiple solutions to the same problem.

As I teach in my online seminar How To Tune Oracle With An AWR Report, users feel performance through time. So, if our analysis is time based, we can create a quantitative link between our analysis and their experience. If our analysis creates solutions that reduce time, then we can expect the user experience to improve. This, combined with my "3 Circle" approach, yields spot-on solutions very quickly.

While an Oracle Time Based Analysis is amazing, because Oracle does not instrument CPU consumption, we can't answer the question, "What's Oracle doing with all that CPU?" If you want to drill into this topic, check out my online seminar, Detailing Oracle CPU Consumption: The Missing Link.
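
For reference, the high-level categories an Oracle Time Based Analysis starts from (DB time, DB CPU and non-idle wait time) can be pulled straight from the time model and wait interface. This is only a sketch of the starting point, not the analysis itself:

select (select value/1000000 from v$sys_time_model where stat_name = 'DB time') db_time_s,
       (select value/1000000 from v$sys_time_model where stat_name = 'DB CPU') db_cpu_s,
       (select sum(time_waited_micro)/1000000 from v$system_event
         where wait_class != 'Idle') non_idle_wait_s
from dual;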

It's Just Slow
How many times have I experienced this... It's Just Slow!

If what the user is attempting to explain is true, the performance issue is affecting a wide range of business processes. The problem is probably not a single issue (but it could be) and clearly the key SQL is not known. Again, this is a perfect problem scenario to apply an Oracle Time Based Analysis.

The reason I say this is that an OTBA will look at the problem from multiple perspectives, categorize Oracle time and develop solutions to reduce those big categories of time. If you also do a Unit Of Work Time Based Analysis, you can even anticipate the impact of your solutions! Do an OraPub website search HERE or search my blog for UOWTBA.
Random Incident That Quickly Appears And Vanishes
This is the most difficult problem to fix. Mainly because the problem "randomly" appears and can't be duplicated. (Don't even bother calling Oracle Support to help in this situation.) Furthermore, it's too quick for an AWR report to show its activity, and you don't want to impact the production system by gathering tons of detailed performance statistics.

Even a solid Oracle Time Based Analysis will likely not help in this situation. Again, the problem is performance data collection and retention. The instrumented AWR or Statspack data does not provide enough detail. What we need is step-by-step activity... like a timeline.

Because this type of problem scares both DBAs and business managers, you will likely need to answer questions like this:

  • What is that blip all about?
  • Did this impact users?
  • Has it happened before?
  • Will it happen again?
  • What should we do about it?

The only way I know how to truly diagnose a problem like this is to do a session-level time-line analysis. Thankfully, this is possible using the Oracle Active Session History data. Both v$active_session_history and dba_hist_active_sess_history are absolutely key in solving problems like this.

ASH samples Oracle Database session activity once each second (by default). This is very different from measuring how long something takes, which is the data an AWR report is based upon. Because sampling does not require continuous measurement, a lot of detail can be collected, stored and analyzed.
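
If you want to see the kind of raw material such a timeline is built from, a minimal sketch (the time window below is made up) is simply to pull the relevant columns from ASH, ordered by sample time:

select sample_time, session_id, session_state, event, sql_id, blocking_session
from   v$active_session_history
where  sample_time between timestamp '2014-12-29 15:15:00'
                       and timestamp '2014-12-29 15:35:00'
order  by sample_time, session_id;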

A time-line type of analysis is so important, I enhanced my ASH tools in my OraPub System Monitor (OSM) toolkit to provide this type of analysis. If you want to check them out, download the OSM toolkit HERE, install it and read the osm/interactive/ash-readme.txt file.

As an example, using these tools you can construct an incident time-line like this:

HH:MM:SS.FFF User/Process Notes
------------ ------------- -----------------
15:18:28.796 suspect (837) started the massive update (see SQL below)

15:28:00.389 user (57) application hung (row lock on TM_SHEET_LINE_EXPLOR)
15:28:30.486 user (74) application hung (row lock on TM_SHEET_LINE_EXPLOR)
15:29:30.??? - row locks become the top wait event (16 locked users)
15:29:50.749 user (83) application hung (row lock on TM_SHEET_LINE_EXPLOR)

15:30:20.871 user (837) suspect broke out of update (implied)
15:30:20.871 user (57) application returned
15:30:20.871 user (74) application returned
15:30:20.871 user (83) application returned

15:30:30.905 smon (721) first smon action since before 15:25:00 (os thread startup)
15:30:50.974 user (837) first wait for undo - suspect broke out of update
15:30:50.974 - 225 active sessions, now top event (wait for an undo record)

15:33:41.636 smon (721) last PQ event (PX Deq: Test for msg)
15:33:41.636 user (837) application returned to suspect. Undo completed
15:33:51.670 smon (721) last related event (DFS lock handle)

Without ASH, seemingly random problems would be a virtually impossible nightmare scenario for an Oracle DBA.

It's true. You need the right tool for the job. And the same is true when diagnosing Oracle Database performance. What I've done above is group probably 90% of the problems we face as Oracle DBAs into four categories. And each of these categories needs a special kind of tool and/or diagnosis method.

Once we recognize the problem pattern and get the best tool/method involved to diagnose the problem, then we will know the time spent developing amazing solutions is time well spent.

Enjoy your work!


Categories: DBA Blogs

Oracle MAA 12c – A Data Plan for Meeting Every Organization’s Needs

VitalSoftTech - Mon, 2015-02-02 22:36
Data is one of the most valuable and crucial assets for any company; hence businesses spend huge amounts of money on their enterprise software to ensure its safety and availability. RAC and clustering are some examples of the approaches companies use for high availability and performance of their enterprise systems and databases. However, there are still […]
Categories: DBA Blogs

Exadata Vulnerability

Pakistan's First Oracle Blog - Mon, 2015-02-02 19:49
This Exadata vulnerability is related to a glibc vulnerability. A heap-based buffer overflow was found in glibc's __nss_hostname_digits_dots() function, which is used by the gethostbyname() and gethostbyname2() glibc function calls.

A remote attacker able to make an application call either of these functions could use this flaw to execute arbitrary code with the permissions of the user running the application.

In order to check if your Exadata system suffers from this vulnerability, use:

[root@server ~]# ./ghostest-rhn-cf.sh
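
If the check script is not at hand, a rough manual sanity check (assuming an RPM-based Exadata compute node) is simply to list the installed glibc packages and compare their release against the fixed version given in the My Oracle Support note below:

[root@server ~]# rpm -q glibc glibc-common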

The solution and action plan for this vulnerability are available from My Oracle Support in the following document:

glibc vulnerability (CVE-2015-0235) patch availability for Oracle Exadata Database Machine (Doc ID 1965525.1)
Categories: DBA Blogs

Indexing Points to Remember

Pakistan's First Oracle Blog - Mon, 2015-01-26 18:54
Indexing depends upon the queries in the application.

There is no one-size-fits-all break-even point for indexed versus table scan access. If only a few rows are being accessed, the index will be preferred.

If almost all the rows are being accessed, the full table scan will be preferred. In between these two extremes, your “mileage” will vary.

A concatenated index is more useful if it also supports queries where not all columns are specified. For instance SURNAME, FIRSTNAME is more useful than FIRSTNAME, SURNAME because queries against SURNAME only are more likely to occur than queries against FIRSTNAME only.
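
As a quick sketch of that point (the table and column names here are made up):

create index emp_name_ix on employees (surname, firstname);

-- can use the index: the leading column is supplied
select * from employees where surname = 'SMITH';

-- generally cannot use the index efficiently (short of a skip scan):
-- the leading column is missing
select * from employees where firstname = 'ALICE';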

Global indexes provide better performance for queries that must span all partitions.
Categories: DBA Blogs

I Have Lots Of Oracle Database Server Power But Performance Is Slow/Bad


Oracle Database parallelism and serialization are what we as Oracle Database Administrators live and die for. You have a screaming fast Oracle Database system and there is lots of computing power available.

But performance is unacceptable; users are screaming, the phone is ringing, and those fancy dashboards are flashing like it's Christmastime.

What is going on?! What can I do about it?! That's what this post is about.

Learn By Doing

Back in December on the third day of my Oracle Performance Firefighting class, I had each student bring in an AWR report from one of their systems that was giving them problems. (I keep my classes small, giving each student time to do their analysis and time for us to talk about it as a class.)
Get this:
Half of the systems had a similar "problem." I think it's important every DBA understands this "problem" because it's more common than most people believe.

There was plenty of computing power and the key SQL statement they cared about was a batch job. What was the core problem? The quick answer is "serialization," that is, a lack of parallelism. Exploring this using a very large production system AWR report and coming up with solutions is what this posting is all about.

Serialization Is Death

In Oracle systems, serialization is death and parallelism is life. Follow this line: business, end user, application designer, DBA, Oracle Database Kernel Architects (or whatever their title is), OS Administrators, OS designers, CPU designers and IO subsystem designers all have something in common. They work hard to parallelize tasks. Just one example: Oracle is designed to have multiple background and foreground processes running in parallel.

But all this parallelization effort can be wasted and minimized if a process turns into a serial work stream (at any level; Oracle, OS, business, etc.). The result is "slowness" because the wall time increases.

Available Power And Slowness Equals Opportunity

When I tune Oracle Database systems, I look for opportunities. And each of my solutions will specifically target an opportunity. When I see unused power and complaints of slowness, I look for ways to increase parallelism. Why? Because having available power combined with slowness likely means a serialization limitation exists.

For sure serialization may be necessary. Two examples come to mind; Oracle database memory serialization control (think: latch and mutex) and business rules.

But if I can find a way to increase performance by using up available power by increasing parallelism, I'll likely be able to turn a slow serialization situation into a screaming fast parallelization situation!

How To Recognize A Serially Constrained System

It's easy to recognize a serially constrained system. Ask yourself these two questions. First, is there available CPU or IO power? Second, are there complaints of application "slowness." If the answer to both of these questions is "Yes" then there is likely a serialization issue. Furthermore, the general solution is to use the available resources to our advantage. That is, find areas to increase parallelization, which will use the available resources and improve performance.

If you have the power, use it! What are you saving it for?

(There may be a very good answer to the "saving" but I'll save that for another article.)

Can I Be Out Of CPU And Be Serially Constrained?
Yes. An Oracle Database system can be serially constrained and be out of OS resources. A great example of this is when there is a raging Oracle memory serialization issue. If you see significant Oracle latch or mutex wait time combined with a raging CPU bottleneck, you likely have a serialization issue... an Oracle Database memory structure access serialization issue.

So, while available power on a "slow" system likely means we have a serially constrained system there are situations in Oracle with a raging CPU bottleneck that also means there is likely a serialization issue.

Find Out: Is There Available CPU Power?

Here Is A Real Life Situation. To simplify, I'm going to focus on only instance number one. Look at instance number one in the below picture.

The above AWR report snippet shows RAC node #1 OS CPU utilization at 15%. This means that over the AWR report snapshot interval, the average CPU utilization was 15%. I never initially trust an AWR report for calculated results. Plus it's good practice to do the math yourself. If you use the super fast busy-idle method I have outlined in THIS POST and detailed in my online seminar, Utilization On Steroids, the utilization calculates to 16% ( 0.5/(0.5+2.7)=0.16 ). So the AWR Report's 15% for CPU "% Busy" looks to be correct.

Clearly with an average CPU utilization of 15%, we have an opportunity to use the unused CPU power to our advantage.

Find Out: Is There Available IO Power?

I am looking for fast IO responsiveness. That is, a low response time. A great way to get a quick view of IO subsystem responsiveness is to look at the average wait time for the event, db file sequential read.

The wait event, db file sequential read is the time it takes to read a single block synchronously. I like to call it a pure IO read call: a) what time is it? b) make the IO call and wait until you get it, c) what time is it? d) calculate the delta and you have the wait time...and the IO read call response time! If you want more details, I wrote about this HERE, which includes a short video.
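
If you don't have an AWR report handy, a rough instance-wide average since instance startup can be pulled like this (only a sketch; an AWR interval is the better measure):

select event,
       total_waits,
       round(time_waited_micro/1000/total_waits, 2) avg_wait_ms
from   v$system_event
where  event = 'db file sequential read';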

For our system, let's figure out the single block IO subsystem read response time. Using the same AWR report, here is a screen shot of the Top Time Events.

Again, I'm just going to focus on the first instance. If you look closely (middle right area), you'll see that for instance number one, the average db file sequential read wait time is 2.22ms. That's fast!

There is no way a physical spinning disk is going to return a block in 2.22ms. This means that many of Oracle's single block read calls are being satisfied through some non-Oracle cache. Perhaps an OS cache or an IO subsystem cache. We can't tell, but we do know the block was NOT in Oracle's buffer cache, because a db file sequential read wait means the block was not found in Oracle's buffer cache.

A single block synchronous IO read call with an average of 2.22ms means there is available IO read capacity and probably available write capacity as well. Again, just like with the OS CPU subsystem, we have unused power that we will try and use to our advantage.

At this point, I will assume there is also plenty of memory and network capacity available. So, the bottom line is we have a "slow" system combined with available CPU and available IO power. Wow! That is a great situation to be in. I call this, "low hanging fruit."

Real Life: Looking For The "Slow" SQL

At the top of this post, I mentioned that in my Firefighting class in each of the "serialization" cases, there was a key SQL statement that was part of a larger batch process. Keep in mind, that at this point in the analysis I did NOT know this. All I knew was that users were complaining and there was plenty of CPU and IO resources.

Usually, in this situation there is a relatively long running process. There could be lots of quick SQL statements involved, but usually this is not the case. And I'm hoping there is a key long running SQL statement that can be parallelized.

Long running can roughly be translated into "high elapsed time." I have written a number of articles about elapsed time (search my blog for: elapsed time) and even have a free tool you can use to gather more than simply the average elapsed time. And I have online seminars that touch on this subject: Tuning Oracle Using An AWR Report and also Using Skewed Performance Data To Your Advantage. So there are lots of useful resources on this topic.

In the AWR report, I'm going to look closely at the SQL Statistics, in particular the "SQL ordered by Elapsed Time (Global)." What I really want is the statistics only for instance one, that is, not global. But that's all I have available. Plus the DBAs will/should know if the key SQL statement(s) are run on instance one. Here's the report.

In the report above, look at the elapsed times (second column on the left). Now looking right, find the "Execs", that is, the executions column. The execution column is the number of completed executions within this AWR snapshot range. If the execution count is zero, this means the SQL did not complete during the snapshot interval, that is, before the ending snapshot.
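
If you want a similar list outside of an AWR report, a rough stand-in is to rank the library cache by cumulative elapsed time (cumulative since the cursors were loaded, not a snapshot interval, so treat it as a sketch):

select *
from ( select sql_id,
              round(elapsed_time/1000000, 1) elapsed_s,
              executions
       from   v$sqlstats
       order  by elapsed_time desc )
where rownum <= 10;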

If you're wondering, these top elapsed time SQL statements are involved in batch processing. When I look at this, I see opportunity, fruit waiting to be harvested!

And I love this: Every DBA in the class in this situation said, "Oh! I know about this SQL. It's always causing problems." Now it's time to do something about it!

Real Life: Putting This All Together

We have identified available CPU and IO capacity. And we have identified THE elapsed time SQL statement. While I'm a pretty laid back kind of guy, at this point I start to apply some pressure. Why? Because the users are complaining, and we have identified an opportunity, the cause of the problem and the general solutions.

There are two general solutions:

1. Do less work. You want to empty a candy dish faster? Then start with less candy in the dish! If you want a SQL statement to run faster, tune the SQL so it touches fewer blocks.

2. Do the same amount of work, but group the work and run each group at the same time. This is parallelization! This is why the total elapsed time will not decrease (it will probably increase a little) but the wall time will likely decrease... and dramatically! Here is a LINK to posting that contains a short video demonstrating the difference between elapsed time and wall time.

How To Parallelize (in summary)

There are many different ways to parallelize. But the goal is the same: use the available resources to reduce wall time (not necessarily the elapsed time). Perhaps the application can be redesigned to run in parallel streams. But that can take a very long time and be a real hassle. But in many cases, it's the best long term solution.

If you are short on time, are licensed for Oracle Parallel Query and the SQL has been optimized (oh boy... how many times have we all heard that before), you likely can use Oracle PQ. And of course, even if the SQL is not optimized, you can still run PQ and performance may be fantastic.
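
As a sketch of what that can look like (the table name and the degree of 8 below are made up; check your licensing and test the degree carefully):

-- force a single statement to use parallel query slaves
select /*+ parallel(big_table, 8) */ count(*)
from   big_table;

-- or set a default degree on the segment itself
alter table big_table parallel 8;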

By the way, adding faster IO disks or more IO disks (whatever a "disk" is nowadays anyway) will likely NOT work. Remember, the IO subsystem is performing wonderfully.

Thanks for reading and enjoy the mystery of your work!

Categories: DBA Blogs

Log Buffer #406, A Carnival of the Vanities for DBAs

Pakistan's First Oracle Blog - Mon, 2015-01-19 01:38
This Log Buffer Edition covers blog posts from various bloggers of Oracle, SQL Server and MySQL.


Oracle:

Sync tables: generate MERGE using Unique constraint.
What Hardware and Software Do YOU Want Oracle to Build?
There were a number of new features introduced in Ops Center 12.2.2. One of the shiny ones is an expansion of the backup and recovery capabilities to include Proxy Controllers.
Want to Be a Better Leader? Answer One Question.
Managing a remote Oracle Database instance with “Geographic Edition”.

SQL Server:

Learn how you can use SQLCop to prevent your developers from writing stored procedures that are named sp_ something.
Data Cleaning in SQL 2012 with Data Quality Services.
Stairway to PowerPivot and DAX - Level 9: Function / Iterator Function Pairs: The DAX MAX() and MAXX() Functions.
Options to Improve SQL Server Bulk Load Performance.
Dynamically Create Tables Based on an Access Table


MySQL:

Stored Procedures: critiques and defences.
JSON UDF functions 0.3.3 have been released.
Business Scalability, Operational Efficiency and Competitive Edge with MariaDB MaxScale 1.0 GA.
MySQL 5.7 labs and the HTTP Plugin – inserting, updating and deleting records in MySQL via HTTP.
Hyper-threading – how does it double CPU throughput?

Published on Pythian Blog
Categories: DBA Blogs

Do The LGWRs Always Sleep For The Full Three Seconds?

Do Oracle Database LGWRs (10g, 11g, 12c) Always Sleep For The Full Three Seconds?
Back in June I wrote (included a video) about the Oracle Database log writer(s) "3 second sleep rule." That's the rule we were all taught by our instructors when we started learning about Oracle yet never really knew if it was true. In that post, I demonstrated Oracle Database log writer background processes are normally put to sleep for three seconds.

In this post, I want to answer a related but different question.

Do Oracle Database log writer background processes ALWAYS sleep for the full three seconds? Our initial response would likely be, "Of course not! Because what if a foreground process commits during the three second sleep? The log writer(s) must wake up." That would make sense.

But, is this really true and what else could we learn by digging into this? I created an experiment to check this out, and that is what this post is all about.

The Experiment
In my June post I demonstrated the Three Second Rule. You will see this again below. But in this experiment we are looking for a situation when one of the 12c log writers wakes BEFORE its three second sleep completes.

You can download the experimental script I detail below HERE.

This is really tricky to demonstrate because of all the processes involved. There is the Oracle foreground process and, in 12c, there are multiple log writer background processes. Because this experiment follows a timeline, I needed to gather the process activity data and then somehow merge it all together in a way that we humans can understand.

What I did was run an operating system trace ( strace ) on each process ( strace -p $lgwr ) with the timestamp option ( strace -p $lgwr -tt ), sending each process's output to a separate file ( strace -p $lgwr -tt -o lgwr.txt ). This was done for all four processes and, of course, I needed to start the scripts to run in the background. Shown directly below are the log writer strace details.

lgwr=`ps -eaf | grep $sid | grep lgwr | awk '{print $2}'`
lg00=`ps -eaf | grep $sid | grep lg00 | awk '{print $2}'`
lg01=`ps -eaf | grep $sid | grep lg01 | awk '{print $2}'`

echo "lgwr=$lgwr lg00=$lg00 lg01=$lg01"

strace -p $lgwr -tt -o lgwr.str &
strace -p $lg00 -tt -o lg00.str &
strace -p $lg01 -tt -o lg01.str &

Once the log writers were being traced, I connected to sqlplus and launched the below text in the background as well.

drop table bogus;
create table bogus as select * from dba_objects where object_id in (83395,176271,176279,176280);
select * from bogus;
exec dbms_lock.sleep(2.1);

exec dbms_lock.sleep(2.2);
exec dbms_lock.sleep(2.3);
update bogus set object_name='83395' where object_id=83395;
exec dbms_lock.sleep(3.1);
update bogus set object_name='176271' where object_id=176271;
exec dbms_lock.sleep(3.2);
update bogus set object_name='176279' where object_id=176279;
exec dbms_lock.sleep(3.3);
update bogus set object_name='176280' where object_id=176280;
exec dbms_lock.sleep(3.4);
exec dbms_lock.sleep(3.5);
update bogus set object_name='89567' where object_id=89567;
exec dbms_lock.sleep(3.6);
exec dbms_lock.sleep(3.7);

Once the sqlplus session was connected,

sqlplus system/manager @/tmp/runit.bogus &
sleep 2

I grabbed its OS process id and started an OS trace on it as well:

svpr=`ps -eaf | grep -v grep | grep oracle$sid | awk '{print $2}' `
echo "svpr=$svpr"

strace -p $svpr -tt -o svpr.str &

Then I slept for 30 seconds, killed the tracing processes (not the log writers!):

sleep 30

for pid in `ps -eaf | grep -v grep | grep strace | awk '{print $2}'`
do
  echo "killing pid $pid"
  kill -2 $pid
done

Then I merged the trace files, sorted them by time, got rid of stuff in the trace files I didn't want to see and put the results into a final "clean" file.

rm -f $merge
for fn in lgwr lg00 lg01 svpr
do
  cat ${fn}.str | awk -v FN=$fn '{print $1 " " FN " " $2 " " $3 " " $4 " " $5 " " $6 " " $7 " " $8 " " $9}' >> $merge
done

ls -ltr $merge
cat $merge | sort > /tmp/final.bogus

cat /tmp/final.bogus | grep -v times | grep -v getrusage | grep -v "svpr lseek" | grep -v clock_gettime | grep -v gettimeofday | grep -v "svpr read" | grep -v "svpr write" > /tmp/final.bogus.clean

The amazing thing is... this actually worked! Here is the output below:

19:11:41.981934 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {2, 200000000}) =
19:11:42.859905 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:11:43.986421 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:11:44.186404 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {2, 300000000}) =
19:11:44.982768 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:11:45.860871 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:11:46.499014 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {3, 100000000}) =
19:11:46.989885 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:11:47.983782 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:11:48.861837 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:11:49.608154 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {3, 200000000}) =
19:11:49.993520 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:11:50.984737 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:11:51.862921 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:11:52.817751 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {3, 300000000}) =
19:11:52.997116 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:11:53.985784 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:11:54.863809 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:11:55.998974 lgwr open("/proc/41955/stat", O_RDONLY) = 19
19:11:55.999029 lgwr read(19, "41955 (ora_pmon_prod35) S 1 4195"..., 999) =
19:11:55.999075 lgwr close(19) = 0
19:11:55.999746 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:11:56.127326 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {3, 400000000}) =
19:11:56.986935 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:11:57.864930 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:11:59.003212 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:11:59.531161 svpr semctl(7503875, 16, SETVAL, 0x7fff00000001) = 0
19:11:59.531544 lgwr semctl(7503875, 18, SETVAL, 0x7fff00000001) = 0
19:11:59.532204 lg00 pwrite(256, "\1\"\0\0\311\21\0\0\354\277\0\0\20\200{\356\220\6\0\0\r\0\0\0\367^K\5\1\0\0\0"..., 2048, 2331136) = 2048
19:11:59.532317 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {2, 480000000}) =
19:11:59.532680 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {0, 100000000}) =
19:11:59.537202 lg00 semctl(7503875, 34, SETVAL, 0x7fff00000001) = 0
19:11:59.537263 lg00 semctl(7503875, 16, SETVAL, 0x7fff00000001) = 0
19:11:59.537350 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:11:59.538483 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {2, 470000000}) =
19:11:59.540574 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {3, 500000000}) =
19:12:00.865928 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:12:02.011876 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:12:02.537887 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:12:03.050381 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {3, 600000000}) =
19:12:03.866796 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:12:05.014819 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:12:05.538797 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:12:06.657075 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {3, 700000000}) =
19:12:06.867922 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:12:08.017814 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:12:08.539750 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:12:09.868825 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}

There is a lot of detail in the above output. I'm only going to make a few comments that pertain to the objectives of this post.

Oracle is using the semaphore call semtimedop to sleep. The beauty of this call is that it allows the process to be woken (that is, signaled) by another process! Keep that in mind as you follow the timeline.

Here we go:

19:11:41.981934. Notice the server process's "2, 2" and later the "2, 3", "3, 1" and "3, 2" timeout values? These are the result of the dbms_lock.sleep commands contained in the sqlplus script!

19:11:42.859905. Notice lg01 and the other log writer background processes always have a "3, 0" semtimedop call? That is their "3 second sleep."

Look at the first few lgwr entries. I've listed them here:

19:11:43.986421 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:11:46.989885 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:11:49.993520 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:11:52.997116 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =

Notice anything strange about the above times? They are all just about 3 seconds apart from each other. That's the 3 second sleep in action. But that's not the focus of this post. So let's move on.

Read this slowly: I want to focus on just one part of the output, which is shown below. Notice the server process is sleeping for 3.4 seconds. If you look at the sqlplus script (near the top of this post), immediately after the 3.4 second sleep the server process issues a commit. Because the 3.4 second sleep starts at 19:11:56.1, I'm expecting to see some log writer activity about 3.4 seconds later, at around 19:11:59.5. That could occur in the middle of a log writer's 3 second sleep, which means we will likely see a log writer kick into action before its 3 second sleep completes! Let's take a look.

19:11:56.127326 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {3, 400000000}) =
19:11:56.986935 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:11:57.864930 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:11:59.003212 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:11:59.531161 svpr semctl(7503875, 16, SETVAL, 0x7fff00000001) = 0
19:11:59.531544 lgwr semctl(7503875, 18, SETVAL, 0x7fff00000001) = 0
19:11:59.532204 lg00 pwrite(256, "\1\"\0\0\311\21\0\0\354\277\0\0\20\200{\356\220\6\0\0\r\0\0\0\367^K\5\1\0\0\0"..., 2048, 2331136) = 2048
19:11:59.532317 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {2, 480000000})

We can see the server process's 3.4 second sleep starting at time 19:11:56.1, and we can see the sleep end and the server process's next command begin at the expected time of 19:11:59.5. Next in the trace file output is the result of the commit. The commit results in both the lgwr and lg00 background processes being woken.

But notice the lgwr background process started one of its 3 second sleeps at 19:11:59.0, which means it doesn't want to wake until 19:12:02.0. But look at when the lgwr process woke up. It woke up at 19:11:59.5, which is clearly before the expected time of 19:12:02.0. What you just noticed was the lgwr background process being signaled to wake up before its three second sleep completed.

But why did the lgwr need to be woken up? Because the server process' redo must be immediately written.

But it gets even better because the lgwr background process doesn't do the redo write! The lgwr process signals the lg00 process to do the write, which we can see occurs at time 19:11:59.5. Wow. Amazing!

What We Can Learn From This
Personally, I love these kinds of postings because we can see Oracle in action and demonstrating what we believe to be true. So what does all this actually demonstrate? Here's a list:

  1. We can see the 12c log writers involved. Not only lgwr.
  2. All log writer background processes initiate a sleep for the default three seconds. I have seen situations where it is not three seconds, but the default appears to be three seconds.
  3. The server process signals the lgwr process to write immediately after a commit is issued.
  4. The server process signals the lgwr process to write using a semaphore.
  5. The log writers (starting in 12c) can signal each other using semaphores. We saw lgwr signal the lg00 background process to write.
  6. The server process was performing updates over a 10+ second period, yet its redo was not written to disk until it committed. This demonstrates that ALL redo is not flushed every three seconds. (This is probably not what you learned... unless you joined one of my Oracle Performance Firefighting classes.)
  7. The log writers, while normally put to sleep for three seconds, can be woken in the middle of that sleep for an urgent task (like writing committed data to an online redo log).

I hope you enjoyed this post!

Thanks for reading,

Categories: DBA Blogs

Oracle ZFS Network Attached Storage ZS4-4 System for Oracle Database 12c

VitalSoftTech - Sun, 2015-01-11 19:02
Oracle has recently announced its latest generation of Network Attached Storage (NAS) system, namely the Oracle ZFS Storage ZS4-4. With the new Oracle Database 12c in place, the ZS4-4 is capable of providing some exciting brand new features, which, in the words of Oracle VP John Fowler, expand Oracle’s leadership in NAS performance. New Capabilities […]
Categories: DBA Blogs

Histograms Tidbits

Pakistan's First Oracle Blog - Fri, 2015-01-09 03:00
Make sure histograms exist on columns with uneven data distributions to ensure that the optimizer makes the best choice between indexes and table scans.

For range scans on data that is not uniformly distributed, the optimizers’ decisions will be improved by the presence of a histogram.

Histograms increase the accuracy of the optimizer’s cost calculations but increase the overhead of statistics collections. It’s usually worth creating histograms for columns where you believe the data will have an irregular distribution, and where the column is involved in WHERE or JOIN expressions.


The METHOD_OPT option controls how column level statistics, in particular histograms, are created. The default value is ‘FOR ALL COLUMNS SIZE AUTO’, which enables Oracle to choose the columns that will have a histogram collected and set the appropriate histogram bucket size.
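
As a sketch (the table and column names are made up), the difference between letting Oracle decide and explicitly requesting a histogram looks like this:

-- default: Oracle decides which columns get histograms
exec dbms_stats.gather_table_stats(user, 'ORDERS', method_opt => 'FOR ALL COLUMNS SIZE AUTO');

-- explicitly request a histogram on a known-skewed column
exec dbms_stats.gather_table_stats(user, 'ORDERS', method_opt => 'FOR COLUMNS STATUS SIZE 254');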
Categories: DBA Blogs

Returning Error Codes from sqlplus to Shell Scripts

Don Seiler - Tue, 2015-01-06 15:32
When I have to run SQL scripts, I prefer to do it via executable ksh scripts (I use ksh because it's usually an Oracle pre-requisite so I can rely on it being installed). One thing that I've just recently started adding into my scripts is exiting sqlplus on error and returning the ORA error code. This is done via the WHENEVER SQLERROR feature of sqlplus:

sqlplus / as sysdba <<EOF

        whenever sqlerror exit sql.sqlcode
        alter tablespace foo
                rename to foo_old;

        create tablespace foo
                datafile size 100m;

        alter table foo move tablespace $TABLESPACE_NAME nocompress;
EOF

RETURN_CODE=$?
if [ $RETURN_CODE -ne 0 ]; then
        echo "*** Tablespace renaming error code $RETURN_CODE. ***"
        exit $RETURN_CODE;
fi

In this example we rename a tablespace and then create a new tablespace in its place and move a table there. It seems rather contrived but this is actually what I'm doing to move tables to an uncompressed and unencrypted tablespace to test storage vendor deduplication claims. But I digress ...

To test this, I used a tablespace that doesn't exist, which results in an ORA-00959 error. However, the return code I got was 191. I spent a good portion of the day testing and retesting with all sorts of debug output until I stumbled across this comment. Since Linux (and I'm told Unix) return codes only go up to 255, the ORA sqlcode value of 959 wraps around, and the value of 191 is simply the remainder of a modulo operation:

959 % 256 = 191

And suddenly the skies cleared.

As always, hope this helps!
Categories: DBA Blogs

Who is a DBA Leader?

Pakistan's First Oracle Blog - Tue, 2015-01-06 06:00
Sitting behind a big mahogany table, smoking a Cuban cigar, glaring at the person sitting across, one hand holding the receiver of a black phone to the right ear and the other hand taking a mobile phone off the left ear: that may be the image of a DBA boss in some white elephant government outfit, but it certainly cannot work in an organization made up of professionals like database administrators. And if such an image, or a similar one, is at work in any company, then that company is not great. It's as simple as that.

So who is a DBA leader? The obvious answer is the person who leads a team of database administrators. Sounds simple enough, but it takes a lot to be a true leader. There are many DBA bosses at various layers and DBA managers at various layers, but being a DBA leader is very different. If you are a DBA leader, then you should be kinda worshiped. If you work in a team which has a DBA leader, then you are a very lucky person.

A DBA leader is the one who leads by example. He walks the talk. He is a doer and not just a talker. He inspires, motivates, and energizes the team members to follow him and then exceed his example. For instance, when a client asks to improve a performance issue in the RAC cluster, the DBA leader would first jump in at the problem and start collaborating with the team. He would analyze the problem and present his potential solutions, or at least a line of action. He would engage the team and listen to them. He won't just assign the problem to somebody, then disappear, and come back at 5pm asking about status. A DBA leader is not superhuman, so he will get problems of which he won't have any knowledge. He will research the problem with the team and will learn and grow with them. That approach would electrify the team.

A DBA leader is a grateful person. He can never seem to thank his team enough for doing a great job. When, under the able leadership of the DBA leader, the team reaches a solution, then regardless of his own contribution, the DBA leader makes his team look awesome. That generates immense prestige for the DBA leader at the same time, while making the team look great. The team will cherish the fact that the solution was reached thanks to the deep insights of the DBA leader, and yet the leader gave the credit to them.

A DBA leader is the one who is always there. He falls before the team falls, and doesn't become aloof when things don't go well. Things will go wrong and crises will come. In such situations, responsibility is shared and the DBA leader doesn't shirk it. In a DBA leader's team, there are no scapegoats.

A leader of DBAs keeps both the big picture and the details in perspective at the same time. He provides the vision and lives the vision from the front. He learns and then he leads. He does all of this and does it superbly, and that is why he is the star and such a rare commodity, and that is why he is the DBA LEADER.

Categories: DBA Blogs

MySQL Locking

Kubilay Çilkara - Tue, 2015-01-06 03:53
MySQL and Row Level Locking? Or why are you getting the error:

ERROR 1205 (HY000) : Lock wait timeout exceeded; try restarting transaction

You get this error because a statement in your transaction has waited for a DML row lock longer than the set limit. The limit, set by the innodb_lock_wait_timeout database parameter, defaults to 50 seconds. If the transaction holding the lock doesn't commit or roll back within those 50 seconds, the waiting statement gets this error. We don't want transactions holding locks for longer than 50 seconds anyway; throughput would be affected.
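
To see or adjust the limit on your own system (the 120-second value below is just an example), you can do something like:

SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';

-- raise the wait limit for the current session only
SET innodb_lock_wait_timeout = 120;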

And yes, MySQL uses row level locking in InnoDB. Since MySQL 5.1+ (around the time Oracle took over) it does row level locking in its InnoDB tables. That means only the rows which are selected . . . FOR UPDATE . . . are locked, and not the whole table. To see which threads (sessions) are blocking other threads, and which queries are doing the blocking, use the following INFORMATION_SCHEMA dictionary SQL query as a DBA. You will be able to see the blockers and waiters of transactions waiting on locks. Run it as is against the INFORMATION_SCHEMA schema, no modifications needed.

Use this SQL query to monitor locks and transactions and note that query will return data only when there are locks!

SELECT
    r.trx_id AS waiting_trx_id,
    r.trx_mysql_thread_id AS waiting_thread,
    TIMESTAMPDIFF(SECOND,
        r.trx_wait_started,
        CURRENT_TIMESTAMP) AS wait_time,
    r.trx_query AS waiting_query,
    l.lock_table AS waiting_table_lock,
    b.trx_id AS blocking_trx_id,
    b.trx_mysql_thread_id AS blocking_thread,
    SUBSTRING(p.host,
        1,
        INSTR(p.host, ':') - 1) AS blocking_host,
    SUBSTRING(p.host,
        INSTR(p.host, ':') + 1) AS blocking_port,
    IF(p.command = 'Sleep', p.time, 0) AS idle_in_trx,
    b.trx_query AS blocking_query
FROM
    information_schema.innodb_lock_waits AS w
        INNER JOIN
    information_schema.innodb_trx AS b ON b.trx_id = w.blocking_trx_id
        INNER JOIN
    information_schema.innodb_trx AS r ON r.trx_id = w.requesting_trx_id
        INNER JOIN
    information_schema.innodb_locks AS l ON w.requested_lock_id = l.lock_id
        LEFT JOIN
    information_schema.PROCESSLIST AS p ON p.id = b.trx_mysql_thread_id
ORDER BY wait_time DESC;

Categories: DBA Blogs

Script for Exadata I/O Report

Pakistan's First Oracle Blog - Mon, 2015-01-05 00:02
select function_name,sum(SMALL_READ_MEGABYTES)/1024 SM_Read_GB,
sum(LARGE_READ_REQS) LG_Read_Requests,
sum(LARGE_Write_REQS) LG_Write_Requests
from v$iostat_function_detail
group by function_name;
Categories: DBA Blogs

Log Buffer #404, A Carnival of the Vanities for DBAs

Pakistan's First Oracle Blog - Mon, 2015-01-05 00:01

With the new year already in fast gear, bloggers are leaving no stone unturned to come up with innovative ideas. This Log Buffer edition is keeping pace with them as always.

Oracle:

While playing with 12c Scott tried the upgrade to the DEFAULT column syntax that now allows sequences.
This is an age old question and of course the answer depends on how you say “SQL”.
Happy New Year! Upgraded Grid Infrastructure to and applied the Oct 2014 PSU. Had an error during rootupgrade.sh as well, due to the ASM spfile being on disk instead of on ASM diskgroup.
If you (already) created your first Oracle Service Bus 12c application/project with SOAP webservices and tried to deploy it to your IntegratedWeblogic server you might be familiar with this error.
Using Drag-Drop functionality in af:treeTable to move data between nodes.

SQL Server:

Hadoop has been making a lot of noise in the Big Data world.
Lets look at two different ways of creating an HDInsight Cluster: Creating an HDInsight Cluster through Azure Management Portal, and creating an HDInsight Cluster through Windows Azure PowerShell.
Why you need test driven development.
SQL Server Data Import System to Alert For Missed Imports.
Create stunning visualizations with Power View in 20 minutes or less!


MySQL:

So assume you just uploaded the certificate you use to identify yourself to the MySQL server to Github or some other place it doesn’t belong…and there is no undelete.
MySQL Plugin for Oracle Enterprise Manager on VirtualBox: installation gotchas.
MariaDB slave restore using GTID & xtrabackup bug.
How small changes impact complex systems – MySQL example.
In this post, Louis talk about MHA GTID behavior, we test different cases and find something is different from previous versions.

Also Published at Pythian Blog.
Categories: DBA Blogs

Can A Background Process Impact A Foreground Process And Its Database Time?

Have you ever heard someone say, "Background processes do not impact foreground processes because they run in the background and in parallel with foreground processes." I've heard this hundreds of times!

While doing some performance research I came across a great example of how an Oracle Database background process can directly and significantly impact a foreground process.

The above quote represents a masterfully constructed lie; it contains both a lie and a truth. The mix of a truth and a lie make understanding the reality of the situation difficult. In this post, I'll explain the truth, delve into the lie and relate it all to foreground process database time.

By the way, I am in no way saying there is something wrong with or incorrect about DB Time. I want to ensure this is clear from the very beginning of this post.

Just so there is no confusion, an Oracle foreground process is sometimes also called a server process or a shadow process. These terms are used interchangeably in this post.

The Truth
Clearly background and foreground processes operate in parallel. I don't think any DBA would deny this. As I frequently say, "serialization is death and parallelism is life!" A simple "ps" command will visually show both Oracle background and foreground processes at work. But this in no way implies they do not impact each other's activity and performance.

In fact, we hope they do impact each other! Can you imagine what performance would be with the background processes NOT running in parallel?! What a performance nightmare that would be. But this is where the "no impact" lie lives.

The Lie
Most senior DBAs can point to a specific situation where Oracle cache buffer chain latch contention affected multiple foreground sessions. In this situation, foreground sessions were frantically trying to acquire a popular cache buffer chain latch. But this is a foreground session versus foreground session situation. While this example is important, this post is about when a background process impacts a foreground process.

Have you ever committed a transaction and had it hang while the foreground process waits on the "log file switch (checkpoint incomplete)" or, even worse, the "log file switch (archiving needed)" event? All the foreground process knows is that its statement can't finish because a required log switch has not occurred because a checkpoint is incomplete. What the server process does not know is that the checkpoint (CKPT), the database writer (DBWR) and the log writer (LGWR) background processes are involved. There is a good chance the database writer is frantically writing dirty buffers to the database (dbf) files so the LGWR can safely overwrite the associated redo in the next online redo log.

For example, if a server process issued a commit during the checkpoint, it will wait until the checkpoint is complete and the log writer has switched and can write into the next redo log. So, while the log writer background processes is probably waiting on "log file parallel write" and the database writer is burning CPU and waiting on "db file parallel write", the foreground processes are effectively hung.

This is a classic example of how a background process can impact the performance of a foreground process.

A Demonstration Of The Lie
Here's a quick demonstration of the above situation. On an existing database in my lab, I created two 4MB redo logs and dropped all the other redo logs. I started a DML intensive workload. According to the alert.log file, the redo logs were switching every couple of seconds! Take a look at this:
$ tail -f /home/oracle/base/diag/rdbms/prod30/prod30/trace/alert*log
Thread 1 cannot allocate new log, sequence 2365
Checkpoint not complete
Current log# 4 seq# 2364 mem# 0: /home/oradata/prod30/redoA1.log
Mon Dec 29 11:02:09 2014
Thread 1 advanced to log sequence 2365 (LGWR switch)
Current log# 5 seq# 2365 mem# 0: /home/oradata/prod30/redoA2.log
Thread 1 cannot allocate new log, sequence 2366
Checkpoint not complete
Current log# 5 seq# 2365 mem# 0: /home/oradata/prod30/redoA2.log
Thread 1 advanced to log sequence 2366 (LGWR switch)
Current log# 4 seq# 2366 mem# 0: /home/oradata/prod30/redoA1.log
Thread 1 cannot allocate new log, sequence 2367
Checkpoint not complete
Current log# 4 seq# 2366 mem# 0: /home/oradata/prod30/redoA1.log
Thread 1 advanced to log sequence 2367 (LGWR switch)
Current log# 5 seq# 2367 mem# 0: /home/oradata/prod30/redoA2.log
Thread 1 cannot allocate new log, sequence 2368
Checkpoint not complete
Current log# 5 seq# 2367 mem# 0: /home/oradata/prod30/redoA2.log
Mon Dec 29 11:02:20 2014

Obviously not what you want to see on a production Oracle system! (But my guess many of you have.)

Using my OSM realtime session sampler tool (rss.sql - related blog posting HERE) I sampled the log writer every half a second. (There is only one log writer background process because this is an Oracle 11g database, not an Oracle Database 12c system.) If the log writer session showed up in v$session as an active session, it would be picked up by rss.sql.  Both "ON CPU" and "WAIT" states are collected. Here is a sample of the output.

It's very obvious the log writer is doing some writing. But we can't tell from the above output if the process is impacting other sessions. It would have also been very interesting to sample the database writer also, but I didn't do that. To determine if the background processes are impacting other sessions, I needed to find a foreground session that was doing some commits. I noticed that session 133, a foreground process was busy doing some DML and committing as it processed its work. Just as with the log writer background process, I sampled this foreground process once every 0.5 second. Here's a sample of the output.

Wow. The foreground process is waiting a lot for the current checkpoint to be completed! So... this means the foreground process is being effectively halted until the background processes involved with the checkpoint have finished their work.

This is a great example of how Oracle background processes can impact the performance of an Oracle foreground process.

But let's be clear. Without the background processes, performance would be even worse. Why? Because all that work done in parallel and in the background would have to be done by each foreground process AND all that work would have to be closely controlled and coordinated. And that, would be a performance nightmare!

DB Time Impact On The Foreground Process

Just for the fun of it, I wrote a script to investigate DB Time, CPU consumption, non-idle wait time and the wait time for the "log file switch wait (checkpoint incomplete)" wait event for the foreground process mentioned above (session 133). The script simply gathers some session details, sleeps for 120 seconds, again gathers some session details, calculates the differences and displays the results. You can download the script HERE. Below is the output for the foreground process, session 133.
SQL> @ckpttest.sql 133

Table dropped.

Table created.

PL/SQL procedure successfully completed.

CPU_S_DELTA NIW_S_DELTA DB_TIME_S_DELTA CHECK_IMPL_WAIT_S
----------- ----------- --------------- -----------------
      2.362      117.71      119.973692            112.42

1 row selected.

Here is a quick description of the output columns.

  • CPU_S_DELTA is the CPU seconds consumed by session 133, which is the time model statistic DB CPU.
  • NIW_S_DELTA is the non-idle wait time for session 133, in seconds.
  • DB_TIME_S_DELTA is the DB Time statistic for session 133, which is the time model statistic DB Time.
  • CHECK_IMPL_WAIT_S is the wait time only for event "log file switch (checkpoint incomplete)" for session 133, in seconds.
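
If you don't want to download the script, a minimal sketch of the same idea, hard-coding session 133 and taking a single cumulative snapshot rather than a before/after delta, looks like this:

select (select value/1000000 from v$sess_time_model
         where sid = 133 and stat_name = 'DB CPU') cpu_s,
       (select sum(time_waited_micro)/1000000 from v$session_event
         where sid = 133 and wait_class != 'Idle') niw_s,
       (select value/1000000 from v$sess_time_model
         where sid = 133 and stat_name = 'DB time') db_time_s,
       (select time_waited_micro/1000000 from v$session_event
         where sid = 133
           and event = 'log file switch (checkpoint incomplete)') check_impl_wait_s
from dual;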

Does the time fit together as we expect? The "log file switch..." wait time is part of the non-idle wait time. The DB Time total is very close to the CPU time plus the non-idle wait time. Everything seems to add up nicely.

To summarize: Oracle background processes directly impacted the database time for a foreground process.

In Conclusion...
First, for sure Oracle foreground and background processes impact each other...by design for increased performance. Sometimes on real production Oracle Database systems things get messy and work that we hoped would be done in parallel must become momentarily serialized. The log file switch example above, is an example of this.

Second, the next time someone tells you that an Oracle background process does not impact the performance of a foreground process, ask them if they have experienced a "log file switch checkpoint incomplete" situation. Pause until they say, "Yes." Then just look at them and don't say a word. After a few seconds you may see an "oh... I get it" look on their face. But if not, simply point them to this post.

Thanks for reading and enjoy your work!


Categories: DBA Blogs

Packt - The $5 eBook Bonanza is here!

Surachart Opun - Tue, 2014-12-23 00:38
 The $5 eBook Bonanza is here!
Spreading the news for people who are interested in reading IT books: the $5 eBook Bonanza is here! You will be able to get any Packt eBook or video for just $5 until January 6th, 2015.
Categories: DBA Blogs

Is Oracle Database Time Correct? Something Is Not Quite Right.


Oracle Database tuning and performance analysis is usually based on time. As I blogged HERE, the Oracle "database time" statistic is more interesting than simply "time spent in the database." It is the sum of CPU consumption and non-idle wait time. And Elapsed Time is the sum of all the database time related to perhaps a session or a SQL statement execution. However...

If you do the fundamental math, you'll notice the numbers don't always add up. In fact, they rarely match. In this posting, I want to demonstrate this mismatch and I want you to see this on your systems!

I'll include experimental data from a number of different systems, the statistical analysis (including pictures) and provide a tool you can download for free from OraPub.com to check out the situation on your systems.

Checking DB Time Math

DB Time is defined as "time spent in the database." This is the sum of Oracle process CPU consumption plus non-idle wait time. Usually we don't derive DB Time. The Time Model Statistics view v$sys_time_model contains the DB Time statistic. But this view also contains the DB CPU statistic. Since there is no sum of non-idle wait time, most people just assume everything is fine.

However, if you run the simple query below on your system, it could look something like this:
SQL> l
1 select db_time_s, db_cpu_s, tot_ni_wt_s
2 from (select value/1000000 db_time_s from v$sys_time_model where stat_name = 'DB time' ),
3 (select value/1000000 db_cpu_s from v$sys_time_model where stat_name = 'DB CPU' ),
4* (select sum(TIME_WAITED_MICRO_FG)/1000000 tot_ni_wt_s from v$system_event where wait_class != 'Idle' )
SQL> /

 DB_TIME_S   DB_CPU_S TOT_NI_WT_S
---------- ---------- -----------
330165.527 231403.925  119942.952

1 row selected.
If you add up the DB CPU and the total non-idle wait time, the value is 351,346.877. Woops! 351K does not equal 330K. What happened on my Oracle Database 12c system? As I have demonstrated in this POSTING (which contains videos of this) and in my online seminar training HERE, many times DB Time does nearly equal DB CPU plus the non-idle wait time. But clearly in the above situation something does not seem quite right.

Checking DB Time On Your Systems
To demonstrate the possibility of a DB Time mismatch, I created a simple plsql tool. You can download this free tool or do an OraPub.com search for "db time tool". The tool, which is easily configurable, takes a number of samples over a period of time and displays the output.

Here is an example of the output.

OraPub DB Time Test v1a 26-Sep-2014. Enjoy but use at your own risk.
Starting to collect 11 180 second samples now...
All displayed times are in seconds.
anonymer Block abgeschlossen
... RAW OUTPUT (keep the output for your records and analysis)
sample#, db_time_delta_v , db_cpu_delta_v, tot_ni_wait_delta_v, derived_db_time_delta_v, diff_v, diff_pct_v
1, 128,4, 128,254, ,103, 128,357266, ,043, 0
2, 22,014, 3,883, 17,731, 21,614215, ,399, 1,8
3, 1,625, 1,251, ,003, 1,253703, ,371, 22,8
4, 13,967, 12,719, 1,476, 14,194999, -,228, -1,6
5, 41,086, 41,259, ,228, 41,486482, -,4, -1
6, 36,872, 36,466, ,127, 36,593884, ,278, ,8
7, 38,545, 38,71, ,137, 38,847459, -,303, -,8
8, 37,264, 37,341, ,122, 37,463525, -,199, -,5
9, 22,818, 22,866, ,102, 22,967141, -,149, -,7
10, 30,985, 30,614, ,109, 30,723831, ,261, ,8
11, 5,795, 5,445, ,513, 5,958586, -,164, -2,8
The test is complete.
All displayed times are in seconds.

The output is formatted to make it easy to statistically analyze. The far right column is percent difference between the reported DB Time and the calculated DB Time. In the above example, they are pretty close. Get the tool and try it out on your systems.

Some Actual Examples
I want to quickly show you four examples from a variety of systems. You can download all the data in the "analysis pack" HERE. The data, for each of the four systems, contains the raw DB Time Test output (like in the section above), the statistical numeric analysis output from the statistical package "R", the actual "R" script and the visual analysis using "smooth histograms" also created using "R."

Below is the statistical numeric summary:

About the columns: Only the "craig" system is mine; the others are real production or DEV/QA systems. The statistical columns all reference the far right column of the DB Time Test Tool's output, which is the percent difference between the reported DB Time and the calculated DB Time. Each sample set consists of eleven 180 second samples. A P-Value greater than 0.05 means the reported and calculated DB Time differences are normally distributed. This is not important in this analysis, but it gives me clues if there is a problem with the data collection.

As you can easily see, for two of the systems the DB Time difference is greater than 10%, and for one of them it was over 20%. The data collected shows that something is not quite right... but that's about it.

What Does This Mean In Our Work?
Clearly something is not quite right. There are a number of possible reasons and this will be focus of my next few articles.

However, I want to say that even though the numbers don't match perfectly and sometimes they are way off, this does not negate the value of a time based analysis. Remember, we're not trying to land a man on the moon. We are diagnosing performance to derive solutions that (usually) aim to reduce the database time. I suspect that in all four cases I show, we would not be misled.

But this does highlight the requirement to also analyze performance from a non-Oracle-database-centric perspective. I always look at the performance situation from an operating system perspective, an Oracle centric perspective and an application (think: SQL, processes, user experience, etc.) perspective. This "3 Circle" analysis will reduce the likelihood of making a tuning diagnosis mistake. So in case DB Time is completely messed up, by diagnosing performance from the other two "circles" you will know something is not right.

If you want to learn more about my "3-Circle" analysis, here are two resources:
  1. Paper. Total Performance Management. Do an OraPub search for "3 circle" and you'll find it.
  2. Online Seminar: Tuning Oracle Using An AWR Report. I go DEEP into an Oracle Time Based Analysis but keeping it day-to-day production system practical.
In my next few articles I will drill down into why there can be a "DB Time mismatch," what to do about it and how to use this knowledge to our advantage.

Enjoy your work! There is nothing quite like analyzing performance and tuning Oracle database systems!!


Categories: DBA Blogs

Happy Holidays and a Prosperous New Year from VitalSoftTech!

VitalSoftTech - Sun, 2014-12-21 09:01
We at VST want to thank you, our prestigious members, for making 2014 a memorable year for us! We are ever so grateful for your continuous support, participation and words of encouragement. As technological mavens, you help us sustain this community with quality feedback that drives continued success for us all! How about mastering a […]
Categories: DBA Blogs

Log Buffer #402, A Carnival of the Vanities for DBAs

Pakistan's First Oracle Blog - Sat, 2014-12-20 18:39
This Log Buffer edition hits the ball out of the park by surfacing yet another unique collection of blog posts from various database technologies. Enjoy!!!


Oracle:

EM12c and the Optimizer Statistics Console.
OBIEE and ODI on Hadoop : Next-Generation Initiatives To Improve Hive Performance.
Oracle Bundle Patching.
Performance Issues with the Sequence NEXTVAL Call.

SQL Server:

GUIDs GUIDs everywhere, but how is my data unique?
Questions About T-SQL Transaction Isolation Levels You Were Too Shy to Ask.
Introduction to Advanced Transact SQL Stairway and Using the CROSS JOIN Operator.
Introduction to Azure SQL Database Scalability.
What To Do When the Import and Export Wizard Fails.


MySQL:

Orchestrator 1.2.9 GA released.
Making HAProxy 1.5 replication lag aware in MySQL.
Monitor MySQL Performance Interactively With VividCortex.
InnoDB’s multi-versioning handling can be Achilles’ heel.
Memory summary tables in Performance Schema in MySQL 5.7.

Also published here.
Categories: DBA Blogs

