Skip navigation.

A Wider View: Experimentations and ruminations on Oracle performance analysis

Syndicate content
Oracle Database performance tuning and analysis experimentation and ruminations.Craig Shallahamer, President/Founder, OraPubhttp://www.blogger.com/profile/04109635337570098781noreply@blogger.comBlogger89125
Updated: 5 hours 1 min ago

Changing The Number Of Oracle Database 12c Log Writers

Mon, 2014-12-08 22:51
Changing The Number Of Oracle Database 12c Log Writers
In an Oracle Database 12c instance you will likely see multiple log writer (LGWR) background processes. When you first start the Oracle instance you will likely see a parent and two redo workers. This is a very big deal and something many of us have been waiting for - for many years!

While I'm excited about the change, if I can't control the number of LGWRs I could easily find myself once again constrained by the lack of LGWRs!

So, my question is how do I manipulate the number of LGWRs from the default. And what is the default based on? It's these types of questions that led me on this quest. I hope you enjoy the read!


Serialization Is Death
Multiple LGWRs is great news because serialization is death to computing performance. Think of it like this. A computer program is essentially lines of code and each line of code takes a little bit of time to execute. A CPU can only process N lines of code per second. This means every serial executing program has a maximum through capability. With a single log writer (LGWR) background process the amount of redo that can be processed is similarly constrained.

An Example Of Serialization Throughput Limitation
Suppose a CPU can process 1000 instructions per millisecond. Also, assume through some research a DBA determined it takes the LGWR 10 instructions to process 10 KB of redo. (I know DBAs who have taken the time to figure this stuff out.) Given these two pieces of data, how many KB of redo can the CPU theoretically process per second?

? KB of redo/sec = (1000 inst / 1 ms)*(10 KB redo / 10 instr)*(1000 ms / 1 sec)* (1 MB / 1000 KB) = 1000 KB redo/sec

This is a best case scenario. As you can see, any sequential process can become a bottleneck. One solution to this problem is to parallelize.

Note: Back in April of 2010 I posted a series of articles about parallelism. If you are interested in this topic, I highly recommend you READ THE POSTS.

Very Cool! Multiple 12c LGWRs... But Still A Limit?
Since serialization is death... and parallelism is life, I was really excited when I saw on my 12c Oracle instance by default it had two redo workers in addition to the "parent" log writer. On my Oracle version 12.0.1.0.2.0 Linux machine this is what I see:
$ ps -eaf|grep prod40 | grep ora_lg
oracle 54964 1 0 14:37 ? 00:00:00 ora_lgwr_prod40
oracle 54968 1 0 14:37 ? 00:00:00 ora_lg00_prod40
oracle 54972 1 0 14:37 ? 00:00:00 ora_lg01_prod40

This is important. While this is good news, unless Oracle or I have the ability to change and increase the number of LGWR redo workers, at some point the two redo workers, will become saturated bringing us back to the same serial LGWR process situation. So, I want and need some control.

Going Back To Only One LGWR
Interestingly, starting in Oracle Database version 12.0.1.0.2.0 there is an instance parameter _use_single_log_writer. I was able to REDUCE the number LGWRs to only one by setting the instance parameter _use_single_log_writer=TRUE. But that's the wrong direction I want to go!

More Redo Workers: "CPU" Instance Parameters
I tried a variety of CPU related instance parameters with no success. Always two redo workers.

More Redo Workers: Set Event...
Using my OSM script listeventcodes.sql I scanned the Oracle events (not wait events) but was unable to find any related Oracle events. Bummer...

More Redo Workers: More Physical CPUs Needed?
While talking to some DBAs about this, one of them mentioned they heard Oracle sets the number of 12c log writers is based on the number of physical CPUs. Not the number CPU cores but the number of physical CPUs. On a Solaris box with 2 physical CPUs (verified using the command, psrinfo -pv) upon startup there was still on two redo workers.

$ psrinfo -p
2
$ psrinfo -pv
The physical processor has 1 virtual processor (0)
UltraSPARC-III (portid 0 impl 0x14 ver 0x3e clock 900 MHz)
The physical processor has 1 virtual processor (1)
UltraSPARC-III (portid 1 impl 0x14 ver 0x3e clock 900 MHz)

More Redo Workers: Adaptive Behavior?
Looking closely at the Solaris LGWR trace file I repeatedly saw this:

Created 2 redo writer workers (2 groups of 1 each)
kcrfw_slave_adaptive_updatemode: scalable->single group0=375 all=384 delay=144 r
w=7940

*** 2014-12-08 11:33:39.201
Adaptive scalable LGWR disabling workers
kcrfw_slave_adaptive_updatemode: single->scalable redorate=562 switch=23

*** 2014-12-08 15:54:10.972
Adaptive scalable LGWR enabling workers
kcrfw_slave_adaptive_updatemode: scalable->single group0=1377 all=1408 delay=113
rw=6251

*** 2014-12-08 22:01:42.176
Adaptive scalable LGWR disabling workers

It looks to me like Oracle has programed in some sweeeeet logic to adapt the numbers of redo workers based the redo load.

So I created six Oracle sessions that simply inserted rows into a table and ran all six at the same time. But it made no difference in the number of redo workers. No increase or decrease or anything! I let this dml load run for around five minutes. Perhaps that wasn't long enough, the load was not what Oracle was looking for or something else. But the number of redo workers always remained at two.

Summary & Conclusions
It appears at instance startup the default number of Oracle Database 12c redo workers is two. It also appears that Oracle has either already built or is building the ability for Oracle to adapt to changing redo activity by enabling and disabling redo workers. Perhaps the number of physical CPUs (not CPU cores but physical CPUs) plays a part in this algorithm.

While this was not my research objective, I did discover a way to set the number of redo workers back to the traditional single LGWR background process.

While I enjoyed doing the research for this article, it was disappointing that I was unable to influence Oracle to increase the number of redo workers. I sure hope Oracle either gives me control or the adaptive behavior actually works. If not, two redo workers won't be enough for many Oracle systems.

All the best in your Oracle performance endeavors!

Craig.


Categories: DBA Blogs

The Perfect Gift For The Oracle DBA: Top 5 DBA T-Shirts

Mon, 2014-12-01 17:26
The Perfect Gift For The Oracle DBA: Top 5 DBA T-Shirts
It's that time of year again and I can already hear it, "Dad, what do you want for Christmas?" This year I'm taking action. Like forecasting Oracle performance, I'm taking proactive action.

Like most of you reading this, you have a, let's say, unique sense of humor. I stumbled across the ultimate geek website that has an astonishing variety of t-shirts aimed at those rare individuals like us that get a rush in understanding the meaning of an otherwise cryptic message on a t-shirt.

I picked my Top 5 DBA Geek T-Shirts based on the challenges, conflicts and joys of being an Oracle DBA. With each t-shirt I saw, a story came to mind almost immediately. I suspect you will have a similar experience that rings strangely true.

So here they are—the Top 5 T-Shirts For The Oracle DBA:
Number 5: Change Your Password
According to Slash Data the top password is now "Password".  I guess the upper-case "P" makes people feel secure, especially since last years top password was "123456" and EVERYBODY knows thats a stupid password. Thanks to new and improved password requirements, the next most popular password is "12345678". Scary but not surprising.

As Oracle Database Administrators and those who listened to Troy Ligon's presentation last years IOUG conference presentation, passwords are clearly not safe. ANY passwords. Hopefully in the coming years, passwords will be a thing of the past.


Number 4: Show Your Work
Part of my job as a teacher and consultant is to stop behavior like this: I ask a DBA, "I want to understand why you want to make this change to improve performance." And the reply is something like one of these:

  1. Because it has worked on our other systems.
  2. I did a Google search and an expert recommended this.
  3. Because the box is out of CPU power, there is latching issues, so increasing spin_count will help.
  4. Because we have got to do something and quick!

I teach Oracle DBAs to think from the user experience to the CPU cycles developing a chain of cause and effect. If we can understand the cause and effect relationships, perhaps we can disrupt poor performance and turn it to our favor. "Showing your work" and actually writing it down can be really helpful.

Number 3: You Read My T-Shirt
Why do managers and users think their presence in close proximity to mine will improve performance or perhaps increase my productivity? Is that what they learn in Hawaii during "end user training"?

What's worse is when a user or manager wants to talk about it...while I'm obviously in concentrating on a serious problem.

Perhaps if I wear this t-shirt, stand up, turn around and remain silent they will stop talking and get the point. We can only hope.

Number 2: I'm Here Because You Broke Something
Obnoxious but true. Why do users wonder why performance is "slow" when they do a blind query returning ten-million rows and then scroll down looking for the one row they are interested in.... Wow. The problem isn't always the technology... but you know that already.

Hint to Developers: Don't let users do a drop down or a lookup that returns millions or even thousands or even hundreds of rows... Please for the love of performance optimization!


Number 1 (drum roll): Stand Back! I'm Going To Try SCIENCE
One of my goals in optimizing Oracle Database performance is to be quantitative. And whenever possible, repeatable. Add some basic statistics and you've got science. But stand back because, as my family tells me, it does get a little strange sometimes.

But seriously, being a "Quantitative Oracle Performance Analyst" is always my goal because my work is quantifiable, reference-able and sets me up for advanced analysis.


So there you go! Five t-shirts for the serious and sometimes strange Oracle DBA. Not only will these t-shirts prove and reinforce your geeky reputation, but you'll get a small yet satisfying feeling your job is special...though a little strange at times.

All the best in your Oracle performance endeavors!

Craig.
Categories: DBA Blogs

Off May Not Be Totally Off: Is Oracle In-Memory Database 12c (12.1.0.2.0) Faster?

Mon, 2014-11-17 21:56
Off May Not Be Totally Off: Is Oracle In-Memory Database 12c (12.1.0.2.0) Faster?
Most Oracle 12c installations will NOT be using the awesome Oracle Database in-memory features available starting in version 12.1.0.2.0. This experiment is about the performance impact of upgrading to 12c but disabling the in-memory features.

Every experiment I have performed comparing buffer processing rates, clearly shows any version of 12c performs better than 11g. However, in my previous post, my experiment clearly showed a performance decrease after upgrading from 12.1.0.1.0 to 12.1.0.2.0.

This posting is about why this occurred and what to do about it. The bottom line is this: make sure "off" is "totally off."

Turn it totally off, not partially off
What I discovered is by default the in-memory column store feature is not "totally disabled." My experiment clearly indicates that unless the DBA takes action, not only could they be a license agreement violation but a partially disabled in-memory column store slightly slows logical IO processing compared to the 12c non in-memory column store option. Still, any 12c version processes buffer faster than 11g.

My experiment: specific and targeted
This is important: The results I published are based on a very specific and targeted test and not on a real production load. Do not use my results in making a "should I upgrade decision." That would be stupid and an inappropriate use of the my experimental results. But because I publish every aspect of my experiment and it is easily reproducible it is a valid data point with which to have a discussion and also highlight various situations that DBAs need to know about.

You can download all my experimental results HERE. This includes the raw sqlplus output, the data values, the free R statistics package commands, spreadsheet with data nicely formatted and lots of histograms.

The instance parameter settings and results
Let me explain this by first showing the instance parameters and then the experimental results. There are some good lessons to learn!

Pay close attention to the inmemory_force and inmemory_size instance parameters.

SQL> show parameter inmemory

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
inmemory_clause_default string
inmemory_force string DEFAULT
inmemory_max_populate_servers integer 0
inmemory_query string ENABLE
inmemory_size big integer 0
inmemory_trickle_repopulate_servers_ integer 1
percent
optimizer_inmemory_aware boolean TRUE

SQL> show sga

Total System Global Area 7600078848 bytes
Fixed Size 3728544 bytes
Variable Size 1409289056 bytes
Database Buffers 6174015488 bytes
Redo Buffers 13045760 bytes

In my experiment using the above settings the median buffers processing rate was 549.4 LIO/ms. Looking at the inmemory_size and the SGA contents, I assumed the in-memory column store was disabled. If you look at the actual experimental result file "Full ds2-v12-1-0-2-ON.txt", which contain the explain plan of the SQL used in the experiment, there is no mention of the in-memory column store being used. My assumption, which I think is a fair one, was that the in-memory column store had been disabled.

As you'll see I was correct, but only partially correct.

The parameter settings below are when the in-memory column store was totally disabled. They key is changing the default inmemory_force parameter value from DEFAULT to OFF.

SQL> show parameter inmemory

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
inmemory_clause_default string
inmemory_force string OFF
inmemory_max_populate_servers integer 0
inmemory_query string ENABLE
inmemory_size big integer 0
inmemory_trickle_repopulate_servers_ integer 1
percent
optimizer_inmemory_aware boolean TRUE
SQL> show sga

Total System Global Area 7600078848 bytes
Fixed Size 3728544 bytes
Variable Size 1291848544 bytes
Database Buffers 6291456000 bytes
Redo Buffers 13045760 bytes

Again, the SGA does not show any in-memory memory space. In my experiment with the above "totally off" settings, the median buffers processing rate was 573.5 LIO/ms compared to "partially off" 549.4 LIO/ms. Lesson: Make sure off is truly off.

It is an unfair comparison!
It is not fair to compare the "partially off" with the "totally off" test results. Now that I know the default inmemory_force must be changed to OFF, the real comparison should be made with the non in-memory column store version 12.1.0.1.0 and the "totally disabled" in-memory column store version 12.1.0.2.0. This is what I will summarize below. And don't forget all 12c versions showed a significant buffer processing increase compared to 11g.

The key question: Should I upgrade?
You may be thinking, if I'm NOT going to license and use the in-memory column store, should I upgrade to version 12.1.0.2.0? Below is a summary of my experimental results followed by the key points.

1. The non column store version 12.1.0.1.0 was able to process 1.1% more buffers/ms (median: 581.7 vs 573.5) compared to to "totally disabled" in-memory column store version 12.1.0.2.0. While this is statistically significant, a 1.1% buffer processing difference is probably not going to make-or-break your upgrade.

2. Oracle Corporation, I'm told, knows about this situation and is working on a fix. But even if they don't fix it, in my opinion my experimental "data point" would not warrant not upgrading to the in-memory column store version 12.1.0.2.0 even if you are NOT going to use the in-memory features.

3. Visually (see below) the non in-memory version 12.1.0.1.0 and the "totally off" in-memory version 12.1.0.2.0 samples sets look different. But they are pretty close. And as I mentioned above, statistically they are "different."

Note for the statistically curious: The red color 12.1.0.1.0 non in-memory version data set is highly variable. I don't like to see this in my experiments. Usually this occurs when a mixed workload sometimes impacts performance, I don't take enough samples or my sample time duration is too short. To counteract this, in this experiment I captured 31 samples. I also performed the experiment multiple times and the results where similar. What I could have done was used more application data to increase the sample duration time. Perhaps that would have made the data clearer. I could have also used another SQL statement and method to create the logical IO load.
What I learned from this experiment
To summarize this experiment, four things come to mind:

1. If you are not using an Oracle Database feature, completely disable it. My mistake was thinking the in-memory column store was disabled when I set it's memory size to zero and "confirmed" it was off by looking at the SGA contents.

2. All versions of 12c I have tested are clearly faster at processing buffers than any version of 11g.

3. There is a very slight performance decrease when upgrading from Oracle Database version 12.1.0.1.0 to 12.1.0.2.0.

4. It is amazing to me that with all the new features poured into each new Oracle Database version the developers have been able to keep the core buffer processing rate nearly at or below the previous version. That is an incredible accomplishment. While some people may view this posting as a negative hit against the Oracle Database, it is actually a confirmation about how awesome the product is.

All the best in your Oracle performance tuning work!

Craig.




Categories: DBA Blogs

Is Oracle Database 12c (12.1.0.2.0) Faster Than Previous Releases?

Tue, 2014-11-11 01:31
Is Oracle Database 12c (12.1.0.2.0) Faster Than Previous Releases?
I was wondering if the new Oracle Database 12c version 12.1.0.2.0 in-memory column store feature will SLOW performance when it is NOT being used. I think this is a fair question because most Oracle Database systems will NOT be using this feature.

While the new in-memory column store feature is awesome and significant, with each new Oracle feature there is additional kernel code. And if Oracle is not extremely careful, these new lines of Oracle kernel code can slow down the core of Oracle processing, that is, buffer processing in Oracle's buffer cache.

Look at it this way, if a new Oracle release requires 100 more lines of kernel code to be executed to process a single buffer, that will be reflected in how many buffers Oracle can process per second.

To put bluntly, this article is the result of my research comparing core buffer processing rates between Oracle Database versions 11.2.0.2.0, 12.1.0.1.0 and 12.1.0.2.0.

With postings like this, it is very important for everyone to understand the results I publish are based on a very specific and targeted test and not on a real production load. Do not use my results in making a "should I upgrade decision." That would be stupid and an inappropriate use of the my experimental results. But because I publish every aspect of my experiment and it is easily reproducable it is valid data point with which to have a discussion and also highlight various situations that DBAs need to know about.

There are two interesting results from this research project. This article is about the first discovery and my next article will focus on the second. The second is by far the most interesting!

FYI. Back in August of 2013 performed a similar experiment where I compared Oracle database versions 11.2.0.2.0 with 12.1.0.1.0. I posted the article HERE.

Why "Faster" Means More Buffer Gets Processed Per Second
For this experiment when I say "faster" I am referring to raw buffered block processing. When a buffer is touched in the buffer cache it is sometimes called a buffer get or a logical IO. But regardless of the name, every buffer get increases the instance statistic, session logical reads.

I like raw logical IO processing experiments because they are central to all Oracle Database processing. Plus with each new Oracle release, as additional functionality is inserted it is likely more lines of Oracle kernel code will exist. To maintain performance with added functionality is an incredible feat. It's more likely the core buffer processing will be slower because of the new features. Is this case with Oracle's in-memory column store?

How I Setup The Experiment
I have included all the detailed output, scripts, R commands and output, data plots and more in the Analysis Pack that can be downloaded HERE.

There are a lot of ways I could have run this experiment. But two key items must exist for a fare comparison. First, all the processing must be in cache. There can be no physical read activity. Second, the same SQL must be run during the experiment and have the same execution plan. This implies the Oracle 12c column store will NOT be used. A different execution plan is considered "cheating" as a bad plan will clearly loose. Again, this is a very targeted and specific experiment.

The experiment compares the buffer get rates for a given SQL statement. For each Oracle version, I gathered 33 samples and excluded the first two, just to ensure caching was not an issue. The SQL statement runs for around 10 seconds, processes around 10.2M rows and touches around 5M buffers. I checked to ensure the execution plans are the same for each Oracle version. (Again, all the details are in the Analysis Pack for your reading pleasure.)

I ran the experiment on a Dell server. Here are the details:
$ uname -a
Linux sixcore 2.6.39-400.17.1.el6uek.x86_64 #1 SMP Fri Feb 22 18:16:18 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
To make this easier for myself, to perform the test I used my CPU Speed Test tool (version 1i). I blogged about this last month HERE. The latest version of this tool can be downloaded HERE.

The Results, Statistically
Shown below are the experimental results. Remember, the statistic I'm measuring is buffer gets per millisecond.


Details about the above table: The "Normal" column is about if the statistical distribution of the 31 samples is normal. If the p-value (far right column) is greater than 0.05 then I'll say they are normal. In all three cases, the p-value is less than 0.05. If fact, if you look at the histograms contained in the Analysis Pack every histogram is visually clearly not normal. As you would expect the "Average" and the "Median" are the statistical mean and median. The "Max" is the largest value in the sample set. The "Std Dev" is the standard deviation, which is doesn't mean much since our sample sets are not normally distributed.

As I blogged about before the Oracle Database 12c buffer processing is faster than Oracle Database 11g. However, the interesting part is Oracle version with in-memory column store 12.1.0.2.0 is slower then the previous version of 12c, 12.1.0.1.0. In fact, in my experiment the in-memory column store version is around 5.5% slower! This means version 12.1.0.1.0 "out of the box" can process logical buffers around 5.5% faster! Interesting.

In case you're wondering, I used the default out-of-the-box in-memory column store settings for version 12.1.0.2.0. I checked the in-memory size parameter, inmemory_size and it was indeed set to zero. Also, when I startup the Oracle instance there is no mention of the in-memory column store.

Statistically Comparing Each Version
As an important side bar, I did statistically compare the Oracle Database versions. Why? Because while a 5.5% decrease in buffer throughput may seem important, it may not be statistically significant, meaning this difference can not be explained with our sample sets.

So going around saying version 12.1.0.2.0 is "slower" by 5.5% would be misleading. But in my experiment, it would NOT be misleading because the differences in buffer processing are statistically significant. The relevant experimental details are shown below.

Version A Version B Statistical p-value
Difference
---------- ---------- ----------- -------
11.2.0.1.0 12.1.0.1.0 YES 0.0000
11.2.0.1.0 12.1.0.2.0 YES 0.0000
12.1.0.1.0 12.1.0.2.0 YES 0.0000

In all three cases the p-value was less than 0.05 signifying the two sample sets are statistically
different. Again, all the details are in the Analysis Pack.

The chart above shows the histograms of both Oracle Database 12c version sample sets together. Visually they look very separated and different with no data crossover. So from both a numeric and visual perspective there is a real difference between 12.1.0.1.0 and 12.1.0.2.0.


What Does This Mean To Me
To me this is surprising. First, there is a clear buffer processing gain upgrading from Oracle 11g to 12c. That is awesome news! But I was not expecting a statistically significant 5.5% buffer processing decrease upgrading to the more recent 12.1.0.2.0 version. Second, this has caused me to do a little digging to perhaps understand the performance decrease. The results of my experimental journey are really interesting...I think more interesting than this posting! But I'll save the details for my next article.

Remember, if you have any questions or concerns about my experiment you can run the experiment yourself. Plus all the details of my experiment are included in the Analysis Pack.

All the best in your Oracle performance tuning work!

Craig.





Categories: DBA Blogs

Does Increasing An Oracle Background Process OS Priority Improve Performance?

Sat, 2014-11-01 21:00
Does Increasing An Oracle Background Process OS Priority Improve Performance?
Does increasing an Oracle Database background process operating system priority improve performance? As you might expect, the answer is, "It depends."

In this posting I will explain the results of an experiment where I increase the Oracle Database 12c log writer background processes operating system priority.

In my experiment I created a clear CPU bottleneck and the top wait event was log file parallel write. I gathered some data. Next I increased all the log writer background process priorities. Again, I gathered some data and then I analyzed the two data sets. The results were disappointing, not surprising, but a fundamental rule in performance tuning was demonstrated and reinforced.

You can download the "analysis pack" which contains the raw experimental data, histogram, statistical R results and the R statical package commands HERE. You can download total time delta reporting script (ttpctx.sql) I show below, which is contained within my OraPub System Monitor (OSM) Toolkit HERE.

Before I could gather some data, I needed to create the appropriate system load, the data gather scripts and the data analysis scripts. Here's a quick overview of each.

Increasing The LGWR Processes PriorityIf you are not familiar with changing Oracle Database background OS process priority, I blogged about how to do this HERE.

My experiment consisted of creating two situations and statistically comparing them to see if increasing the LGWR background process OS priority increased performance. The only difference in the "two situations" was the instance parameter, _high_priority_processes. For the "normal priority" part of the experiment, the default "LMS*|VKTM" was used. For the "high priority" part of the experiment the parameter was changed to "LMS*|VKTM|LG*". The "LG*" caused the increase in the Linux OS priority of all log writer processes from the default 19 to 41.

Ready for some version specifics? Oracle continues to make OS priority related instance parameter changes... even within 12c releases. Since this experiment was done with Oracle Database version 12.1.0.1.0 the parameter default was "LMS*|VKTM" not "LMS*" as with version 12.1.0.2.0. Also, in 12.1.0.2.0 VKTM is placed into a new parameter, _highest_priority_processes.

Generating The DML With CPU Bottleneck Load
To generate the DML workload, I used my OP Load Generator. You can download it HERE. It's quick, easy and I'm familiar with it. I kept increasing the number of DML processes until the CPU utilization was pegged at 100%. While the database server was clearly out of CPU power, the log writer background processes were rarely seen as the top CPU consuming process. I used the "top" program to monitor the process activity. The top CPU consuming processes were almost always the Oracle server/foreground/shadow processes. As I'll explain in the Results section below, this has significant ramifications on the results.

Oracle Time Based Analysis Summary
The data collection script was similar to the one I used back in 2012 when collecting data regarding Oracle's commit write facility. Essentially, I collected multiple three minute samples of the delta elapsed time, user commits, total non-idle wait time and CPU consumption. In both the normal and high priority runs, around 99% of the database time was CPU consumption and 1% of the time Oracle non-idle wait time. Also the top wait event (~70%) was log file parallel write (display name: log file redo write).

If you want master an Oracle Time Based Analysis, check out my online seminar, Tuning Oracle Using An AWR Report. It will teach you how to using an AWR report to optimize Oracle performance so users will feel the difference.

The OraPub System Monitor Toolkit script ttpctx.sql report below was taken during the "normal priority" log writer load.

SQL> @ttpctx

Database: prod35 16-MAY-14 06:07pm
Report: ttpctx.sql OSM by OraPub, Inc. Page 1
Total Time Activity (39 sec interval)

Avg Time Time Wait
Time Component % TT % WT Waited (ms) (sec) Count(k)
------------------------------------- ------- ------- ----------- ----------- --------
CPU consumption: Oracle SP + BG procs 98.91 0.00 0.000 238.716 0
log file redo write 0.73 67.56 8.082 1.770 0
control file parallel write 0.18 16.41 33.077 0.430 0
target log write size 0.05 4.20 6.111 0.110 0
oracle thread bootstrap 0.03 3.05 40.000 0.080 0
os thread creation 0.02 1.53 20.000 0.040 0
commit: log file sync 0.01 0.76 10.000 0.020 0
enq: CR - block range reuse ckpt 0.01 0.76 20.000 0.020 0
Disk file operations I/O 0.00 0.00 0.000 0.000 0

The OSM report below was taken during the "high priority" log writer load.

SQL> @ttpctx

Database: prod35 16-MAY-14 09:25pm
Report: ttpctx.sql OSM by OraPub, Inc. Page 1
Total Time Activity (41 sec interval)

Avg Time Time Wait
Time Component % TT % WT Waited (ms) (sec) Count(k)
------------------------------------- ------- ------- ----------- ----------- --------
CPU consumption: Oracle SP + BG procs 98.92 0.00 0.000 238.733 0
log file redo write 0.83 77.01 8.272 2.010 0
control file parallel write 0.08 7.28 14.615 0.190 0
target log write size 0.05 4.98 5.909 0.130 0
oracle thread bootstrap 0.03 3.07 40.000 0.080 0
os thread creation 0.02 1.92 25.000 0.050 0
commit: log file sync 0.01 0.77 10.000 0.020 0
enq: CR - block range reuse ckpt 0.01 0.77 20.000 0.020 0
enq: RO - fast object reuse 0.00 0.38 10.000 0.010 0


Data Collection
For the normal priority load 27 three minute samples where collected. For the high priority situation there were 30 three minute samples collected. (I forgot why there was only 27 samples collected for the normal priority.)  I collected the elapsed time, total non-idle wait time, total CPU consumption (v$sys_time_model: db_cpu + background cpu time) and total user commits.

In this experiment more user commits processed per second means better performance.


Experimental Results
I used the free statistics package "R" (www.r-project.org) to analyze the data. I demonstrate how to get, install and use "R" in my online video seminar, Using Skewed Data To Your Advantage.

With the normal log writer process priority, an average of 984.5 commits/sec and a median of 983.0 commits/sec occurred. With the LG* high process priority, an average of 993.6 commits/sec and a median of 991.0 commits/sec occurred. While the "high priority" situation was able to process more commits per second, is this statistically significant?

The red "smoothed" histogram is the normal priority situation and the blue smoothed histogram is when the log writers were set to the higher priority. The more separated the two histograms the more "different" the sample sets, the more likely there is a statistically significant difference and the more likely a user would feel the difference. Looking at the above histograms plot, there does not appear to be a real difference. But let's do a proper significance test!

Because both sample sets are normally distributed (details are in the Analysis Pack), I could use a simple t-test. R produced a p-value of 0.04451. To be statistically "different" I want the p-value to be less than 0.05 and it is. What does this mean?

While statistically and numerically the commit rates are different, I wouldn't expect any special performance tuning award! In fact, the hassles with cycling a production instance and setting underscore/hidden parameters would make it very unlikely I would increase the OS priority of the log writer background processes. I want to see a big performance difference.

To Summarize... What I Learned... Again
This situation is a perfect example of focusing on the wrong thing! While there is a clear operating system CPU bottleneck and the top wait event is about redo, the log writers are not suffering from a want/need of CPU resources. I suspect the server processes want more CPU resources, but they are NOT the processes we increased their OS priority.

If the log writers were suffering from a lack of CPU resources and fighting for CPU resources, I would expect to see them consuming CPU resources along with the Oracle server processes. And I would definitely expect to see them near the top of the "top" process monitor... especially when their priority has been increased!

Because of this "misguided" tuning effort, this experiment does not build a case for or against changing the log writer priority. What it reinforces is in our quest to optimize performance, make sure we focus on the right thing.

As a side note, this is a great statistical analysis example for two reasons. First, our samples sets look similar, but statistically they are not. Second, while they are statistically different, the performance impact will not be very different. And my guess is the users won't feel a thing... except frustration and anger.

To Super Summarize
When considering increasing a background process's operating system priority, make sure the process is in need of CPU and is not able to get it. In this situation, the DBA could have been mislead by the CPU bottleneck. But upon closer inspection of the log writers from an operating system perspective and knowing the wait event "log file parallel write" is probably more about IO than CPU (Oracle perspective) it would be unlikely that increasing the log writer processes OS priority would help increase the commits per second.

Thanks for reading!

Craig.












Categories: DBA Blogs

How To Change The Priority Of Oracle Background Processes

Mon, 2014-10-27 17:31
How To Change The Priority Of Oracle Background Processes
Before you get in a huf, it can be done! You can change an Oracle Database background process

priority through an instance parameter! I'm not saying it's a good idea, but it can be done.
In this post I explore how to make the change, just how far you can take it and when you may want to consider changing an Oracle background process priority.
To get your adrenaline going, check out the instance parameter _high_priority_processes from one of your production Oracle system with a version of 11 or greater. Here is an example using my OSM tool, ipx.sql on my Oracle Database version 12.1.0.2.0.
SQL> @ipx _highest_priority_processes
Database: prod40 27-OCT-14 02:22pm
Report: ipx.sql OSM by OraPub, Inc. Page 1
Display ALL Instance Parameters

Instance Parameter and Value Description Dflt?
-------------------------------------------------- -------------------- -----
_highest_priority_processes = VKTM Highest Priority TRUE
Process Name Mask
Then at the Linux prompt, I did:
$ ps -eo pid,class,pri,nice,time,args | grep prod40
2879 TS 19 0 00:00:00 ora_pmon_prod40
2881 TS 19 0 00:00:01 ora_psp0_prod40
2883 RR 41 - 00:02:31 ora_vktm_prod40
2889 TS 19 0 00:00:01 ora_mman_prod40
2903 TS 19 0 00:00:00 ora_lgwr_prod40
2905 TS 19 0 00:00:01 ora_ckpt_prod40
2907 TS 19 0 00:00:00 ora_lg00_prod40
2911 TS 19 0 00:00:00 ora_lg01_prod40
...
Notice the "pri" for priority of the ora_vktm_prod40 process? It is set to 41 while all the rest of the Oracle background processes are set to the default of 19. Very cool, eh?

Surprised By What I Found
Surprised? Yes, surprised because changing Oracle process priority is a pandoras box. Just imagine if an Oracle server (i.e., foreground) process has its priority lowered just a little and then attempts to acquire a latch or a mutex? If it doesn't get the latch quickly, I might never ever get it!

From a user experience perspective, sometimes performance really quick and other times the application just hangs.

This actually happened to a customer of mine years ago when the OS started reducing a process's priority after it consumed a certain amount of CPU. I learned that when it comes to Oracle processes, they are programed to expect an even process priority playing field. If you try to "game" the situation, do so at your own risk... not Oracle's.

Then why did Oracle Corporation allow background process priority to be changed. And why did Oracle Corporation actually change a background processes priority?!

Doing A Little Exploration
It turns out there are a number of "priority" related underscore instance parameters! On my 11.2.0.1.0 system there 6 "priority" parameters. On my 12.1.0.1.0 system there are 8 "priority" parameters. On my 12.1.0.2.0 system there are 13 "priority" parameters! So clearly Oracle is making changes! In all cases, the parameter I'm focusing on, "_high_priority_processes" exists.

In this posting, I'm going to focus on my Oracle Database 12c version 12.1.0.2.0 system. While you may see something different in your environment, the theme will be the same.

While I'll be blogging about all four of the below parameters, in this posting my focus will be on the _high_priority_processes parameter. Below are the defaults on my system:
_high_priority_processes        LMS*
_highest_priority_processes VKTM
_os_sched_high_priority 1
_os_sched_highest_priority 1

Messing With The LGWR Background Processes
I'm not testing this on a RAC system, so I don't have an LMS background process. When I saw the "LMS*" I immediately thought, "regular expression." Hmmm... I wonder if I can change the LGWR background process. So I made the instance parameter change and recycled the instance. Below shows the instance parameter change:
SQL> @ipx _high_priority_processes
Database: prod40 27-OCT-14 02:36pm
Report: ipx.sql OSM by OraPub, Inc. Page 1
Display ALL Instance Parameters

Instance Parameter and Value Description Dflt?
-------------------------------------------------- -------------------- -----
_high_priority_processes = LMS*|LGWR High Priority FALSE
Process Name Mask

Below is an operating system perspective using the ps command:

ps -eo pid,class,pri,nice,time,args | grep prod40
...
5521 RR 41 - 00:00:00 ora_vktm_prod40
5539 TS 19 0 00:00:00 ora_dbw0_prod40
5541 RR 41 - 00:00:00 ora_lgwr_prod40
5545 TS 19 0 00:00:00 ora_ckpt_prod40
5547 TS 19 0 00:00:00 ora_lg00_prod40
5551 TS 19 0 00:00:00 ora_lg01_prod40
...

How Far Can I Take This?
At this point in my journey, my mind was a blaze! The log file sync wait event can be really difficult to deal with and especially so when there is a CPU bottleneck. Hmmm... Perhaps I can increase the priority of all the log writer background processes?

So I made the instance parameter change and recycled the instance. Below shows the instance parameter change:
SQL> @ipx _high_priority_processes
Database: prod40 27-OCT-14 02:44pm
Report: ipx.sql OSM by OraPub, Inc. Page 1
Display ALL Instance Parameters

Instance Parameter and Value Description Dflt?
-------------------------------------------------- -------------------- -----
_high_priority_processes = LMS*|LG* High Priority FALSE
Process Name Mask

Below is an operating system perspective using the ps command:

ps -eo pid,class,pri,nice,time,args | grep prod40
...
5974 TS 19 0 00:00:00 ora_psp0_prod40
5976 RR 41 - 00:00:00 ora_vktm_prod40
5994 TS 19 0 00:00:00 ora_dbw0_prod40
5996 RR 41 - 00:00:00 ora_lgwr_prod40
6000 TS 19 0 00:00:00 ora_ckpt_prod40
6002 RR 41 - 00:00:00 ora_lg00_prod40
6008 RR 41 - 00:00:00 ora_lg01_prod40
6014 TS 19 0 00:00:00 ora_lreg_prod40
...

So now all the log writer background processes have a high priority. My hope would be that if there is an OS CPU bottleneck and the log writer background processes wanted more CPU, I now have the power to give that to them! Another tool in my performance tuning arsenal!

Security Hole?
At this point, my exuberance began to turn into paranoia. I thought, "Perhaps I can increase the priority of an Oracle server process or perhaps any process." If so, that would be a major Oracle Database security hole.

With fingers trembling, I changed the instance parameters to match an Oracle server process and recycled the instance. Below shows the instance parameter change:

SQL> @ipx _high_priority_processes
Database: prod40 27-OCT-14 02:52pm
Report: ipx.sql OSM by OraPub, Inc. Page 1
Display ALL Instance Parameters

Instance Parameter and Value Description Dflt?
-------------------------------------------------- -------------------- -----
_high_priority_processes = High Priority FALSE
LMS*|LG*|oracleprod40 Process Name Mask

Below is an operating system perspective using the ps command:

$ ps -eo pid,class,pri,nice,time,args | grep prod40
...
6360 TS 19 0 00:00:00 ora_psp0_prod40
6362 RR 41 - 00:00:00 ora_vktm_prod40
6366 TS 19 0 00:00:00 ora_gen0_prod40
6382 RR 41 - 00:00:00 ora_lgwr_prod40
6386 TS 19 0 00:00:00 ora_ckpt_prod40
6388 RR 41 - 00:00:00 ora_lg00_prod40
6394 RR 41 - 00:00:00 ora_lg01_prod40
6398 TS 19 0 00:00:00 ora_reco_prod40
...
6644 TS 19 0 00:00:00 oracleprod40...
...

OK, that didn't work so how about this?

SQL> @ipx _high_priority_processes
Database: prod40 27-OCT-14 02:55pm
Report: ipx.sql OSM by OraPub, Inc. Page 1
Display ALL Instance Parameters

Instance Parameter and Value Description Dflt?
-------------------------------------------------- -------------------- -----
_high_priority_processes = High Priority FALSE
LMS*|LG*|*oracle* Process Name Mask

Let's see what happened at the OS.

$ ps -eo pid,class,pri,nice,time,args | grep prod40
...
6701 RR 41 - 00:00:00 ora_vktm_prod40
6705 RR 41 - 00:00:00 ora_gen0_prod40
6709 RR 41 - 00:00:00 ora_mman_prod40
6717 RR 41 - 00:00:00 ora_diag_prod40
6721 RR 41 - 00:00:00 ora_dbrm_prod40
6725 RR 41 - 00:00:00 ora_vkrm_prod40
6729 RR 41 - 00:00:00 ora_dia0_prod40
6733 RR 41 - 00:00:00 ora_dbw0_prod40
...
6927 RR 41 - 00:00:00 ora_p00m_prod40
6931 RR 41 - 00:00:00 ora_p00n_prod40
7122 TS 19 0 00:00:00 oracleprod40 ...
7124 RR 41 - 00:00:00 ora_qm02_prod40
7128 RR 41 - 00:00:00 ora_qm03_prod40

Oh Oh... That's not good! Now EVERY Oracle background process has a higher priority and my Oracle server process does not.

So my "*" wildcard caused all the Oracle processes to be included. If all the processes a high prioirty, then the log writer processes have no advantage over the others. And to make matters even worse, my goal of increasing the server process priority did not occur.

However, this is actually very good news because it appears this is not an Oracle Database security hole! To me, it looks like the priority parameter is applied during the instance startup for just the background processes. Since my server process was started after the instance was started and for sure not included in the list of background processes, its priority was not affected. Good news for security, not as good of news for a performance optimizing fanatic such as myself.

Should I Ever Increase A Background Process Priority?
Now that we know how to increase an Oracle Database background process priority, when would we ever want to do this? The short answer is probably never. But the long answer is the classic, "it depends."

Let me give you an example. Suppose there is an OS CPU bottleneck and the log writer background processes are consuming lots of CPU while handling all the associated memory management when server process issues a commit. In this situation, performance may benefit by making it easier for the log writer processes to get CPU cycles, therefore improving performance. But don't even think about doing this unless there is a CPU bottleneck. And even then, be very very careful.

In my next block posting, I'll detail an experiment where I changed the log writer background processes priority.

Thanks for reading!

Craig.



Categories: DBA Blogs

11 Tips To Get Your Conference Abstract Accepted

Wed, 2014-10-08 19:21
11 Ways To Get Your Conference Abstract Accepted
This is what happens when your abstract is selected!Ready for some fun!? It's that time of year again and the competition will be intense. The "call for abstracts" for a number of Oracle Database conferences are about to close.

The focus of this posting is how you can get a conference abstract accepted.

As a mentor, Track Manager and active conference speaker I've been helping DBAs get their abstracts accepted for many years. If you follow my 11 tips below, I'm willing to bet you will get a free pass to any conference you wish in any part of the world.

1. No Surprises! 
Track Manager After A SurpriseThe Track Manager wants no surprises, great content and a great presentation. Believe me when I say, they are looking for ways to reduce the risk of a botched presentation, a cancelation or a no show. Your abstract submissions is your first way to show you are serious and will help make the track incredibly awesome.

Tip: In all your conference communications, demonstrate a commitment to follow through.

2. Creative Title.
The first thing everyone sees is the title. I can personally tell you, if the title does not peak my curiosity without sounding stupid, then unless I know the speaker is popular I will not read the abstract. Why do I do this? Because as a Track Manager, I know conference attendees will do the same thing! And as a Track Manager, I want attendees to want to attend sessions in my track.

Tip: Find two people, read the title to them and ask what they think. If they say something like, "What are you going to talk about?" that's bad. Rework the title.

3. Tell A Story
The abstract must tell a compelling story. Oracle conferences are not academic conferences! There needs to be some problem along with a solution complete with drama woven into the story.

Tip: People forget bullet points, but they never forget a good story.

4. Easy To Read
The abstract must be easy to review. The abstract reviewers may have over a hundred abstracts to review. Make it a good quick read for the reviewers and your chances increase.

Tip: Have your computer read your abstract back to you. If you don't say, "Wow!" rework the abstract. 

5. Be A Grown-Up
You can increase the perception you will physically show up and put on a great show at the conference by NOT putting into your abstract emoji, bullet points, your name and title or pushing a product or service. NEVER copy/paste from a powerpoint outline into the abstract or outline. (I've seen people do this!)

Tip: Track Managers do not want to baby sit you. They want an adult who will help make their track great.

6. Submit Introductory Level Abstracts
I finally figured this out a couple years ago. Not everyone is ready for a detailed understanding of cache buffer chain architecture, diagnosis, and solution development. Think of it from a business perspective. Your market (audience) will be larger if your presentation is less technical. If this bothers you, read my next point.

Tip: Submit both an introductory level version and advanced level version of your topic.

7. Topics Must Be Filled
Not even the Track Manager knows what people will submit. And you do not know what the Track Manager is looking for. And you do not know what other people are submitting. Mash this together and it means you must submit more than one abstract. I know you really, really want to present on topic X. But would you rather not have an abstract accepted?

Tip: Submit abstracts on multiple topics. It increases your chances of being accepted.

8. Submit Abstract To Multiple Tracks
This is similar to submitting both an introductory version of your abstract. Here's an example: If there is a DBA Bootcamp track and a Performance & Internals Track, craft your abstract to Bootcamp version has a more foundational/core feel to it. And craft your Performance & Internals version to feel more technical and advanced.

Do not simply change the title and the abstract can not be the same.  If the conference managers or the Track Manager feels you are trying to game the conference, you present a risk to the conference and their track and your abstracts will be rejected. So be careful and thoughtful.

Tip: Look for ways to adjust your topic to fit into multiple tracks.

9. Great Outline Shows Commitment
If the reviewers have read your title and abstract, they are taking your abstract seriously. Now is the time to close the deal by demonstrating you will put on a great show. And this means you already have in mind an organized and well thought out delivery. You convey this with a fantastic outline. I know it is difficult to create an outline BUT the reviewers also know this AND having a solid outline demonstrates to them you are serious, you will show up, and put on a great show.

Tip: Develop your abstract and outline together. This strengthens both and develops a kind of package the reviewers like to see.

10. Learning Objectives Show Value
You show the obvious value of your topic through the learning objectives. Personally, I use these to help keep me focused on my listener, just not what I'm interested in at the moment. Because I love my work, I tend to think everyone also does... not so. I must force myself to answer the question, "Why would a DBA care about this topic?"

Tip: Develop your learning objectives by asking yourself, "When my presentation is over, what do I want the attendees to remember?"

11. Submit About Problems You Solved
Submit on the topics you have personally explored and found fascinating. Every year, every DBA has had to drill deep into at least one problem. This concentrated effort means you know the topic very well. And this means you are qualified to tell others about it! People love to hear from people who are fascinated about something. Spread the good news resulting from a "bad" experience.

Tip: Submit on topics you have explored and are fascinated with.

How Many Abstracts Should I Submit?
It depends on the conference, but for a big North America conference like ODTUG, RMOUG and IOUG I suggest at least four.

Based on what I wrote above, pick three topics, perhaps create both an introductory and advanced version and look to see if it makes sense to submit to multiple tracks. That means you'll probably submit at least four abstracts. It's not as bad as it sounds, because you will only have perhaps three core abstracts. All the others are modifications to fit a specific need. Believe when you receive the acceptance email, it will all be worth it!

See you at the conference!

Craig.

Categories: DBA Blogs

Comparing SQL Execution Times From Different Systems

Mon, 2014-10-06 18:17
Comparing SQL Execution Times From Different Systems
Suppose it's your job to identify SQL that may run slower in the about-to-be-upgrated Oracle Database. It's tricky because no two systems are alike. Just because the SQL run time is faster in the test environment doesn't mean the decision to upgrade is a good one. In fact, it could be disastrous.

For example; If a SQL statement runs 10 seconds in production and runs 20 seconds in QAT, but the production system is twice as fast as QAT, is that a problem? It's difficult to compare SQL runs times when the same SQL resides in different environments.

In this posting, I present a way to remove the CPU speed differences, so an appropriate "apples to apples" SQL elapsed time comparison can be made, thereby improving our ability to more correctly detect risky SQL that may be placed into the upgraded production system.

And, there is a cool, free, downloadable tool involved!

Why SQL Can Run Slower In Different Environments
There are a number of reasons why a SQL's run time is different in different systems. An obvious reason is a different execution plan. A less obvious and much more complex reason is a workload intensity or type difference. In this posting, I will focus on CPU speed differences. Actually, what I'll show you is how to remove the CPU speed differences so you can appropriately compare two SQL statements. It's pretty cool.

The Mental Gymnastics
If a SQL statement's elapsed time in production is 10 seconds and 20 seconds in QAT, that’s NOT an issue IF the production system is twice as fast.

If this makes sense to you, then what you did was mentally adjust one of the systems so it could be appropriately compared. This is how I did it:

10 seconds in production * production is 2 times as fast as QA  = 20 seconds 
And in QA the sql ran in 20 seconds… so really they ran “the same” in both environments. If I am considering placing the SQL from the test environment into the production environment, then this scenario does not raise any risk flags. The "trick" is determining "production is 2 times as fast as QA" and then creatively use that information.
Determining The "Speed Value"
Fortunately, there are many ways to determine a system's "speed value." Basing the speed value on Oracle's ability to process buffers in memory has many advantages: a real load is not required or even desired, real Oracle code is being run at a particular version, real operating systems are being run and the processing of an Oracle buffer highly correlates with CPU consumption.
Keep in mind, this type of CPU speed test is not an indicator of scalability (benefit of adding additional CPUs) in any way shape or form. It is simply a measure of brut force Oracle buffer cache logical IO processing speed based on a number of factors. If you are architecting a system, other tests will be required.
As you might expect, I have a free tool you can download to determine the "true speed" rating. I recently updated it to be more accurate, require less Oracle privileges, and also show the execution plan of the speed test tool SQL. (A special thanks to Steve for the execution plan enhancement!) If the execution plan used in the speed tool is difference on the various systems, then obviously we can't expect the "true speeds" to be comparable.
You can download the tool HERE.
How To Analyze The Risk
Before we can analyze the risk, we need the "speed value" for both systems. Suppose a faster system means its speed rating is larger. If the production system speed rating is 600 and the QAT system speed rating is 300, then production is deemed "twice as fast."
Now let's put this all together and quickly go through three examples.
This is the core math:
standardized elapsed time = sql elapsed time * system speed value
So if the SQL elapsed time is 25 seconds and the system speed value is 200, then the standardized "apples-to-apples" elapsed time is 5000 which is 25*200. The "standardized elapsed time" is simply a way to compare SQL elapsed times, not what users will feel and not the true SQL elapsed time.
To make this a little more interesting, I'll quickly go through three scenarios focusing on identifying risk.
1. The SQL truly runs the same in both systems.
Here is the math:
QAT standardized elapsed time = 20 seconds X 300 = 6000 seconds
PRD standardized elapsed time = 10 seconds X 600 = 6000 seconds
In this scenario, the true speed situation is, QAT = PRD. This means, the SQL effectively runs just as fast in QAT as in production. If someone says the SQL is running slower in QAT and therefore this presents a risk to the upgrade, you can confidently say it's because the PRD system is twice as fast! In this scenario, the QAT SQL will not be flagged as presenting a significant risk when upgrading from QAT to PRD.
2. The SQL runs faster in production.
Now suppose the SQL runs for 30 seconds in QAT and for 10 seconds in PRD. If someone was to say, "Well of course it's runs slower in QAT because QAT is slower than the PRD system." Really? Everything is OK? Again, to make a fare comparison, we must compare the system using a standardizing metric, which I have been calling the, "standardized elapsed time."
Here are the scenario numbers:
QAT standardized elapsed time = 30 seconds X 300 = 9000 secondsPRD standardized elapsed time = 10 seconds X 600 = 6000 seconds
In this scenario, the QAT standard elapsed time is greater than the PRD standardized elapsed time. This means the QAT SQL is truly running slower in QAT compared to PRD. Specifically, this means the slower SQL in QAT can not be fully explained by the slower QAT system. Said another way, while we expect the SQL in QAT to run slower then in the PRD system, we didn't expect it to be quite so slow in QAT. There must another reason for this slowness, which we are not accounting for. In this scenario, the QAT SQL should be flagged as presenting a significant risk when upgrading from QAT to PRD.
3. The SQL runs faster in QAT.
In this final scenario, the SQL runs for 15 seconds in QAT and for 10 seconds in PRD. Suppose someone was to say, "Well of course the SQL runs slower in QAT. So everything is OK." Really? Everything is OK? To get a better understanding of the true situation, we need to look at their standardized elapsed times.
QAT standardized elapsed time = 15 seconds X 300 = 4500 secondsPRD standardized elapsed time = 10 seconds X 600 = 6000 seconds 
In this scenario, QAT standard elapsed time is less then the PRD standardized elapsed time. This means the QAT SQL is actually running faster in the QAT, even though the QAT wall time is 15 seconds and the PRD wall time is only 10 seconds. So while most people would flag this QAT SQL as "high risk" we know better! We know the QAT SQL is actually running faster in QAT than in production! In this scenario, the QAT SQL will not be flagged as presenting a significant risk when upgrading from QAT to PRD.
In Summary...
Identify risk is extremely important while planning for an upgrade. It is unlikely the QAT and production system will be identical in every way. This mismatch makes identifying risk more difficult. One of the common differences in systems is their CPU processing speeds. What I demonstrated was a way to remove the CPU speed differences, so an appropriate "apples to apples" SQL elapsed time comparison can be made, thereby improving our ability to more correctly detect risky SQL that may be placed into the upgraded production system.
What's Next?
Looking at the "standardized elapsed time" based on Oracle LIO processing is important, but it's just one reason why a SQL may have a different elapsed time in a different environment. One of the big "gotchas" in load testing is comparing production performance to a QAT environment with a different workload. Creating an equivalent workload on different systems is extremely difficult to do. But with some very cool math and a clear understanding of performance analysis, we can also create a more "apples-to-apples" comparison, just like we have done with CPU speeds. But I'll save that for another posting.

All the best in your Oracle performance work!

Craig.




Categories: DBA Blogs

Top 7 Reasons Why Oracle Conferences Rock!

Wed, 2014-09-24 10:56
Top 7 Reasons Why Oracle Conferences Rock!
Why take the time and make the effort to attend an Oracle database conference or Oracle user group meeting? We're all busy, so there had better be some super good reasons to make the effort! For me, the benefits definitely exceed the cost.

There are many different conferences to choose from. There are professional conferences, leadership conferences, scientific conferences and business focused conferences. So why an Oracle Database conference? What's the big deal?

This is difficult for me to explain, so I've summarized why I love Oracle conferences into a "Top 7" list.

Number 7. Get free stuff from vendorsI suspect 50% of Oracle DBAs attend conferences because they love getting free stuff. If you want to maximize the good stuff, timing is everything. Get to the booth when the exhibition hall first opens, because vendors have tons of stuff to give away and they are full of energy. Also, just before the exhibition hall closes on the final day show up again. Vendors are tired and want to get rid of as much stuff as possible... and you just happen to be there!

Number 6. Become known in your industryIf you want to focus and excel in your career in a particular area, then pass on what you know. If you hold on to what you learn, no one will know you're an expert but you... and that's lonely. One of the best ways to become known and enjoy conferences is to speak at them. If this is something you want to do, please email me. I'll mentor you. I'm serious.

Number 5. Talk to others, if you want toWant to talk with people who care about Oracle technology as much as you do? Conferences are a great way to break out of your rut and think... and if you want to talk. One of the reasons I like conferences is it gives me a chance to meet with past students.

Number 4. Learn, if you want toThere are always opportunities to learn at conferences. How many times have I heard someone say they were in a lame session. But when I ask if they read the abstract first, they say no. If you want to increase your chances of attending great sessions, read the abstract. Second, look for speakers you like. Third, sit next to the door just in case you need to make a quick exit! The worst you can do is look at the agenda and pick the sessions you want to attend. Do a little research and you'll be surprised how many good sessions there are.


Number 3. Be inspired!How many places can you go to receive inspiration? Not many, is my answer! Because I can chill for a bit at conferences, observe what's going on in our industry and interact with a wide variety of people associated with Oracle technology, I tend to leave with a fresher and more refined view... or better said, "A Wider View." That in itself is worth the cost of any conference.

Number 2. Network with vendors, speakers, colleaguesI take it one step at a time. I always cruise the exhibition halls looking for new products and touching base with the vendors I know. Personally, I like to encourage the new vendors because it is a massive investment for them to exhibit. When I'm cruising, that's where I typically reconnect with students and other DBAs. It's a more comfortable and relaxing environment for me. If you want to speak with an expert face-to-face, a conference is a natural place to do this. If someone walks up to me with an AWR report, I'll make time for them. If they email me the week before, I'll do whatever I can to schedule some time with them.

Number 1. Be with friendsThis one is personal. When I'm teaching or consulting, it's very intense and compressed. But at conferences, I get a chance to unwind, sit back, have a beer, and talk. I'm really involved with the IOUG conference and each year I look forward to reconnecting with this small group of friends. It's such a good feeling to be with "Oracle friends" without having to talk... about Oracle. I can just sit and listen without anyone wondering... What's wrong with Craig? You can do the same thing. Find an Oracle user group and volunteer.

Why Attend? Because Oracle conferences rock!There you have it. Seven reasons why Oracle conferences and Oracle User Groups rock! If you're not planning to attend any, you're missing a great way to connect, advance your career, and have a lot of fun. See you there!!

All the best in your Oracle performance work!

Craig.




Categories: DBA Blogs

Top 7 Reasons Why Oracle Conferences Are A Waste Of Time

Mon, 2014-09-15 16:38
Top 7 Reasons Why Oracle Conferences Are A Waste Of Time
Want to turn a lame Oracle Database conference experience into a great one? You may not believe this, but I think Oracle conferences are a waste of time. That is, unless I take action. I've been to hundreds of Oracle conferences, so I'm kind of an expert in this field.

Here is my "Top 7" list about why Oracle conferences are time suckers AND how you can turn them into a GREAT educational, social and networking experience.

Number 7. Sleeper Presentations. You forgot to read that session abstract before you sat down? You're not alone! Here my secret: Sit by the door. But when you walk out, remember that the speaker probably
knows how you feel but is doing they best they can. Out of respect for them and the poor souls who are actually enjoying the session, be quiet when you leave.

Number 6. My System Is Bigger Than Your System. How many times have we all heard someone talking about their, "50 node RAC global system with 5 million active users." Really? Is that even possible? Here are four proven options. Option 1 is to ask a question to shut them up. For example, "So how do you deal with the enqueue contention?" Another option is to simply walk away. If you feel that's rude, then Option 3 is to suddenly grab your phone to answer "that" call and walk away...never to return. If you're feeling feisty then combining Option 2 and 3 is wonderful. Ask the question "So how do you deal with upgrading 50 nodes?" AND THEN suddenly grab your phone and walk away.

Number 5. Not Another New Feature Presentation! Oracle Corporation and their marketing minions love to talk about futures. If you're depressed from hearing about new features you won't get to touch for the next five years, here's what to do. First, remember that Oracle Corporation user group sponsorship is important to user group conferences. Without the help from Oracle, most conferences would not occur. Second, the more Oracle sponsors an event, the more they influence the content. The money source always drives the content. The solution is to look at number of presentations given by non-Oracle employees versus Oracle employees... before you register.

Number 4. Can't Afford It Unless Oracle Sales Pays. Yes you can! I know training and traveling budgets have been cut. What you need is a free pass, right? I figured this one out while still working for Oracle. Submit an abstract to speak. If speaking freaks you out, click HERE. If you won't speak, then volunteer. Believe me when I say, EVERY CONFERENCE NEEDS VOLUNTEERS.

Number 3. Boring Conference. I totally get that. Want to know the secret about boring conferences? Avoid them...unless you take action. My conference of choice is Collaborate/IOUG. I love that conference... because I'm really involved, have been for years and look forward to working with the conference team and seeing them each year. It's kind of like family to me.

Number 2. I Feel Like A Dork. Because I speak a lot, people don't realize that much of the time I feel like a social idiot. Strange, but put me on stage or in a mentoring situation and I'm fine. It's taken me 20 years of conferences to figure this one out. Here's what to do: Go with a colleague or volunteer so you'll always have someone to hang out with. If that doesn't work and you're a guy, look for the stupid games because that's where you'll find DBAs just like you (and me).

Number 1. Stupid Drunks. I get so tired of drunk people yelling in my face! Especially when I can feel the spit from their mouths poke me in the eye. It only took a few years to figure this one out. The solution? Step back a couple of feet. And if that doesn't work, then walk away. Don't worry about offending them, because they won't remember you anyways.

Make It A Great Conference!

There you have it, my "Top 7" list about why Oracle conferences are time suckers AND how you can turn them into a GREAT educational, social and networking experience.

See you at IOUG this April in Las Vegas!

All the best in your Oracle performance work,

Craig.
Categories: DBA Blogs

Watch Oracle DB Elapsed Time and Wall Time With Parallel Query

Sun, 2014-09-07 09:09
Watch Oracle Elapsed Time and Wall Time With Parallel Query
In my recent postings I wrote that when using the Oracle Database parallel query a SQL statement's wall time should be equal to its elapsed time divided by the number of parallel query slaves plus some overhead.

That may seem correct, but is it really true? To check I ran an experiment and posted the results here. The results are both obvious and illuminating.

If you don't want to read but just sit on the couch, have a beer and watch TV you're in luck! I took a clip from my Tuning Oracle Using An AWR Report online video seminar put it on youtube.  You can watch the video clip on YouTube HERE or simply click on the movie below.


The Math, For Review Purposes
In my previous recent postings I detailed the key time parameters; DB Time, DB CPU, non-idle wait time, elapsed time, parallelism and effective parallelism. To save you some clicking, the key parameters and their relationships are shown below.

DB Time = DB CPU + NIWT

Elapsed Time = Sum of DB Time

Wall Time = ( Elapsed Time / Parallelism ) + Parallelism Overhead

Wall Time = Elapsed Time / Effective Parallelism


Test Results: When Oracle Parallel Query was NOT involved.
If you want to see my notes, snippets, etc. they can be found in this text file HERE.

Here is the non-parallel SQL statement.

select /*+ FULL(big2) NOPARALLEL (big2) */ count(*)
into   i_var
from   big2 
where  rownum < 9000000

When the SQL statement was running, I was monitoring the session using my Realtime Session Sampler OSM tool, rss.sql. Since I knew the server process session ID and wanted to sample every second and wanted to see everything just for this session, this is the rss.sql syntax:
SQL>@rss.sql 16 16 827 827 % % 1
For details on any OSM tool syntax, run the OSM menu script, osmi.sql. You can download my OSM Toolkit HERE.

The rss.sql tool output is written to a text file, which I was doing a "tail -f" on. Here is a very small snippet of the output. The columns are sample number, sample time, session SID, session serial#, Oracle username, CPU or WAIT, SQL_ID, OraPub wait category, wait event, [p1,p2,p3].


We can see the session is consuming CPU and waiting. When waiting, the wait event is "direct path read", which is asynchronous (we hope) block read requests to the IO subsystem that will NOT be buffered in the Oracle buffer cache.

Now for the timing results, which are shown in the below table. I took five samples.  It's VERY important to know that the wait time (WAIT_TIME_S), DB CPU (DB_CPU_S), and DB Time (DB_TIME_S) values are related to ONLY server process SID 16. In blazing contrast, the wall time (WALL_S), elapsed time (EL_VSQL_S), and SQL statement CPU consumption (CPU_VSQL_S) is related the entire SQL_ID statement execution.

Here are the "no parallel" experimental results.
SQL> select * from op_results;

SAMPLE_NO WALL_S EL_VSQL_S CPU_VSQL_S WAIT_TIME_S DB_CPU_S DB_TIME_S
---------- ---------- ---------- ---------- ----------- ---------- ----------
1 35.480252 35.470015 9.764407 24.97 9.428506 34.152294
2 35.670021 35.659748 9.778554 25.15 9.774984 35.541861
3 35.749926 35.739473 9.774375 25.12 9.31266 34.126285
4 35.868076 35.857752 9.772321 25.32 9.345398 34.273479
5 36.193062 36.18378 9.712962 25.46 9.548465 35.499693
Let's check the math. For simplicity and clarity, please allow me to round and use only sample 5.
DB_TIME_S = DB_CPU_S + WAIT_TIME_S
35.5 = 9.5 + 25.5 = 35.0
The DB Time is pretty close (35.5 vs 35.0). Close enough to demonstrate the time statistic relationships.
Elapsed Time (EL_VSQL_S) = DB_TIME_S
35.5 = 34.2
The Elapsed Time is off by around 4% (35.5 vs 34.2), but still closely to demonstrate the time statistic relationships.
Wall Time (WALL_S) = Elapsed Time (EL_VSQL_S) / Effective Parallelism
35.5 = 35.5 / 1
Nice! The Wall Time results matched perfectly. (35.5 vs 35.5)

To summarize in a non parallel query (i.e., single server process) situation, the time math results are what we expected! (and hoped for)

Test Results: When Oracle Parallel Query WAS involved.
The only difference in the "non parallel" SQL statement above and the SQL statement below is the parallel hint. Below is the "parallel" SQL statement.
select /*+  FULL(big2) PARALLEL(big2,3)  */ count(*) into i_var from big2 where rownum<9000000>
When the "parallel" SQL statement was running, because Oracle parallel query was involved resulting in multiple related Oracle sessions, when monitoring using my rss.sql tool, I need to open the session ID (and serial#) to include all sessions. I still sampled every second. Here is the rss.sql syntax:
SQL>@rss.sql 0 9999 0 9999 % % 1
The tool output is written to a text file, which I was doing a "tail -f" on. Here is a very small snippet of the output. I manually inserted the blank lines to make it easier to see the different sample periods.


There is only one SQL statement being run on this idle test system. And because there is no DML involved, we don't see much background process activity. If you look closely above, sessions 168 (see third column) must be a log write process because the wait event is "log file parallel write". I checked and session 6 is a background process as well.

It's no surprise to typically see only four session involved. One session is the parallel query coordinator and the three parallel query slaves! Interestingly, the main server process session that I executed the query from is session number 16. It never showed up in any of my samples! I suspect it was "waiting" on an idle wait event and I'm only showing processes consuming CPU or waiting on a non-idle wait event. Very cool.

Now for the timing results. I took five samples.  Again, it's VERY important to know that the wait time (WAIT_TIME_S), DB CPU (DB_CPU_S), and DB Time (DB_TIME_S) values are related to ONLY calling server process, which in this case is session 16. In blazing contrast, the wall time (WALL_S), elapsed time (EL_VSQL_S), and SQL statement CPU consumption (CPU_VSQL_S) is related the entire SQL statement execution.

Here are the "parallel" experimental results.
 SQL>  select * from op_results;

SAMPLE_NO WALL_S EL_VSQL_S CPU_VSQL_S WAIT_TIME_S DB_CPU_S DB_TIME_S
---------- ---------- ---------- ---------- ----------- ---------- ----------
1 46.305951 132.174453 19.53818 .01 4.069579 4.664083
2 46.982111 132.797536 19.371063 .02 3.809439 4.959602
3 47.79761 134.338069 19.739735 .02 4.170921 4.555491
4 45.97324 131.809249 19.397557 .01 3.790226 4.159572
5 46.053922 131.765983 19.754143 .01 4.062703 4.461175
Let's check the math. So simplicity and clarity, please allow me to round and use sample 5.
DB_TIME_S = DB_CPU_S + WAIT_TIME_S
4.5 = 4.1 + 0
The DB Time shown above is kind of close... 10% off. (4.5 vs 4.1) But there is for sure timing error in my collection sript. I take the position, this is close enough to demonstrate the time statistic relationships. Now look below.
Elapsed Time (EL_VSQL_S)  = DB_TIME_S
131.7 != 4.5
Woah! What happened here? (131.7 vs 4.5) Actually, everything is OK (so far aways) because the DB Time is related to the session (Session ID 16), whereas the elapsed time is ALL the DB Time for ALL the processes involved in the SQL statement. Since parallel query is involved, resulting in four additional sessions (1 coordinator, 3 slaves) we would expect the elapsed time to be greater than the DB Time. Now let's look at the wall time.
Wall Time (WALL_S) = ( Elapsed Time (EL_VSQL_S) / Parallelism ) + overhead
46.1 = ( 131.8 / 3 ) + 2.2
Nice! Clearly the effective parallelism is greater than 3 because there is some overhead (2.2). But the numbers makes sense because:

1. The wall time is less than the elapsed time because parallel query is involved.

2. The wall time is close to the elapsed time divided by the parallelism. And we can even see the parallelism overhead.

So it looks like our time math is correct!


Reality And The AWR Or Statspack Report
This is really important. In the SQL Statement section of any AWR or Statspack Report, you will see the total elapsed time over the snapshot interval and perhaps the average SQL ID elapsed time per execution. So what is the wall time? What are users experiencing? The short answer is, we do NOT have enough information.

To know the wall time, we need to know the parallelism situation. If you are NOT using parallel query, than based on the time math demonstrated above, the elapsed time per execution will be close to what the user experiencing (unless there is an issue outside of Oracle). However, if parallelism is involved, you can expect the wall time (i.e, user's experience) to be much less than the elapsed time per execution shown in the AWR or Statspack report.

Another way of looking at this is: If a user is reporting a query is taking 10 seconds, but the average elapsed time is showing as as 60 seconds, parallel query is probably involved. Also, as I mentioned above, never forget the average value is not always the typical value. (More? Check out my video seminar entitled, Using Skewed Data To Your Advantage HERE.)

Thanks for reading!

Craig.
Categories: DBA Blogs