Bobby Durrett's DBA Blog

Subscribe to Bobby Durrett's DBA Blog feed
Oracle database performance
Updated: 2 hours 44 min ago

Result cache latch contention

Tue, 2017-08-08 11:29

I recently saw a dramatic example of result cache latch contention. I had just upgraded a database to 11.2.0.4 and several hours later processing pretty much came to a halt.

Of course I am telling you the end before I tell you the beginning. It all started the morning of July 22nd, 2017, a few weeks back. We had worked for a couple of months on an 11.2.0.4 upgrade. I mentioned some issues with the same upgrade in my earlier post. I spent several hours Saturday morning upgrading the database and it all went very smoothly. Then we kicked off the normal batch processing and things seemed fine. Around 1 am the next morning I get paged about an issue. All of the batch processing had ground to a halt on the upgraded database. Needless to say, I was freaking out and afraid that we would have to back out the upgrade. This would have been ugly and time-consuming.

At first I  focused on the top SQL statements in the AWR report. Here are the top few from that morning:

Elapsed Time (s) Executions Elapsed Time per Exec (s) %Total %CPU %IO SQL Id SQL Module SQL Text 17,983.20 452,555 0.04 43.98 10.64 0.00 4yw0zv7ty2t47 SQL*Plus SELECT SLS_EXCL_RSN FROM ( SEL… 3,643.96 0 8.91 13.68 0.19 2fxnrcamtbcc2 SQL*Plus DECLARE return_code number := … 3,637.60 0 8.90 3.67 0.06 18pdd22fh15dc SQL*Plus INSERT /*+ APPEND PARALLEL(TGT…

The top query dominates the others because it takes 43.98% of the total run time so it made sense that this query was the problem. I expected that some queries would change plan with the upgrade. I had used SQL Profiles to lock in other plans that I knew were problematic but I would not be surprised to see new ones. But, looking at the top SQL, sql id 4yw0zv7ty2t47, the plan was the same as before the upgrade. Here is some execution history of the query before and after the July 22 upgrade:

SQL_ID        PLAN_HASH_VALUE END_INTERVAL_TIME         EXECUTIONS_DELTA Elapsed Average ms
------------- --------------- ------------------------- ---------------- ------------------
4yw0zv7ty2t47       848116227 15-JUL-17 11.00.28.230 PM           192430         .822087045
4yw0zv7ty2t47       848116227 16-JUL-17 02.00.24.144 AM          2522505         .691503149
4yw0zv7ty2t47       848116227 16-JUL-17 03.00.42.580 AM          2787002         .793723611
4yw0zv7ty2t47       848116227 22-JUL-17 10.00.12.787 PM           220581         23.4686899
4yw0zv7ty2t47       848116227 22-JUL-17 11.00.30.251 PM           455887         20.1393904
4yw0zv7ty2t47       848116227 23-JUL-17 12.00.47.394 AM           445173         20.5407455
4yw0zv7ty2t47       848116227 23-JUL-17 01.00.05.711 AM           369333         28.3195315
4yw0zv7ty2t47       848116227 23-JUL-17 02.00.22.811 AM           465591         39.0232079
4yw0zv7ty2t47       848116227 23-JUL-17 03.00.40.758 AM           452555         39.7370594
4yw0zv7ty2t47       848116227 23-JUL-17 04.00.57.968 AM           458328         39.3421407
4yw0zv7ty2t47       848116227 23-JUL-17 04.09.32.144 AM            10055         39.1518787
4yw0zv7ty2t47       848116227 23-JUL-17 04.11.58.484 AM            18507         39.6002968
4yw0zv7ty2t47       848116227 23-JUL-17 04.15.24.661 AM             5215         39.4672715
4yw0zv7ty2t47       848116227 23-JUL-17 04.16.30.441 AM             8542         39.1123689
4yw0zv7ty2t47       848116227 23-JUL-17 05.00.15.147 AM           321635         39.9827259

So, I stared at this for a couple of hours, getting more and more stressed because my cure-all SQL Profile was not going to help in this situation. The plan had not changed. I could not think, in my sleep deprived state, of a way to resolve this issue that morning. Then, after a quick prayer for help, I noticed the %CPU and %IO columns in the query’s line in the AWR report. 10% CPU, 0% I/O. The query was spending 90% of its time waiting and doing nothing. I had noticed some latch waits on the AWR report but I thought that the CPU on the system was just busy so we had some latch waits.

These were the top three events:

Event Waits Total Wait Time (sec) Wait Avg(ms) % DB time Wait Class latch free 3,658,537 26.9K 7 65.8 Other DB CPU 4749 11.6 db file scattered read 71,549 142.1 2 .3 User I/O

But, now finally I looked back at the latch waits. I went down to the latch section of the AWR report and the Result Cache: RC Latch latch showed up big:

Latch Sleep Breakdown Latch Name Get Requests Misses Sleeps Spin Gets Result Cache: RC Latch 6,742,176 4,142,777 4,143,709 669,430 row cache objects 5,804,568 169,324 6,087 163,272 cache buffers chains 107,393,594 36,532 5,859 30,976

I had noticed the RESULT_CACHE hint in some queries on our test database but never saw the latch contention. Here is the first part of the problem query with the RESULT_CACHE hint.

SQL_ID 4yw0zv7ty2t47
--------------------
SELECT SLS_EXCL_RSN FROM ( SELECT /*+ PARALLEL RESULT_CACHE */ DISTINCT

Here is part of the plan with the result cache step:

Plan hash value: 848116227
------------------------------------------
| Id  | Operation                        |
------------------------------------------
|  11 |  COUNT STOPKEY                   |
|  12 |   VIEW                           |
|  13 |    RESULT CACHE                  |
------------------------------------------

Early in the morning I got the idea of disabling the result cache. I ran this command:

alter system set result_cache_max_size=0 scope=both;

All of the running queries immediately died with this error:

ORA-00600: internal error code, arguments: [qesrcDO_AddRO],…

But, when the jobs were rerun they quickly went to completion. I checked the query performance before and after disabling the result cache and before the upgrade as well and found that post upgrade without the result cache the query ran ten times faster than before.

SQL_ID        PLAN_HASH_VALUE END_INTERVAL_TIME         EXECUTIONS_DELTA Elapsed Average ms
------------- --------------- ------------------------- ---------------- ------------------
4yw0zv7ty2t47       848116227 15-JUL-17 11.00.28.230 PM           192430         .822087045
4yw0zv7ty2t47       848116227 16-JUL-17 02.00.24.144 AM          2522505         .691503149
4yw0zv7ty2t47       848116227 16-JUL-17 03.00.42.580 AM          2787002         .793723611
4yw0zv7ty2t47       848116227 22-JUL-17 10.00.12.787 PM           220581         23.4686899
4yw0zv7ty2t47       848116227 22-JUL-17 11.00.30.251 PM           455887         20.1393904
4yw0zv7ty2t47       848116227 23-JUL-17 12.00.47.394 AM           445173         20.5407455
4yw0zv7ty2t47       848116227 23-JUL-17 01.00.05.711 AM           369333         28.3195315
4yw0zv7ty2t47       848116227 23-JUL-17 02.00.22.811 AM           465591         39.0232079
4yw0zv7ty2t47       848116227 23-JUL-17 03.00.40.758 AM           452555         39.7370594
4yw0zv7ty2t47       848116227 23-JUL-17 04.00.57.968 AM           458328         39.3421407
4yw0zv7ty2t47       848116227 23-JUL-17 04.09.32.144 AM            10055         39.1518787
4yw0zv7ty2t47       848116227 23-JUL-17 04.11.58.484 AM            18507         39.6002968
4yw0zv7ty2t47       848116227 23-JUL-17 04.15.24.661 AM             5215         39.4672715
4yw0zv7ty2t47       848116227 23-JUL-17 04.16.30.441 AM             8542         39.1123689
4yw0zv7ty2t47       848116227 23-JUL-17 05.00.15.147 AM           321635         39.9827259
4yw0zv7ty2t47       848116227 23-JUL-17 05.20.37.524 AM           329457          17.895581
4yw0zv7ty2t47       848116227 23-JUL-17 05.21.15.363 AM           205154         .050141323
4yw0zv7ty2t47       848116227 23-JUL-17 05.25.07.159 AM          1023657         .049949389

If you look through the output you will see that pre-upgrade on July 16th the query averaged about .69 to .79 milliseconds. During the latch contention on July 23rd it averaged about 39 milliseconds, a lot worse. But, after disabling the result cache it averaged .05 milliseconds which is at least 10 times faster than with the result cache before the upgrade.

So, it seems that the result cache hint on this query has always slowed it down. But, the upgrade and the load afterward caused some intense latch contention that we had not seen before. But, it is very cool that disabling the result cache actually made the query faster than it has been in the past.

I don’t think that it makes sense to put a RESULT_CACHE hint in a query that will run in .05 milliseconds without it. The overhead of the result cache made the query run 10 times slower at least. Something about the upgrade resulted in latch contention that caused a system wide problem, but disabling the result cache made the query run faster than it ever had. We could have avoided this problem by leaving off the RESULT_CACHE hint, but it was nice that I could resolve the problem quickly by disabling the result cache using a parameter change.

Bobby

Categories: DBA Blogs

Two configuration changes with 11.2.0.4 upgrade

Mon, 2017-08-07 17:58

A couple of weeks ago I upgraded a major production database from 11.1.0.7 to 11.2.0.4. Our developers kept hitting one ugly bug after another on the minimally patched 11.1.0.7 so I wanted to get them on the most patched up version of Oracle 11. This is on HP-UX Itanium 11.31. I made two key configuration changes that I want to discuss here. I changed the database to use direct I/O and I set a parameter so that the database would not use direct path reads for most table and partition scans. The main point of this blog post is that one change required the other. I moved to direct I/O to improve checkpoint performance but that slowed down queries that repeatedly scanned segments using direct path reads. I set an underscore parameter that prevented direct path reads except on very large table scans and that sped the problem queries back up to normal.

In our testing a coworker ran a massive number of updating jobs at the same time. A truncate took about an hour during this excessive load. I ran a simple test script to truncate a small table during this load and it took over a minute. The truncate spent almost all of its time on these two waits:

  1. local write wait – 61%
  2. enq: RO – fast object reuse – 38%

This was on a Delphix clone of production so the NFS file systems were already forcing the database to use direct I/O. But, I had to max out the database writer processes to get the needed checkpoint performance. I based this decision on my experience with checkpoint performance on another database which I documented in this post: url. I set these parameters:

  1. filesystemio_options=directIO
  2. db_writer_processes=36
  3. dbwr_io_slaves=0

We use these settings on a large data warehouse staging database on 11.2.0.4 and HP-UX 11.31 to tune checkpoint performance there. So, we have tested this configuration in production for several years.

After changing these parameters the truncates ran fast under the same heavy updating load.

We remounted the database file systems with mincache=direct, convosync=direct options on our non-Delphix physical databases when we moved them to direct I/O.

For a long time I have seen issues with slowness of full partition scans on our Delphix clones of the production database that this post is about. Here is a post about this issue: url. I remember hearing about an undocumented underscore parameter that you could use to prevent direct path reads and always meant to look into it but kept holding back. I didn’t want to set the underscore parameter on my Delphix copies of production and not set it on my physical production database. That would make our test database different from production and that could lead to invalid testing. But, an upgrade was a great time to put the parameter in both on Delphix and in production. This was the parameter that I put in:

_small_table_threshold=1000000

I read several good blog posts about this parameter and other ways to deal with direct path reads in Oracle 11. Evidently some behavior changed in Oracle 11 that caused full scans to bypass the buffer cache in more situations, using direct path reads. Some post talked about DBA’s just bumping up the value for _small_table_threshold when they upgraded to Oracle 11 so I tried it and testing proved it out. Here is an earlier post about my work with this parameter: url.

So, the upgrade has given me a chance to not only move our database to a more patched up, and hopefully stable, release but it also was a chance to make two key configuration changes. Without direct I/O our production database was using the Unix file system cache to cache the blocks that were being repeatedly scanned using direct path reads. But, to switch to direct I/O I had to end the direct path reads so that the database buffer cache would be used to cache the scanned blocks. Direct I/O and the 36 database writer processes gave us great checkpoint performance. Disabling direct path read kept the queries that had depended on the Unix filesystem cache running just as fast by allowing them to use the database buffer cache.

Bobby

Categories: DBA Blogs

September AZORA meeting

Wed, 2017-08-02 10:30

The September AZORA meeting has a great lineup of speakers. AZORA is the Arizona Oracle User Group. Republic Services is providing us with an excellent place to meet off of the 101. I’m looking forward to it. Check it out if you are in the Phoenix area.

Bobby

#meetup_oembed .mu_clearfix:after { visibility: hidden; display: block; font-size: 0; content: " "; clear: both; height: 0; }* html #meetup_oembed .mu_clearfix, *:first-child+html #meetup_oembed .mu_clearfix { zoom: 1; }#meetup_oembed { background:#eee;border:1px solid #ccc;padding:10px;-moz-border-radius:3px;-webkit-border-radius:3px;border-radius:3px;margin:0; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 12px; }#meetup_oembed h3 { font-weight:normal; margin:0 0 10px; padding:0; line-height:26px; font-family:Georgia,Palatino,serif; font-size:24px }#meetup_oembed p { margin: 0 0 10px; padding:0; line-height:16px; }#meetup_oembed img { border:none; margin:0; padding:0; }#meetup_oembed a, #meetup_oembed a:visited, #meetup_oembed a:link { color: #1B76B3; text-decoration: none; cursor: hand; cursor: pointer; }#meetup_oembed a:hover { color: #1B76B3; text-decoration: underline; }#meetup_oembed a.mu_button { font-size:14px; -moz-border-radius:3px;-webkit-border-radius:3px;border-radius:3px;border:2px solid #A7241D;color:white!important;text-decoration:none;background-color: #CA3E47; background-image: -moz-linear-gradient(top, #ca3e47, #a8252e); background-image: -webkit-gradient(linear, left bottom, left top, color-stop(0, #a8252e), color-stop(1, #ca3e47));disvplay:inline-block;padding:5px 10px; }#meetup_oembed a.mu_button:hover { color: #fff!important; text-decoration: none; }#meetup_oembed .photo { width:50px; height:50px; overflow:hidden;background:#ccc;float:left;margin:0 5px 0 0;text-align:center;padding:1px; }#meetup_oembed .photo img { height:50px }#meetup_oembed .number { font-size:18px; }#meetup_oembed .thing { text-transform: uppercase; color: #555; }
Starting the fall with a bang! Rich Niemiec, Jerry Ward, Benoit Chaffanjon

Thursday, Sep 21, 2017, 10:00 AM

Republic Services – 2nd Floor Conference Rooms
14400 N 87th St (AZ101 & Raintree) Scottsdale, AZ

6 AZORAS Attending

Wow! We’re coming off of the summer break with a bang! We’ll have three speakers: Rich Niemiec and Jerry Ward from Viscosity, followed by Benoit Chaffanjon, Oracle VP of Enterprise Solutions and Chief Architect.We’ll also be starting earlier in the day at 10:00 am to change things up!Save the date! Thursday, September 21st, 10am-2:30pmFor the la…

Check out this Meetup →

 

Categories: DBA Blogs

Updated PythonDBAGraphs to work from IDLE

Fri, 2017-07-21 18:19

I switched from Enthought Canopy to IDLE for Python development when I got my new corporate laptop a few weeks back. Yesterday I realized that I was unable to run a Matplotlib graph from IDLE in my current configuration. Also, I could not find a way to pass command line arguments into my Pyth0nDBAGraphs scripts from IDLE. So, I put in a couple of fixes including an update to the README explaining how to pass arguments into my scripts when using IDLE. This describes the problem I was having running Matplotlib graphs in IDLE: stackoverflow article.

I only use IDLE for development. I run my PythonDBAGraphs from the Windows command prompt when I am using them for my database work. Also, I use TextPad and the command line version of Python for development as well as graphical tools like IDLE or Canopy. But, I wanted to use IDLE for development of graphs so I came up with these fixes.

Bobby

Categories: DBA Blogs

Read Oracle Database 12.2 New Features Manual

Thu, 2017-06-29 16:53

I just finished reading the Oracle database 12.2 new features manual. I have postponed looking at 12.2 until now because for a long time 12.2 was not available for download even though it was available in Oracle’s cloud. Once the download became available I installed 12.2 in a test virtual machine but did not get any further than that. But, the first quarterly update of 12.2 is supposed to come out soon so I thought that I would at least read about the new features to start learning about it.

I wrote earlier about reading the Snowflake documentation. Reading the Oracle new features guide made me wonder about cases where both products are working on the same goals. Oracle 12.2 seems to have some new features for data analysis or analytics that relate to approximate results. Maybe these features are similar to Snowflake’s cardinality approximation routines? There are also a lot of new features related to JSON, which is a key Snowflake reality.

There are In-Memory database enhancements which is no surprise since this is a newer feature. Similarly, there are a ton of CDB and PDB database feature enhancements since multitenant was a major new feature in 12.1. Of course multitenant connects to Oracle’s cloud strategy so it is no surprise that Oracle included cloud enabling features in 12.2.

There are a ton of changes to Oracle features that I don’t use like Data Guard. I’ve seen some GGS but never used Data Guard. I recently saw a presentation at a user group meeting, I think it was at AZORA, where a presenter discussed the new 12c features of Data Guard. That user meeting plus the 12.2 new features manual together make me think that Oracle Data Guard has a massive amount of features that I am not aware of and not using.

I love the way Oracle keeps expanding their online operations. I think that 12.2 has a lot of features that would really help data warehouses. I’m also really excited about 12.2’s improvements in cross-platform migration. I have a couple of large databases that I need to migrate to Linux from HP-UX. 12.2 may have features that will help with these migrations. I’m really excited about the Data Pump parallel import of Metadata. It looks like our cross-platform database moves would need to load tens of thousands of objects. It would be awesome to do that metadata load in parallel. There are also a lot of enhancements related to Exadata. It would be nice to run a data warehouse on Exadata on 12.2.

Oracle’s sharding features make me think of Oracle trying to compete with other databases including cloud and open source systems. Oracle has everything but the kitchen sink all under one database. But, maybe less expensive databases with a subset of the features that Oracle has will work well in particular situations. It’s hard to know. I’ve been working with Oracle for 20 plus years so I think it helps to use a tool that I’m familiar with and there is no doubt that Oracle has many great features. But I wonder if Oracle is so expensive for some use cases that it makes sense to use less expensive and less capable software that fits specific needs.

I’m sorry if this post is kind of stream of consciousness writing. I reviewed the parts of the manual where I underlined something or made notes. I was pretty negative about Oracle 12.2 when Oracle announced that it would only be available in the cloud. I had a free cloud account that I could have used to mess with 12.2 but I knew that we were not going to use it until I could download it. The fact that 12.2 is available for download and that Oracle is starting to release quarterly patch updates motivated me to start looking at it. I have only just begun. I only read the new features manual. But, I am a lot more positive about 12.2 than before. It would be great if the cross-platform features were all that I hope that they are. That would help meet a real need. Anyway, I enjoyed reading about the new features in 12.2. It makes me want to get going with it to see if it can really help us.

Bobby

Categories: DBA Blogs

SQR with 077 umask creates file with 611 permissions

Fri, 2017-06-23 11:18

I ran across this strange situation. An SQR opened a new data file for output and created the file with 611 permissions. We needed group read access so this caused a problem. I knew that our login script for the PeopleSoft Unix user set umask to 022 but that without running the login scripts the mask is 077. So, my first thought was that we had started the process scheduler without running the login scripts and the mask was 077. But, why would the file that the SQR created be 611 permissions and not 600? The 077 mask should get rid of all the group and others bits. I built a simple test case to show that the SQR creates the file with 611 permissions with a 077 mask.

Here is the test SQR:

begin-report
  Let $Unix_Cmd = 'umask'
  Call System Using $Unix_Cmd #Status
  Let $Unix_Cmd = 'rm /tmp/bobby/bobby.txt'
  Call System Using $Unix_Cmd #Status
  let $testfile = '/tmp/bobby/bobby.txt'
  open $testfile as 1 for-writing  record=1500:fixed
  close 1
  Let $Unix_Cmd = 'ls -l /tmp/bobby/bobby.txt'
  Call System Using $Unix_Cmd #Status
end-report

Here is its output:

SQR for PeopleSoft V8.53.05
077
-rw---x--x   1 psoft      psoft            0 Jun 23 10:54 /tmp/bobby/bobby.txt

Notice the execute bits for group and others.  Why were they not masked out? Also, the default permissions to create a file is 644. So, creating a new file should not set the execute bits at all no matter what mask you are using.

I created a Korn shell script to do the same thing as the SQR:

umask
rm /tmp/bobby/bobby.txt
touch /tmp/bobby/bobby.txt
ls -l /tmp/bobby/bobby.txt

Here is its output:

077
-rw-------   1 psoft      psoft            0 Jun 23 10:58 /tmp/bobby/bobby.txt

Notice that there are no group and others bits which is what I expected with a 077 mask. I tried searching the Internet for SQR and 611 permissions but could not find anything.

As it turns out, we did start the process scheduler with umask 077 so I just modified the script that started it to set umask 022 and that resolved the problem. Here is the output from my test SQR with umask 022:

SQR for PeopleSoft V8.53.05
022
-rw-r--r--   1 psoft      psoft            0 Jun 23 11:01 /tmp/bobby/bobby.txt

This is what we wanted and of course the Korn shell script does the same thing as it should.

022
-rw-r--r--   1 psoft      psoft            0 Jun 23 11:02 /tmp/bobby/bobby.txt

Seems very odd to me. Anyway, I hope that this post helps someone.

This was on HP-UX 11.31 and PeopleTools 8.53.05

Bobby

Categories: DBA Blogs

Unrolling loop speeds up program

Thu, 2017-06-22 15:55

This is a follow-up to my earlier post about the assembly language book that I am working through. I have struggled to speed up a program using something that the book recommends, unrolling a loop. I think I have finally found an example where unrolling a loop speeds up a program so I wanted to share it.

I am working on Chapter 17 Exercise 2 of the book which asks you to write a program to find the longest common substring from two strings. I choose an inefficient and simple way to find the common substring and tried to speed the program up without changing the algorithm.

Here is the C version on GitHub: url

The core of the program is three loops. The outer loop tries each character in string 1 as the start of the substring. The middle loop tries each character in string 2 as the start of the substring. The inner loop advances through both strings until it finds the end of the common substring.

The C version ran in 27.2 seconds.

I built an assembly version that uses registers for most of the variables and it ran in about 11.7 seconds. It has the same three loops. Assembly register version: url

I tried to improve on the 11.7 seconds by unrolling each of the three loops. Unrolling the outer and inner loops resulted in no improvement in runtime. I was about to give up but finally decided to try unrolling the middle loop and it caused the program to run in 10.2 seconds. Very cool. Assembly unrolled middle loop version: url

I had to figure out how to use %rep, %assign, and how to have a label that I based on a nasm variable such as .skipif %+ i.

Kind of fun. I realize that this is off topic for my “DBA Blog” but it is something I’m playing with so I thought I would throw it out there. It doesn’t hurt a DBA to know some low-level computer science, even if you are not using it directly in your job. Anyway, it was a bit of a struggle to come up with an example that was faster with the loop unrolling and I think that I have found one.

Bobby

Categories: DBA Blogs

Modified PythonDBAGraphs to use datetime on X axis

Fri, 2017-06-16 18:55

I modified PythonDBAGraphs to use datetime objects on the X axis for all the reports except ashcpu.py. This lets you hover the mouse over a point and see the date and time for the point. Previously I was using text strings as labels with dates and times on the X axis and it was not obvious which point was for which date and time.

Here is a screenshot:

I have an example of a graph using datetime objects in an earlier post: post

I used two very helpful pages from the Matplotlib documentation: doc, example

I uninstalled cx_Oracle 6 and went back to version 5.3. This resolved the bug I mentioned in my earlier PythonDBAGraphs post. This enabled me to take out the TO_CHAR function calls that I had just added to work around the bug. The SQL queries looks simpler and easier to understand now.

I modified the README to include my new list of installed packages.

I feel like these changes will make the existing graphs more useful to me. Also, the code and SQL queries are cleaner now so it will be easier for me to create new graphs in the future.

Bobby

 

Categories: DBA Blogs

Pushed out new version of PythonDBAGraphs

Fri, 2017-06-02 17:03

I pushed out a new version of PythonDBAGraphs. I got a new laptop at work so I am setting it up with the software that I need to do my job. I decided that instead of reinstalling Enthought Canopy I would go with the current version of Python 3 and install the packages that I need one at a time. This gives me a cleaner install and I’m on the most current version of Python and the packages. I had installed Canopy for the two edX classes that I took so I wasn’t motivated to uninstall it on my old laptop. I use PythonDBAGraphs just about every day for my work and I didn’t want to whack it by uninstalling Canopy on my old laptop and trying to install something new. But, now I have my old laptop as a backup so I was able to take the time to install Python 3 and the current packages. I have updated the README with details of my new configuration.

I had to make a number of changes in the code. I had to change <> to !=. I had to change print xyz to print(xyz). Also, I think I ran across a bug in the new version of cx_Oracle. Here is an example. Assume you have an open connection and cursor. The cursor is cur.

cur.execute('select to_char(blocks/7) from user_tables')
row = cur.fetchone()
print(row)

cur.execute('select blocks/7 from user_tables')
row = cur.fetchone()
print(row)

Here is the output:

('2.28571428571428571428571428571428571429',)
Traceback (most recent call last):
  File "bug.py", line 12, in 
    row = cur.fetchone()
ValueError: invalid literal for int() with base 10: '2.28571428571428571428571428571428571429'

Strange. The workaround is to take any numbers with fractional values and run to_char() on them. I made a number of changes in the code to work around this bug/feature.

Anyway, I’m glad that I moved to Python 3 and that it enables me to use pip to install just the packages I want. Hopefully this update will make PythonDBAGraphs more usable for others since not everyone has Canopy.

Bobby

Categories: DBA Blogs

X86-64 Assembly Book

Wed, 2017-05-24 15:32

I have written earlier blog posts about my diversion from studying Oracle to studying computer science. Here are some relevant posts: url1,url2,url3,url4. After finishing the math for computer science online class and book that I was working on I stared working through a book about assembly language programming. I bought the book in a frenzy of enthusiasm about studying computer science for fun. But then I had to decide if I was going to work through the assembly book now and delay moving on to the algorithms class using Python that I had intended to do next. I intended to use the math that I studied to prepare me for the algorithms class. But, assembly is a nice low-level hardware oriented thing to study and a contrast from the math, computer science theory, and higher level Python scripting language. So, I decided to delay my study of algorithms and work on assembly.  I’m working on the last exercise of the 16th chapter out of 19 in the book and thought I would start this blog post to document my experience so that others might benefit from it. The book is Ray Seyfarth’s  “Introduction to 64 Bit Assembly Language Programming for Linux and OS X“. I have saved my work on the exercises on GitHub: repository.

I want to let people know what type of environment and tools that I used so they can compare notes with my experience if they decide to work through the book. I started out on an Oracle Enterprise Linux 7 virtual machine running under VirtualBox on my laptop. Oracle’s Linux is a version of Red Hat Linux. I believe that I had to compile YASM and the author’s ebe tool. It has been a while but I think I had to search around a bit to get the right packages so that they would compile and I had some parts of ebe that never worked correctly. Starting with Chapter 9 Exercise 2 I switched from YASM, the assembler recommended by the author of the book to NASM, a more commonly used assembler. I switched because I hit a bug in YASM. So, chapter 9 exercise 1 and earlier were all YASM. Also, after the early lessons I moved from the GUI ebe debugger to the command line gdb debugger. I wanted to get more familiar with gdb anyway since I use Linux for my job and might need gdb to help resolve problems. After I got a new laptop I switched to using Centos 7 on VirtualBox. I was able to install nasm and gdb using yum from standard repositories and did not have to do any manual compilation of development tools in my new environment. So, if you choose to work through the book you could go the nasm and gdb route that I ended up with if you have challenges installing and using ebe and yasm. There are some minor differences between nasm and yasm but they are pretty easy to figure out using the nasm manual.

There are connections between x86-64 assembly programming on Linux and my job working with Oracle databases. At work, 64 bit Linux virtual machines are our standard Oracle database platform. They are also the building blocks of the cloud. You see a lot of 64 bit Linux on Amazon Web Services, for example. So, I’m really kind of doing assembly language for fun since it is so impractical as a programming language, but at the same time I’m doing it on the platform that I use at work. Maybe when I’m looking at a dump in an Oracle trace file on Linux it will help me to know all the registers. If I’m working with some open source database like MySQL it can’t hurt to know how to debug in gdb and compile with gcc.

The assembly language book also connected with my passion for performance tuning. The author had some interesting things to say about performance. He kind of discouraged people from thinking that they could easily improve upon the performance of the GCC C compiler with all of its optimizations. It was interesting to think about the benefits of SIMD and how you might write programs to work better with pipelining and the CPU’s cache. It was kind of like Oracle performance tuning except you were looking at just CPU and lower level factors. But you still have tests to prove out your assumptions and you have to try to build tests to show that what you think is so will really hold up. Chapter 16 Exercise 1 is a good example of SIMD improving performance. I started with a simple C version that ran in 3.538 seconds. An AVX version of the subroutine did 8 floating point operations at a time and ran the same function in 2.1057 seconds. Here are some of the AVX instructions just for fun:

    vmovups ymm0,[x_buffer]      ; load 8 x[i] values
    vmovups ymm1,[rsi+r10*4]     ; load 8 x[j] values
    vsubps ymm0,ymm1             ; do 8 x[i]-x[j] ops
    vmulps ymm0,ymm0             ; square difference

Generally, x86-64 assembly ended up feeling a lot like C. The book has you use a variety of calls from the C library so in the later chapters the assembly programs had calls to a lot of the functions that you use in C such as printf, scanf, strlen, strcmp, and malloc. Like C it was common to get segmentation faults without a lot to go on about what caused it. Still the back trace (bt) command in gdb leads you right to the instruction that got the error so it some ways it was easier to diagnose segmentation faults in assembly than I remember it being in C. It brought back memories of taking C in college and puzzling over segmentation faults and bad pointers. It also made me think of the time in a previous job when I progressed from C to C++ to Java. I came out of school having done a fair amount of C programming. Then I read up on C++ and object-oriented programming. But C++ still had the segmentation faults. Then I found Java and thought it was great because it gave you more meaningful error messages than segmentation fault. Now, I have embraced Python recently because of the edX classes that I took and because of the ways I have used it at work. Working with assembly has kind of taking me back down the chain of ease of use from Python to C to assembly. I can’t see using C or assembly for every day use but most of the software that we use is written in C so it seems reasonable to have some familiarity with C and the lower level assembly that lies beneath it.

Anyway, I have three more chapters to go but thought I would put out this update now while I am thinking about it. I may tweak this post later or put out a follow-up, but I hope that it is useful to someone. If you feel inclined to study 64 bit x86 assembly on Linux I think that you will find the Ray Seyfarth book a good resource for you. If you want to talk to me about my experience feel free to leave at comment on this post or send an email.

Bobby

Categories: DBA Blogs

Finished reading the Snowflake database documentation

Tue, 2017-05-16 13:36

I just finished reading the Snowflake database documentation and I thought I would blog about my impressions. I have not spent a lot of time using Snowflake so I can not draw from experience with the product. But, I read all the online documentation so I think I can summarize some of the major themes that I noticed. Also, I have read a lot of Oracle database documentation in the past so I can compare Oracle’s documentation to Snowflake’s. Prior to reading the online documentation I also read their journal article which has a lot of great technical details about the internals. So, I intend to include things from both the article and the documentation.

My observations fall into these major categories:

  • Use of Amazon Web Services instead of dedicated hardware for data warehouse
  • Ease of use for people who are not database administrators
  • Use of S3 for storage with partitioning and columnar storage
  • Lots of documentation focused on loading and unloading data
  • Lots of things about storing and using JSON in the database
  • Limited implementation of standard SQL – no CREATE INDEX
  • Computer science terms – from the development team instead of marketing?
  • Role of database administrator – understand architecture and tuning implications
  • Focus on reporting usage of resources for billing
  • JavaScript or SQL stored functions
  • Down side to parallelism – high consumption of resources
  • High end developers but still a new product
  • Maybe specialized purpose – not general purpose database

First let me say that it is very cool how the Snowflake engineers designed a data warehouse database from scratch using Amazon Web Services (AWS). I have worked on a couple of different generations of Exadata as well as HP’s Neoview data warehouse appliance and a large Oracle RAC based data warehouse so I have experience with big, expensive, on site data warehouse hardware. But Snowflake is all in Amazon’s cloud so you don’t have to shell out a lot of money up front to use it. It makes me think that I should try to come up with some clever use of AWS to get rich and famous. All of the hardware is there waiting for me to exploit it and you can start small and then add hardware as needed. So, to me Snowflake is a very neat example of some smart people making use of the cloud. Here is a pretty good page about Snowflake’s architecture and AWS: url

You would not think that I would be happy about a database product that does not need database administrators since I have been an Oracle database administrator for over 20 years. But, it is interesting how Snowflake takes some tasks that DBAs would do working with an on site Oracle database and makes them easy enough for a less technical person to do them.  There is no software to install because Snowflake is web-based. Creating a database is a matter of pointing and clicking in their easy to use web interface. Non-technical users can spin up a group of virtual machines with enormous CPU and memory capacity in minutes. You do not setup backup and recovery. Snowflake comes with a couple of built-in recovery methods that are automatically available. Also, I think that some of the redundancy built into AWS helps with recovery. So, you don’t have Oracle DBA tasks like installing database software, creating databases, choosing hardware, setting up memory settings, doing RMAN and datapump backups. So, my impression is that they did a good job making Snowflake easier to manage. Here is a document about their built-in backup and recovery: url

Now I get to the first negative about Snowflake. It stores the data in AWS’s S3 storage in small partitions and a columnar data format. I first saw this in the journal article and the documentation reinforced the impression: url1,url2. I’ve used S3 just enough to upload a small file to it and load the data into Snowflake. I think that S3 is AWS’s form of shared filesystem. But, I keep thinking that S3 is too slow for database storage. I’m used to solid state disk storage with 1 millisecond reads and 200 microsecond reads across a SAN network from a storage device with a large cache of high-speed memory. Maybe S3 is faster than I think but I would think that locally attached SSD or SSD over a SAN with a big cache would be faster. Snowflake seems to get around this problem by having SSD and memory caches in their compute nodes. They call clusters of compute nodes warehouses, which I think is confusing terminology. But from the limited query testing I did and from the reading I think that Snowflake gets around S3’s slowness with caching. Caching is great for a read only system. But what about a system with a lot of small transactions? I’ve seen Snowflake do very well with some queries against some large data sets. But, I wonder what the down side is to their use of S3. Also, Snowflake stores the data in columnar format which might not work well for lots of small insert, update, and delete transactions.

I thought it was weird that out of the relatively small amount of documentation Snowflake devoted a lot of it to loading and unloading data. I have read a lot of Oracle documentation. I read the 12c concepts manual and several other manuals while studying for my OCP 12c certification. So, I know that compared to Oracle’s documentation Snowflake’s is small. But, I kept seeing one thing after another about how to load data. Here are some pages: url1,url2. I assume that their data load/unload statements are not part of the SQL standard so maybe they de-emphasized documenting normal SQL constructs and focused on their custom syntax. Also, they can’t make any money until their customers get their data loaded so maybe loading data is a business priority for Snowflake. I’ve uploaded a small amount of data so I’m a little familiar with how it works. But, generally, the data movement docs are pretty hard to follow. It is kind of weird. The web interface is so nice and easy to use but the upload and download syntax seems ugly. Maybe that is why they have some much documentation about it?

Snowflake also seems to have a disproportionate amount of documentation about using JSON in the database. Is this a SQL database or not? I’m sure that there are Oracle manuals about using JSON and of course there are other databases that combine SQL and JSON but out of the relatively small Snowflake documentation set there was a fair amount of JSON. At least, that is my impression reading through the docs. Maybe they have customers with a lot of JSON data from various web sources and they want to load it straight into the database instead of extracting information and putting it into normal SQL tables. Here is an example JSON doc page: url

Snowflake seems to have based their product on a SQL standard but they did not seem to fully implement it. For one thing there is no CREATE INDEX statement. Uggh. The lack of indexes reminds me strongly of Exadata. When we first got on Exadata they recommended dropping your indexes and using Smart Scans instead. But, it isn’t hard to build a query on Exadata that runs much faster with a simple index. If you are looking up a single row with a unique key a standard btree index with a sequential, i.e. non-parallel, query is pretty fast. The lack of CREATE INDEX combined with the use of S3 and columnar organization of the data makes me think that Snowflake would not be great for record at a time queries and updates. Of course, an Oracle database excels at record at a time processing so I can’t help thinking that Snowflake won’t replace Oracle with its current architecture. Here is a page listing all the things that you can create, not including index: url

Snowflake sprinkled their article and documentation with some computer science terms. I don’t recall seeing these types of things in Oracle’s documentation. For example, they have a fair amount of documentation about HyperLogLog. What in the world? HyperLogLog is some fancy algorithm for estimating the number of rows in a large table without reading every row. I guess Oracle has various algorithms under the covers to estimate cardinality. But they don’t spell out the computer science term for it. At least that’s my impression. And the point of this blog post is to give my impressions and not to present some rigorous proof through extensive testing. As a reader of Oracle manuals I just got a different feel from Snowflake’s documentation. Maybe a little more technical in its presentation than Oracle’s. It seems that Snowflake has some very high-end software engineers with a lot of specialized computer science knowledge. Maybe some of that leaks out into the documentation. Another example, their random function makes reference to the name of the underlying algorithm: url. Contrast this with Oracle’s doc: url. Oracle just tells you how to use it. Snowflake tells you the algorithm name. Maybe Snowflake wants to impress us with their computer science knowledge?

Reading the Snowflake docs made me think about the role of a database administrator with Snowflake. Is there a role? Of course, since I have been an Oracle DBA for over 20 years I have a vested interest in keeping my job. But, it’s not like Oracle is going away. There are a bazillion Oracle systems out there and if all the new students coming into the work force decide to shy away from Oracle that leaves more for me to support the rest of my career. But, I’m not in love with Oracle or even SQL databases or database technology. Working with Oracle and especially in performance tuning has given me a way to use my computer science background and Oracle has challenged me to learn new things and solve difficult problems. I could move away from Oracle into other areas where I could use computer science and work on interesting and challenging problems. I can see using my computer science, performance tuning, and technical problem solving skills with Snowflake. Companies need people like myself who understand Oracle internals – or at least who are pursuing an understanding of it. Oracle is proprietary and complicated. Someone outside of Oracle probably can not know everything about it. It seems that people who understand Snowflake’s design may have a role to play. I don’t want to get off on a tangent but I think that people tend to overestimate what Oracle can do automatically. With large amounts of data and challenging requirements you need some human intervention by people who really understand what’s going on. I would think that the same would be true with Snowflake. You need people who understand why some queries are slow and how to improve their performance. There are not as many knobs to turn in Snowflake. Hardly any really. But there is clustering: url1,url2,url3. You also get to choose which columns fit into which tables and the order in which you load the data, like you can on any SQL database. Snowflake exposes execution plans and has execution statistics: url1,url2,url3. So, it seems that Snowflake has taken away a lot of the traditional DBA tasks but my impression is that there is still a role for someone who can dig into the internals and figure out how to make things go faster and help resolve problems.

Money is the thing. There are a lot of money related features in the Snowflake documentation. You need to know how much money you are spending and how to control your costs. I guess that it is inevitable with a web-based service that you need to have features related to billing. Couple examples: url1,url2

Snowflake has SQL and JavaScript based user defined functions. These seem more basic than Oracle’s PL/SQL. Here is a link: url

There are some interesting things about limiting the number of parallel queries that can run on a single Snowflake warehouse (compute cluster). I’ve done a fair amount of work on Oracle data warehouses with people running a bunch of parallel queries against large data sets. Parallelism is great because you can speed up a query by breaking its execution into pieces that the database can run at the same time. But, then each user that is running a parallel query can consume more resources than they could running serially. Snowflake has the same issues. They have built-in limits to how many queries can run against a warehouse to keep it from getting overloaded. These remind me of some of the Oracle init parameters related to parallel query execution. Some URLs: url1,url2,url3 In my opinion parallelism is not a silver bullet. It works great in proofs of concepts with a couple of users on your system. But then load up your system with users from all over your company and see how well it runs then. Of course, one nice thing about Snowflake is that you can easily increase your CPU and memory capacity as your needs grow. But it isn’t free. At some point it becomes worth it to make more efficient queries so that you don’t consume so many resources. At least, that’s my opinion based on what I’ve seen on Oracle data warehouses.

I’m not sure if I got this information from the article or the documentation or somewhere else. But I think of Snowflake as new. It seems to have some high-end engineers behind it who have worked for several years putting together a system that makes innovative use of AWS. The limited manual set, the technical terms in the docs, the journal article all make me think of a bunch of high-tech people working at a startup. A recent Twitter post said that Snowflake now has 500 customers. Not a lot in Oracle terms. So, Snowflake is new. Like any new product it has room to grow. My manager asked me to look into technical training for Snowflake. They don’t have any. So, that’s why I read the manuals. Plus, I’m just a manual reader by nature.

My impression from all of this reading is that Snowflake has a niche. Oracle tries to make their product all things to all people. It has every feature but the kitchen sink. They have made it bloated with one expensive add-on option after another. Snowflake is leaner and newer. I have no idea how much Snowflake costs, but assuming that it is reasonable I can see it having value if companies use it where it makes sense. But I think it would be a mistake to blindly start using Snowflake for every database need. You probably don’t want to build a high-end transactional system on top of it. Not without indexes! But it does seem pretty easy to get a data warehouse setup on Snowflake without all the time-consuming setup of an on premise data warehouse appliance like Exadata. I think you just need to prepare yourself for missing features and for some things not to work as well as they do on a more mature database like Oracle. Also, with a cloud model you are at the mercy of the vendor. In my experience employees have more skin in the game than outside vendors. So, you sacrifice some control and some commitment for ease of use. It is a form of outsourcing. But, outsourcing is fine if it meets your business needs. You just need to understand the pros and cons of using it.

To wrap up this very long blog post, I hope that I have been clear that I’m just putting out my impressions without a lot of testing to prove that I’m right. This post is trying to document my own thoughts about Snowflake based on the documentation and my experience with Oracle. There is a sense in which no one can say that I am wrong about the things that I have written as long as I present my honest thoughts. I’m sure that a number of things that I have written are wrong in the sense that testing and experience with the product would show that my first impressions from the manuals were wrong. For example, maybe I could build a transactional system and find that Snowflake works better than I thought. But, for now I’ve put out my feeling that it won’t work well and that’s just what I think. So, the post has a lot of opinion without a ton of proof. The links show things that I have observed so they form a type of evidence. But, with Oracle the documentation and reality don’t always match so it probably is the same with Snowflake. Still, I hope this dump of my brain’s thoughts about the Snowflake docs is helpful to someone. I’m happy to discuss this with others and would love any feedback about what I have written in this post.

Bobby

Categories: DBA Blogs

Submitted two abstracts for Oracle OpenWorld 2017

Wed, 2017-04-19 14:07

I submitted two abstracts for Oracle OpenWorld 2017. I have two talks that I have thought of putting together:

  • Python for the Oracle DBA
  • Toastmasters for the Oracle DBA

I want to do these talks because they describe two things that I have spent time on and that have been valuable to me.

I have given several recent talks about Delphix. Kyle Hailey let me use his slot at Oaktable World in 2015 which was at the same time as Oracle OpenWorld 2015. Right after that I got to speak at Delphix Sync which was a Delphix user event. More recently I did a Delphix user panel webinar.

So, I’ve done a lot of Delphix lately and that is because I have done a lot with Delphix in my work. But, I have also done a lot with Python and Toastmasters so that is why I’m planning to put together presentations about these two topics.

I probably go to one conference every two years so I’m not a frequent speaker, but I have a list of conferences that I am thinking about submitting these two talks to, hoping to speak at one. These conferences are competitive and I’ve seen that better people than me have trouble getting speaking slots at them. But here is my rough idea of what I want to submit the talks to:

I’ve never gone to RMOUG but I think it is in Denver so that is a short flight and I have heard good things.

Also, we have our on local AZORA group in Phoenix. Recently we have had some really good ACE Director/Oak Table type speakers, but I think they might like to have some local speakers as well so we will see if that will work out.

If all else fails I can give the talks at work. I need to start working on the five speeches in my Toastmasters “Technical Presentations” manual which is part of the Advanced Communication Series. I haven’t even cracked the book open, so I don’t know if it applies but it seems likely that I can use these two talks for a couple of the speech projects.

Anyway, I’ve taken the first steps towards giving my Python and Toastmasters speeches. Time will tell when these will actually be presented, but I know the value that I have received from Python and Toastmasters and I’m happy to try to put this information out there for others.

Bobby

Categories: DBA Blogs

How to find the object that caused ORA-08103 error

Mon, 2017-04-17 14:50

A developer told me that two package executions died with ORA-08103 errors and he didn’t know which object caused the errors.

I found two trace files that had the following contents:

*** SESSION ID:(865.1201) 2017-04-17 10:17:09.476
OBJD MISMATCH typ=6, seg.obj=21058339, diskobj=21058934, dsflg=100000, dsobj=21058339, tid=21058339, cls=1

*** SESSION ID:(595.1611) 2017-04-17 10:17:35.395
OBJD MISMATCH typ=6, seg.obj=21058340, diskobj=21058935, dsflg=100000, dsobj=21058340, tid=21058340, cls=1

Bug 13844883 on Oracle’s support site gave me the idea to look up the object id for the diskobj part of the trace as the current object id. So, I needed to look up 21058934 and 21058935. I used this query to find the objects:

select * from dba_objects where DATA_OBJECT_ID in
(21058934,
21058935);

This pointed to two index partitions that had been rebuilt while the package was running. I’m pretty sure this caused the ORA-08103 error. So, if you get an ORA-08103 error find diskobj in the trace file and look it up as DATA_OBJECT_ID in dba_objects.

Bobby

Categories: DBA Blogs

Optimizer bug fix makes a query run more than 3 times slower

Wed, 2017-04-12 15:39

I’m working on an 11.1.0.7 to 11.2.0.4 upgrade and found a handful of queries that run more than 3 times longer on 11.2.0.4 than 11.1.0.7. The data and optimizer statistics are very similar on the two test databases. I’m pretty sure that an optimizer bug fix caused this difference. So, the irony is that a fix to the optimizer that we get with the upgrade to the very stable 11.2.0.4 release is causing a 3x slowdown in the query.

For my testing I’m using the gather_plan_statistics hint and this query to dump out the plan after executing the query:

select * from table(dbms_xplan.display_cursor(null,null,’ALLSTATS’));

I used an outline hint to force the 11.2.0.4 plan to run under 11.1.0.7 and then I looked at the estimated and actual row counts to find a discrepancy. I found one table with estimated row counts that did not look correct on 11.1.0.7 but made sense on 11.2.0.4.

11.1.0.7

---------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
---------------------------------------------------------------------------------------------------------------------------------------------------------
|* 29 |           TABLE ACCESS BY INDEX ROWID  | MY_TABLE                  |     17 |      1 |      0 |00:00:07.34 |   96306 |       |       |          |
|* 30 |            INDEX RANGE SCAN            | MY_TABLE_PK               |     17 |     16 |    102 |00:00:01.20 |   96255 |       |       |          |

11.2.0.4

---------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
---------------------------------------------------------------------------------------------------------------------------------------------------------
|* 29 |           TABLE ACCESS BY INDEX ROWID  | MY_TABLE                  |     17 |      8 |      0 |00:00:07.44 |   96306 |       |       |          |
|* 30 |            INDEX RANGE SCAN            | MY_TABLE_PK               |     17 |     17 |    102 |00:00:01.22 |   96255 |       |       |          |

After the range scan in step 30 in the plan in 11.1.0.7 you have an estimate of 16 rows but the table access in step 29 has an estimate of only 1. In 11.2.0.4 the estimate for step 29 is 8 rows.  Given the optimizer statistics, the optimizer should have estimated 8 rows step 29 in 11.1.0.7. It appears that 11.2.0.4 fixed a bug like this.

Here are the predicates for step 29 in the plan:

29 – filter((“PCP”.”MY_FLAG”=’M’ OR “PCP”.”MY_FLAG”=’Y’))

So the column could have value M or Y. The optimizer statistics have 4 distinct values for the column and no histogram. So the optimizer should assume that 1/4 of the rows meet each criteria. So the optimizer should have estimated 1/4 + 1/4 = 1/2 of the rows from step 30 meet the criteria in step 29. So, 17/2 = 8, rounding down. But in 11.1.0.7 it seems that they multiplied the rows from step 30 by 1/4 two times making it 16*1/4*1/4 = 1. It seems that in 11.1.0.7 the optimizer multiplied by 1/4 twice instead of adding them and then multiplying. There is a known bug related to OR conditions in where clauses:

Bug 10623119 – wrong cardinality with ORs and columns with mostly nulls (Doc ID 10623119.8)

Our 11.2.0.4 database includes this bug fix but I don’t know if this fix caused the difference in behavior that I saw. It seems possible that it did.

The interesting thing is that the real row count for step 29 is 0. So, the pre-bug fix plan in 11.1.0.7 actually estimated the row count more accurately by accident. It estimated 1 and the real count was 0. The correct estimate should have been 8, but that is not as close to 0 as 1 is. I think we just happened to have a few queries where the bug resulted in a more accurate estimate than a properly functioning optimizer. But, I’m only looking at the queries whose performance is worse after the upgrade. There may have been other queries that performed better because of this bug fix.

I ended up passing this and a similar query back to a senior SQL developer and he took one look at the query and described it as “ugly”. He fixed both of them in no time so that both queries now run just as fast or faster on 11.2.0.4 than they did on 11.1.0.7.

So, the original query ran faster when the optimizer was not working properly. A human developer simplified the query and then it ran faster when the optimizer was working properly. Maybe the moral of the story is to build simpler and cleaner SQL queries to begin with and if you find a query whose performance declines with better optimizer information then consider improving the query so that it works well with the better functioning optimizer.

Bobby

P.S. Well, I used a cardinality hint to give the 11.2.0.4 optimizer the correct number of rows for the step in the plan and it didn’t change the plan. So, the change in the way the number of rows was calculated may not be the real reason, or the only reason, for the worse plan on 11.2.0.4. Plus, I tried the alter session command mentioned in the comments and it didn’t change the plan either. I guess that something else in 11.2.0.4 caused the change to the less efficient plan but I can’t say what. Still, a developer fixed the queries in no time so that is still the main point…

Categories: DBA Blogs

_small_table_threshold=1000000 results in > 5x query speedup

Fri, 2017-04-07 16:41

Today I sped a query up by over 5 times by setting _small_table_threshold=1000000.

Here is the query elapsed time and a piece of the plan showing its behavior before setting the parameter:

Elapsed: 00:28:41.67

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                              | Name                   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  69 |              PARTITION RANGE ITERATOR                  |                        |   9125 |      1 |   9122 |00:13:02.42 |    3071K|   3050K|       |       |          |
|* 70 |               TABLE ACCESS FULL                        | SIS_INV_DTL            |   9125 |      1 |   9122 |00:13:02.25 |    3071K|   3050K|       |       |          |

I think that this part of the plan means that the query scanned a range of partitions  9125 times resulting in over three million physical reads. These reads took about 13 minutes. If you do the math it works out to between 200-300 microseconds per read. I have seen similar times from repeated reads from a storage server that has cached the data in memory. I have seen this with a SAN and with Delphix.

Here is my math for fun:

>>> 1000000*((60*13)+2.25)/3050000
256.4754098360656

About 256 microseconds per read.

I ran this query again and watched the wait events in Toad’s session browser to verify that the query was doing a bunch of direct path reads. Even though the query was doing full scans on the partition range 9000 times the database just kept on doing direct path reads for 13 minutes.

So, I got the idea of trying to increase _small_table_threshold. I was not sure if it would work with parallel queries. By the way, this is on 11.2.0.4 on HP-UX Itanium platform. So, I tried

alter session set "_small_table_threshold"=1000000;

I ran the query again and it ran in under 5 minutes. I had to add a comment to the query to get the plan to come back cleanly. So, then I reran the query again and I guess because of caching it came back in under 2 minutes:

First run:

Elapsed: 00:04:28.83

Second run:

Elapsed: 00:01:39.69

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                              | Name                   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem |  O/1/M   |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  69 |              PARTITION RANGE ITERATOR                  |                        |   9125 |      1 |   9122 |00:00:45.33 |    3103K|      0 |       |       |          |
|* 70 |               TABLE ACCESS FULL                        | SIS_INV_DTL            |   9125 |      1 |   9122 |00:00:45.27 |    3103K|      0 |       |       |          |

The second execution did zero physical reads on these partitions instead of the 3 million that we had without the parameter!

So, it seems that if you have a query that keeps doing full scans on partitions over and over it can run a lot faster if you disable direct path read by upping _small_table_threshold.

Bobby

Categories: DBA Blogs

Jetpack 4.8 outage

Wed, 2017-04-05 17:18

I just got my bobbydurrettdba.com blog back online. It seems that an automatic update of Jetpack to version 4.8 took it down last night. I thought I would post a quick comment on how I resolved it because it is slightly different from what the Jetpack support site says.

When I tried to get into my blog today I just got a white screen saying there was a problem. I could not get into the wp-admin screen to do anything. Then I found out that Jetpack 4.8 had been pushed out and had brought down a lot of WordPress sites. Then I found out that Jetpack 4.8.1 had just come out today to fix it.

Here is the url for the 4.8.1 fix to this issue: fix

The fix points to the manual plugin install url which talks about deleting your /wp-content/plugins/jetpack folder and then manually installing by downloading the 4.8.1 jetpack zip, unzipping it, and ftping it up to /wp-content/plugins.

But all I had to do, after backing up /wp-content/plugins/jetpack was to rename it to /wp-content/plugins/jetpack.old. Then I was able to get into my site and to update the plugin to 4.8.1 through the normal web-based process.

Strangely enough the Jetpack plugin update removed my /wp-content/plugins/jetpack.old directory and replaced it with /wp-content/plugins/jetpack.

Here is what the directory looked like after I ran the update from my blogs admin page:

I never did ftp the 4.8.1 jetpack directory over though I had it unzipped and ready to go on my laptop.

Anyway, I didn’t delete my jetpack directory.  I just renamed it to jetpack.old. Then I ran the normal plugin update process.

Bobby

p.s. My site stats for today are not looking so good with the blog down all day:

Categories: DBA Blogs

Information Technology Jobs Posted

Thu, 2017-03-30 10:15

My company, US Foods, has posted a number of information technology jobs that are in Chicago (Rosemont)  or Phoenix (Tempe). Here is the web site:

https://usfood.taleo.net/careersection/usf_external/jobsearch.ftl

Enter Information Technology as the Job Field to see all the posted IT jobs.

Bobby

Categories: DBA Blogs

Using Delphix to support Oracle upgrade

Wed, 2017-03-22 17:47

I’m working on upgrading a very buggy unpatched 11.1.0.7 Oracle database to a fully patched 11.2.0.4 version. I’m using Delphix to support the upgrade and it has been a big help so far. This is on the HP-UX 11.31 Itanium platform.

The great thing about using Delphix to support an upgrade is that my very first pass through the upgrade scripts was with a full-sized clone of production. In the past I probably started with a tiny subset or even an out of the box demo database for my first upgrade pass and even when I got to QA it wasn’t a full test of a production upgrade. This time, my first test was with all the data and that was very cool.

The main example of how this helped is that we had a lot of data in the SYS.WRI$_OPTSTAT_HISTGRM_HISTORY and SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY tables in production and this made the first upgrade of its clone take a long time. After two or three attempts at other ways to speed things up, I ended up applying patch 12683802 on an 11.1.0.7 Oracle home and this allowed me to truncate these two tables.

Delphix helped me here because I had an unused 11.1.0.7 Oracle home on a different host from the one I was doing the upgrade on. I didn’t want to apply a patch on the upgrade host because there were three other databases using the home and I didn’t want to bring them down or patch them. Delphix let me move the VDB that I was upgrading over to the host that had the unused home. Then I applied the patch there and ran the truncate using the dbms_stats.purge_stats(dbms_stats.purge_all) procedure call that the patch enabled.

Then I moved the VDB back to the host where I intended to do the upgrade, which already had the fully patched 11.2.0.4 binaries installed, and ran the upgrade there. Pretty cool. I think I did the most recent upgrade in about 3.5 hours with the empty OPTSTAT tables.

By the way, doing an upgrade within Delphix is easy. You just bring the VDB up on the old oracle home, do the upgrade as you normally would include all the shutdown and startup commands, and then within the Delphix GUI you let Delphix know the VDB is now on a new Oracle home by shutting it down, choosing the new home, and bringing it back up. Piece of cake.

Bobby

Categories: DBA Blogs

Simple Python for Oracle database tuning example

Thu, 2017-02-16 16:14

I ran across a stackoverflow question and it gave me an idea for a simpler use of Python to graph some Oracle database performance information. I looked at my PythonDBAGraphs scripts and I’m not sure that it is worth modifying them to try to simplify those scripts since I like what they do. But they may make people think that Python scripts to graph Oracle performance data are difficult to write.  But, I think if someone just wants to put together some graphs using Python, Matplotlib, and cx_Oracle they could do it more simply than I have in my PythonDBAGraphs scripts and it still could be useful.

Here is an example that looks at db file sequential read waits and graphs the number of waits per interval and the average wait time in microseconds:

import cx_Oracle
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

connect_string = "MYUSER/MYPASSWORD@MYDATABASE"
con = cx_Oracle.connect(connect_string)
cur = con.cursor()

query="""
select sn.END_INTERVAL_TIME,
(after.total_waits-before.total_waits) "number of waits",
(after.time_waited_micro-before.time_waited_micro)/
(after.total_waits-before.total_waits) "ave microseconds"
from 
DBA_HIST_SYSTEM_EVENT before, 
DBA_HIST_SYSTEM_EVENT after,
DBA_HIST_SNAPSHOT sn
where before.event_name='db file sequential read' and
after.event_name=before.event_name and
after.snap_id=before.snap_id+1 and
after.instance_number=1 and
before.instance_number=after.instance_number and
after.snap_id=sn.snap_id and
after.instance_number=sn.instance_number and
(after.total_waits-before.total_waits) > 0
order by after.snap_id
"""

cur.execute(query)

datetimes = []
numwaits = []
avgmicros = []
for result in cur:
    datetimes.append(result[0])
    numwaits.append(result[1])
    avgmicros.append(result[2])
        
cur.close()
con.close()

title="db file sequential read waits"

fig = plt.figure(title)
ax = plt.axes()

plt.plot(datetimes,numwaits,'r')
plt.plot(datetimes,avgmicros,'b')

# Format X axis dates

fig.autofmt_xdate()
ax.fmt_xdata = mdates.DateFormatter('%m/%d/%Y %H:%M')
datetimefmt = mdates.DateFormatter("%m/%d/%Y")
ax.xaxis.set_major_formatter(datetimefmt)

# Title and axes labels

plt.title(title)
plt.xlabel("Date and time")
plt.ylabel("num waits and average wait time")

# Legend

plt.legend(["Number of waits","Average wait time in microseconds"],
loc='upper left')

plt.show()

The graph it produces is usable without a lot of time spent formatting it in a non-standard way:

It is a short 68 line script and you just need matplotlib and cx_Oracle to run it. I’ve tested this with Python 2.

Bobby

Categories: DBA Blogs

Rosetta Code

Sat, 2017-01-21 09:24

I have experimented with the Rosetta Code website as a way to learn more about computer technology. I have mentioned my exploration of the edX and OCW web sites in earlier posts so I thought I would mention Rosetta Code or RC as another way to learn.

The RC site displays code in a variety of programming languages so that you can compare the languages. They define certain tasks and people post code that performs the tasks. For example, the task could be to write a simple Hello World program that writes Hello World to the screen. The site shows how to do this in Java, Python, C, and others. At this moment there are 354 implementations of Hello World on the site.  I added this simple one for Oracle’s version of SQL: My Hello World

But, I can’t help wondering if this is just a fun game or is it really educational? I just finished a task this morning that was interesting to me. It is a draft task which means that people have not yet written enough example programs for the site managers to publish it as an official task. But anyone can see it. I’ve written the Python and PL/SQL examples. This 4 squares problem is just a puzzle or game but it was interesting to think about solving it in a declarative way with SQL and not a procedural way with Python or any other regular language. Maybe there is value in looking at these tasks and thinking about how SQL and relational database thinking could be applied?

Anyway, I just thought I would document what I have done with the RC site. Be careful because once you start working up example code for the site it can become addictive! I am still thinking about what the real value of the site is but it has helped me exercise my programming muscles and think a bit about SQL. Check it out if you think it has value.

Bobby

Categories: DBA Blogs

Pages