Bobby Durrett's DBA Blog
We put the workaround for Bug 13930580 in on Friday and the results have been much better than I expected. Here is when the workaround went in as reported in the alert log:
Fri Jan 17 18:38:26 2014
ALTER SYSTEM SET _use_adaptive_log_file_sync='FALSE' SCOPE=BOTH;
Here are the log file sync average times. Notice how they go down after 7 pm Friday:
END_INTERVAL_TIME         number of waits ave microseconds
------------------------- --------------- ----------------
17-JAN-14 12.00.49.669 AM           78666       15432.6923
17-JAN-14 01.00.27.862 AM           13380       15509.9778
17-JAN-14 02.00.11.834 AM           15838       17254.2949
17-JAN-14 03.00.56.429 AM           10681       29832.4282
17-JAN-14 04.00.39.502 AM           26127       14880.2097
17-JAN-14 05.00.22.716 AM           21637       10952.5322
17-JAN-14 06.00.01.558 AM           67162       9756.44207
17-JAN-14 07.00.45.358 AM          123705       11755.6535
17-JAN-14 08.00.29.811 AM          223799       11341.2467
17-JAN-14 09.00.19.275 AM          319051       13651.4647
17-JAN-14 10.00.09.089 AM          507335       13991.5543
17-JAN-14 11.00.59.502 AM          583835       11609.8432
17-JAN-14 12.00.44.044 PM          627506       10857.4556
17-JAN-14 01.00.30.133 PM          610232       11233.9348
17-JAN-14 02.00.18.961 PM          664368       10880.3887
17-JAN-14 03.00.05.694 PM          647896       9865.96367
17-JAN-14 04.00.44.694 PM          538270       10425.6479
17-JAN-14 05.00.24.376 PM          343863       9873.98468
17-JAN-14 06.00.11.481 PM          169654       9735.80996
17-JAN-14 07.00.03.087 PM           87590       7046.92633
17-JAN-14 08.00.52.390 PM           69297       2904.62955
17-JAN-14 09.00.29.888 PM           38244       3017.15969
17-JAN-14 10.00.09.436 PM           28166       3876.77469
17-JAN-14 11.00.54.765 PM           23220       11109.3063
18-JAN-14 12.00.33.790 AM           13293       9749.99428
18-JAN-14 01.00.17.853 AM           15332       3797.76839
18-JAN-14 02.00.56.050 AM           16137       6167.15127
18-JAN-14 03.00.33.908 AM           14621       9664.63108
18-JAN-14 04.00.12.383 AM            9708        6024.9829
18-JAN-14 05.00.56.348 AM           14565       3618.76938
18-JAN-14 06.00.39.683 AM           14323       3517.45402
18-JAN-14 07.00.29.535 AM           38243       3753.46422
18-JAN-14 08.00.16.778 AM           44878       2280.22924
18-JAN-14 09.00.01.176 AM           73082       9689.52484
18-JAN-14 10.00.45.168 AM           99302       2094.03293
18-JAN-14 11.00.35.070 AM          148789       1898.40424
18-JAN-14 12.00.23.344 PM          151780       1932.64997
18-JAN-14 01.00.08.631 PM          186040       2183.18563
18-JAN-14 02.00.59.839 PM          199826       2328.87331
18-JAN-14 03.00.45.441 PM          210098        1335.9759
18-JAN-14 04.00.36.453 PM          177331       1448.39219
18-JAN-14 05.00.21.669 PM          150837       1375.07256
18-JAN-14 06.00.59.959 PM          122234       1228.21767
18-JAN-14 07.00.37.851 PM          116396       1334.64569

... skip a few to find some higher load times ...

19-JAN-14 10.00.01.434 AM          557020       2131.02737
19-JAN-14 11.00.42.786 AM          700781       1621.16596
19-JAN-14 12.00.31.934 PM          715327       1671.72335
19-JAN-14 01.00.10.699 PM          718417       1553.98083
19-JAN-14 02.00.51.524 PM          730149        2466.6241
19-JAN-14 03.00.38.088 PM          628319       2465.45829
When the system is busy we are seeing less than 3000 microseconds (3 milliseconds) for log file sync, which is good. Before the change we were seeing 10 milliseconds or more, which isn’t that great.
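To sanity check numbers like these, a load-weighted average is more meaningful than eyeballing individual rows, since busy hours should count more. Here is a quick sketch in Python using three of the busy 19-JAN rows from the output above:

```python
# Load-weighted average "log file sync" time across several AWR intervals,
# using three busy 19-JAN rows (number of waits, ave microseconds).
intervals = [
    (557020, 2131.02737),
    (700781, 1621.16596),
    (715327, 1671.72335),
]

total_waits = sum(waits for waits, _ in intervals)
weighted_avg_us = sum(waits * avg for waits, avg in intervals) / total_waits

print(f"{weighted_avg_us:.0f} microseconds over {total_waits} waits")
```

The weighted average comes out well under the 3 millisecond mark even during the busy hours.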
Oracle support has been pushing this workaround for a long time but our own testing wasn’t able to recreate the problem. Have to hand it to them. They were right!
Here is a link to my previous post on this issue: url
Back in July I finally got Oracle 12c installed on my laptop as documented in this post: url
But, that was as far as I got. The last thing I did was get an error message creating a user. Well, I figured out how to create a new user and a few other things. I’m working with the ORCL database that comes with the install and all the parameters, etc. that come with it.
Evidently the default install comes with a PDB called PDBORCL. So, I have two tns entries, one for the parent CDB and one for the child PDB, and they look like this:
ORCL.WORLD =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.133.128)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl.mydomain.com)
    )
  )

PDB.WORLD =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.133.128)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = pdborcl.mydomain.com)
    )
  )
I guess the service name has the name of the PDB in it.
So, if I connect as SYSTEM/password@orcl I’m connected to the CDB, and if I connect as SYSTEM/password@pdb I’m connected to the PDB. Once connected to the PDB I could create a new user without getting an error.
But, when I first tried connecting to the PDB I got this error, even though the database was up:
ORA-01033: ORACLE initialization or shutdown in progress
So, to bring the database and its PDB up (by the way, I’m on 64 bit Linux) after booting the Linux VM the following steps were required:

sqlplus / as sysdba
startup
alter session set container=PDBORCL;
alter database open;

The startup brings up the CDB but leaves the PDB mounted, which is what causes the ORA-01033 error above; switching to the PDBORCL container and running alter database open makes the PDB available. Probably this could all be scripted but that’s what I did today.
Interestingly there is only one pmon:
$ ps -ef | grep pmon
oracle   29495     1  0 06:52 ?        00:00:00 ora_pmon_orcl
But you get different results when you query dba_data_files depending on whether connected to the CDB or PDB:
Connected to the CDB:

FILE_NAME
------------------------------------------
/u01/app/oracle/oradata/orcl/system01.dbf
/u01/app/oracle/oradata/orcl/sysaux01.dbf
/u01/app/oracle/oradata/orcl/undotbs01.dbf
/u01/app/oracle/oradata/orcl/users01.dbf
Connected to the PDB:

FILE_NAME
--------------------------------------------------------------
/u01/app/oracle/oradata/orcl/pdborcl/system01.dbf
/u01/app/oracle/oradata/orcl/pdborcl/sysaux01.dbf
/u01/app/oracle/oradata/orcl/pdborcl/SAMPLE_SCHEMA_users01.dbf
/u01/app/oracle/oradata/orcl/pdborcl/example01.dbf
So, I guess each PDB has its own SYSTEM and SYSAUX tablespaces?
Lastly, while running my scripts to poke around, I edited my sqlplus header script to report which container you are in. It looks like this now:
set linesize 32000
set pagesize 1000
set long 2000000000
set longchunksize 1000
set head off;
set verify off;
set termout off;

column u new_value us noprint;
column n new_value ns noprint;
column c new_value cs noprint;

select name n from v$database;
select user u from dual;
SELECT SYS_CONTEXT('USERENV', 'CON_NAME') c FROM DUAL;

set sqlprompt &ns:&cs:&us>

set head on
set echo on
set termout on
set trimspool on

spool &ns..&cs..logfilename.log
Replace “logfilename” with whatever you want for your script name.
It puts out a prompt like this (connected to the root container as SYSTEM, for example):

ORCL:CDB$ROOT:SYSTEM>

And the log file names come out like ORCL.CDB$ROOT.logfilename.log.
Anyway, this is just a quick post about my first attempts to get around in 12c.
I’ve been reading up on Oracle 12c to get certified and to help advise my company on potential uses for the new version. I’ve been looking forward to researching the new Adaptive Optimization features because it makes so much sense that the database should change its plans when it finds differences between the expected number of rows each part of a plan sees and the actual number of rows.
I’ve written blog posts in the past about limitations of the optimizer related to its ability to determine the number of rows (cardinality) that steps in a plan would see. I took the scripts from some of these and ran them on a 12c instance to see if the new features would cause any of the inefficient plans to change to the obvious efficient plans.
Sadly, none of my examples ran differently on 12c. I don’t doubt that there are examples that run better because of the new features but the ones I constructed earlier didn’t see any improvement. So, I thought I would blog about one such example.
Here is the original blog post with an example run on 11.2 Oracle: url
Here is the same script run on 12c: zip
Here is the query with the bad plan:
select B.DIVNUM
from DIVISION A, SALES B
where a.DIVNUM=B.DIVNUM
and A.DIVNAME='Mayberry'

Plan hash value: 480645376

--------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)|
--------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |       |       |   421 (100)|
|*  1 |  HASH JOIN         |          |   500K|  8300K|   421   (2)|
|*  2 |   TABLE ACCESS FULL| DIVISION |     1 |    14 |     2   (0)|
|   3 |   TABLE ACCESS FULL| SALES    |  1000K|  2929K|   417   (1)|
--------------------------------------------------------------------
There is only 1 SALES row that has the DIVNUM associated with DIVNAME='Mayberry'. There are 1,000,001 SALES rows and there is an index on SALES.DIVNUM, so an index scan would be the most efficient access method and a nested loops join the most efficient join method. But the 12c optimizer chooses a hash join and full table scan instead.
According to the 12c SQL Tuning manual there are two types of Adaptive Optimization that might help in this case: Adaptive Plans and Adaptive Statistics. I tried to tweak my test script to get Adaptive Statistics to kick in by commenting out the dbms_stats calls but it didn’t help. Also, I tried running the query several times in a row but it never changed plan.
I can see why Adaptive Plans wouldn’t work. How long will it let the full scan of SALES go before it decides to switch to a nested loops join with an index scan? If it gets half way through the table it is too late. So, I’m not sure how Adaptive Plans could change the plan on the fly when it expects a lot of rows and only finds a few.
On the Adaptive Statistics I guess this is just a case that it still can’t handle. I guess it is like a histogram across joins case that would be pretty complex to solve in general.
Anyway, this all reminds me of when I first learned about histograms. I got all excited that histograms would solve all my query performance problems and then came crashing down to earth when I realized it wasn’t the case. I think the analogy fits. Histograms improved cardinality estimates and can help in certain cases. I think the new adaptive features will help improve plans by using real cardinality figures where it can, but they aren’t a cure-all.
I’m not sure that getting cardinality right is a solvable problem in the general case. You have to have a fast optimizer so there are limits to how much it can do.
I ran all this as the user SYSTEM on the base 12c 64 bit Linux install with all the default parameters unchanged on the standard default database.
I just finished reading the main section of the Oracle 12c New Features Guide. I read pages 17-107 of the version I have – Section 1 titled “Oracle Database 12c Release 1 (12.1) New Features”. I underlined and asterisked the things that stuck out as interesting in this pass and thought I would include them in this post:
- New PL/SQL DBMS_UTILITY.EXPAND_SQL_TEXT Procedure - Expands views into SQL automatically.
- Default Values for Columns Based on Oracle Sequences - Use a sequence in a column definition for default values.
- Increased Size Limit for VARCHAR2, NVARCHAR2, and RAW Data Types - 32K varchar2 columns.
- JDBC Support for Database Resident Connection Pool - Possible alternative to shared servers.
- Asynchronous Global Index Maintenance for DROP and TRUNCATE Partition - Global index not made unusable by partition maintenance.
- Adaptive Query Optimization - Plans change when the DB sees that its cardinality estimates were wrong.
- Session-Private Statistics for Global Temporary Tables - Gather stats on global temp tables for your session only - cool.
- SecureFiles is the Default for LOB Storage - Not sure what that means, but good to know that the default changed.
- Database Consolidation (section 1.4.1) - Subsections 2-6 give some clues to the way the multitenant architecture works.
- Cloning a Database - Sounds similar to Delphix.
- Oracle Data Pump Change Table Compression at Import Time - Imported data can be compressed using HCC on target.
- Multiple Indexes on Same Set of Columns - Can have different kinds of indexes on the same set of columns (same order I assume).
- Active Database Duplication Enhancements - Faster clone of an open database using RMAN.
- Enterprise Manager Database Express - Sounds like a better EM tool - would like to check it out and review the 2-Day DBA manuals which show uses of it.
- Queryable Patch Inventory - Don't have to run lsinventory to see your patches?
- Multi-Process Multi-Threaded Oracle - Wondering what platforms this is on and what it really means.
- Unified Context-Based Database Audit Architecture - Sounds like this may replace some third-party tools. Improved auditing.
- Parallel Upgrade - May speed up upgrade by parallelizing.
I read all 91 pages, though there were sections that didn’t really interest me since they are about features we don’t use, such as Spatial. If you are interested in 12c I encourage you to read this section as I did. I printed it out about 10 pages at a time, and at only 91 pages it doesn’t take forever to get through.
It looks like we hit bug 13930580 on one of our systems and I wanted to put out some example output showing log file sync times going above 10 milliseconds (10,000 microseconds) after the log writer switches to polling.
Here is the output in the lgwr trace file showing the switch to polling:
*** 2013-12-27 01:50:47.374
kcrfw_update_adaptive_sync_mode: post->poll long#=1 sync#=1 sync=55336 poll=19134 rw=5506 rw+=5506 ack=0 min_sleep=19134

*** 2013-12-27 01:50:47.374
Log file sync switching to polling
Current scheduling delay is 1 usec
Current approximate redo synch write rate is 0 per sec
Here is a report showing the number of log file sync waits and their average time in microseconds:
END_INTERVAL_TIME         number of waits ave microseconds
------------------------- --------------- ----------------
26-DEC-13 03.00.19.333 PM          976539       2279.06861
26-DEC-13 04.00.43.475 PM          681212       2029.32406
26-DEC-13 05.00.07.975 PM          343932       1575.26284
26-DEC-13 06.00.34.772 PM          163911       1850.74354
26-DEC-13 07.01.00.911 PM           73151       3815.28597
26-DEC-13 08.00.24.463 PM           39304       5038.05427
26-DEC-13 09.00.48.970 PM           32122       5677.00557
26-DEC-13 10.00.13.491 PM          472349       2353.95857
26-DEC-13 11.00.40.521 PM           18679       18655.5294
27-DEC-13 12.00.06.276 AM           19618       17046.2287
27-DEC-13 01.00.30.299 AM           18983       5721.99178
27-DEC-13 02.00.54.261 AM           17724       17106.3415
27-DEC-13 03.00.18.282 AM            9088       25342.7271
27-DEC-13 04.00.42.218 AM           14365          12128.7
27-DEC-13 05.00.06.391 AM           16323       12879.8831
27-DEC-13 06.00.31.379 AM           43758       15326.7298
27-DEC-13 07.00.56.027 AM           83819       14796.2851
27-DEC-13 08.00.20.637 AM          168718       13506.4363
27-DEC-13 09.00.47.262 AM          302827       19116.9491
27-DEC-13 10.00.14.014 AM          480347       19358.6655
27-DEC-13 11.00.41.178 AM          512777       15952.2358
27-DEC-13 12.00.08.220 PM          511091       13799.5512
27-DEC-13 01.00.38.131 PM          576341       10183.4347
27-DEC-13 02.00.06.308 PM          524568       10251.1259
Notice how the average wait time goes above 10,000 microseconds consistently once the log writer switches to polling between 1 and 2 am on 12/27/2013. I didn’t show all the output but this long log file sync wait time has continued since the switch.
Also, these long log file sync times don’t correspond to long log file parallel write times. Here are the number and averages of log file parallel write waits for the same time frame:
END_INTERVAL_TIME         number of waits ave microseconds
------------------------- --------------- ----------------
26-DEC-13 03.00.19.333 PM          902849       1426.66601
26-DEC-13 04.00.43.475 PM          659701       1394.87763
26-DEC-13 05.00.07.975 PM          344245       1294.92401
26-DEC-13 06.00.34.772 PM          166643       1586.64944
26-DEC-13 07.01.00.911 PM           80457       4019.29429
26-DEC-13 08.00.24.463 PM           46409       5580.67827
26-DEC-13 09.00.48.970 PM           69218       5115.20904
26-DEC-13 10.00.13.491 PM          475297       2219.80541
26-DEC-13 11.00.40.521 PM           40943        19405.052
27-DEC-13 12.00.06.276 AM           38835       18160.8073
27-DEC-13 01.00.30.299 AM           24734       6321.38425
27-DEC-13 02.00.54.261 AM           33617       11723.6698
27-DEC-13 03.00.18.282 AM           36469       17485.2614
27-DEC-13 04.00.42.218 AM           19344       6955.27042
27-DEC-13 05.00.06.391 AM           17857       4399.75718
27-DEC-13 06.00.31.379 AM           45098       4923.02763
27-DEC-13 07.00.56.027 AM           83700       3610.39713
27-DEC-13 08.00.20.637 AM          160919       2841.31507
27-DEC-13 09.00.47.262 AM          266405       3523.86855
27-DEC-13 10.00.14.014 AM          384795        3367.5075
27-DEC-13 11.00.41.178 AM          437806       2729.84248
27-DEC-13 12.00.08.220 PM          448261       2442.81012
27-DEC-13 01.00.38.131 PM          511648       1880.74418
27-DEC-13 02.00.06.308 PM          481106       1919.21158
The average I/O time – log file parallel write – is pretty low when the system is active (more than 100,000 waits per hour) – usually less than 4000 microseconds and yet log file sync is always more than 10,000 after the switch to polling. Also, the CPU on the system is consistently less than 30% used so it isn’t a system load issue.
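The argument in the last paragraph can be put in numbers: subtract the I/O time (log file parallel write) from the commit wait (log file sync) for the same interval, and what is left is time the foreground sessions spent waiting on something other than the write itself. A small sketch using the 10 AM rows from the two reports above:

```python
# Compare "log file sync" to "log file parallel write" for the same AWR
# interval (the 27-DEC-13 10 AM rows from the two reports above).
sync_avg_us = 19358.6655    # log file sync average, microseconds
write_avg_us = 3367.5075    # log file parallel write average, microseconds

# Time per commit not explained by the redo write itself.
unexplained_us = sync_avg_us - write_avg_us
print(f"{unexplained_us:.0f} microseconds of log file sync beyond the write")
```

Roughly 16 milliseconds per commit is unaccounted for by I/O, which points at the post/poll mechanism rather than the disks.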
Here are some Oracle support documents related to this issue:
Adaptive Switching Between Log Write Methods can Cause 'log file sync' Waits (Doc ID 1462942.1)
Waits for "log file sync" with Adaptive Polling vs Post/Wait Choice Enabled (Doc ID 1541136.1)
Bug 13930580: LGWR IS BLOCKING SESSIONS
Here is the script I used to get the wait output:
set linesize 32000
set pagesize 1000
set long 2000000000
set longchunksize 1000
set head off;
set verify off;
set termout off;

column u new_value us noprint;
column n new_value ns noprint;

select name n from v$database;
select user u from dual;

set sqlprompt &ns:&us>

set head on
set echo on
set termout on
set trimspool on

UNDEFINE WAITNAME
UNDEFINE MINIMUMWAITS

spool "&ns.&&WAITNAME..log"

column END_INTERVAL_TIME format a26

select sn.END_INTERVAL_TIME,
(after.total_waits-before.total_waits) "number of waits",
(after.time_waited_micro-before.time_waited_micro)/
(after.total_waits-before.total_waits) "ave microseconds",
before.event_name "wait name"
from DBA_HIST_SYSTEM_EVENT before, DBA_HIST_SYSTEM_EVENT after,
DBA_HIST_SNAPSHOT sn
where before.event_name='&&WAITNAME'
and after.event_name=before.event_name
and after.snap_id=before.snap_id+1
and after.instance_number=1
and before.instance_number=after.instance_number
and after.snap_id=sn.snap_id
and after.instance_number=sn.instance_number
and (after.total_waits-before.total_waits) > &&MINIMUMWAITS
order by after.snap_id;

spool off
I gave log file sync as the WAITNAME and 1 as MINIMUMWAITS for the first output. I changed WAITNAME to log file parallel write for the second one with 1 still for MINIMUMWAITS.
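The SQL above works by differencing the cumulative totals in DBA_HIST_SYSTEM_EVENT between consecutive snapshots. The same arithmetic in miniature (these snapshot totals are made-up illustrative values, not from the report):

```python
# DBA_HIST_SYSTEM_EVENT stores cumulative counters, so per-interval figures
# come from differencing two consecutive snapshots' totals.
# These totals are made-up values just to show the arithmetic.
before = {"total_waits": 1_000_000, "time_waited_micro": 5_000_000_000}
after = {"total_waits": 1_050_000, "time_waited_micro": 5_600_000_000}

number_of_waits = after["total_waits"] - before["total_waits"]
ave_microseconds = (after["time_waited_micro"]
                    - before["time_waited_micro"]) / number_of_waits

print(number_of_waits, ave_microseconds)  # 50000 12000.0
```

This is exactly the "number of waits" and "ave microseconds" columns the script reports for each snapshot interval.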
It looks like there is a new feature introduced in 11.2 that was finally turned on by default in 11.2.0.3. The workaround is to set a hidden parameter to FALSE to turn off the new feature. Check out the Oracle support docs I listed above for details.
p.s. I forgot to mention that when I tested on a test database with and without this new feature the log file sync times were lower with the new feature, as they should be.
With _use_adaptive_log_file_sync=TRUE (feature enabled):
Top 5 Timed Foreground Events

Event                      Waits      Time(s)  Avg wait (ms)  % DB time  Wait Class
log file sync              639,598    3,466    5              86.74      Commit
DB CPU                                397                     9.93
buffer exterminate         683        14       21             0.36       Other
virtual circuit wait       912        12       13             0.31       Network
SQL*Net message to client  1,293,196  7        0              0.17       Network
With _use_adaptive_log_file_sync=FALSE (disabled as in earlier versions of 11.2):
Top 5 Timed Foreground Events

Event                 Waits    Time(s)  Avg wait (ms)  % DB time  Wait Class
log file sync         639,644  3,553    6              87.31      Commit
DB CPU                         367                     9.02
buffer exterminate    1,364    28       21             0.69       Other
buffer busy waits     2,679    15       6              0.37       Concurrency
virtual circuit wait  903      13       15             0.32       Network
With the new feature enabled log file sync was 5 milliseconds instead of 6 without it. So, the new feature does speed up log file sync waits when it is working normally. But, there must be some bug condition where it degrades to greater than 10 millisecond log file syncs.
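The 5 versus 6 millisecond averages can be recomputed from the Waits and Time(s) columns in the two Top 5 sections, which is a nice way to double-check AWR's rounding:

```python
# Recompute the average "log file sync" wait from the Top 5 rows above:
# average ms = total seconds / number of waits * 1000.
enabled_avg_ms = 3466 / 639598 * 1000    # _use_adaptive_log_file_sync=TRUE
disabled_avg_ms = 3553 / 639644 * 1000   # _use_adaptive_log_file_sync=FALSE

print(f"{enabled_avg_ms:.2f} ms vs {disabled_avg_ms:.2f} ms")
```

The unrounded figures are about 5.4 and 5.6 milliseconds, so the gap is smaller than the whole-millisecond AWR numbers suggest, but the feature still comes out ahead when it is working normally.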
Just got off of a Webex with Delphix support. They upgraded our Delphix server to a new release. It took about twenty minutes. Pretty nice compared to an Oracle database upgrade I think!
The only thing that took some time was that I had to be sure I had enough space. It looks like Delphix needs your disk space to be no more than 85% utilized to fully function. We ended up extending our four 1.5 terabyte LUNs to 1.9 TB each to give us space. Then I cleaned up some archive log space by running a snapsync on each source database to completion. Our system just needed a little TLC to get some space free again.
But, the upgrade itself, running the scripts and rebooting the server, took 20 minutes and was all done by Delphix support for no extra charge. Sweet.
I just think this is a cool picture of an Oracle 12c instance and database from Oracle’s 12c Concepts manual (not my own work):
This is from the concepts manual found here: url
Just finished reading the book by Jonathan Lewis titled “Oracle Core: Essential Internals for DBAs and Developers”. I think I picked it up at the Collaborate 13 conference in Denver last April but haven’t had time (or taken the time) to read it.
Reading a book like Oracle Core can be a challenge because it is pretty dense with example scripts and outputs including dumps in hex. So, I decided to take the strategy of pushing myself to crash through the book without carefully following every example. I may only have absorbed about 10% of the material but if I didn’t jam through it I would have gotten 0%!
I picked up Oracle Core because I had read another book by the same author titled “Cost-Based Oracle Fundamentals” which has paid for itself 100 times over in terms of helping me tune queries. I highly recommend Cost-Based Oracle Fundamentals without reservation. But, like Oracle Core it can be a challenge to just sit down and read it and follow every SQL example and output. Probably it would be worth making a first pass focusing on just the English language text and skimming the examples, maybe delving into the examples of most interest.
In the case of Oracle Core I haven’t yet put it to practical use but I’m glad to have at least skimmed through it. Now I know what’s in it and can refer back to it when needed.
Next I hope to start reading up on Oracle 12c since I plan to get my certification this year. But, I wanted to finish Oracle Core before I moved on, even if I only read it at a high level.
Just for fun I’ve pasted in a table listing the top 10 most viewed posts on this blog as links and including total number of views since this blog began in March 2012. I based this on WordPress’s statistics so I’m not sure exactly how the blog software collects the numbers but it is fun to get some positive feedback. Hopefully it means people are getting something out of it. I’m certainly enjoying putting it together. Here are the links ordered by views as listed on the right:

cell single block physical read - 3,738
REGEXP_LIKE Example - 2,822
Finding query with high temp space usage using ASH views - 2,232
DBA_HIST_ACTIVE_SESS_HISTORY - 2,097
CPU queuing and library cache: mutex X waits - 1,801
DBMS_SPACE.SPACE_USAGE - 1,748
Resource Manager wait events - 1,566
Fast way to copy data into a table - 1,166
Delphix First Month - 1,074
use_nl and use_hash hints for inner tables of joins - 1,047
Anyway, I thought I would list the top ten posts on this blog if you want to read the ones that have the most views and possibly are the most useful.