Bobby Durrett's DBA Blog
How to figure out disk network latency using cached blocks
I was working on some disk performance problems and came across a simple way to test the latency or delay built into the networks that connect my database servers to their disk storage. All I have to do is read some disk blocks from a table several times to be sure they are cached in the disk storage server and then flush the database buffer cache and read from the table once more. Since the blocks are not cached in the database buffer cache but are cached on the disk array the time it takes to read the blocks approaches the time it takes to copy the blocks over the network. Of course there is some CPU on both the source and target to copy the blocks in memory but the physical disk read is eliminated and you see close to the minimum time it is possible for a transfer to take.
So, here is my simple test script. It assumes the user, password, and tnsnames.ora name are passed as parameters and that the user is a DBA user like SYSTEM.
connect &1/&2@&3
-- create test that will show how fast reads
-- can be when cached on the disk subsystem
-- document version
select * from v$version;
-- create small table
drop table test;
create table test as select * from dba_tables where rownum < 1000;
-- query three times to get cached in the disk system
select sum(blocks) from test;
select sum(blocks) from test;
select sum(blocks) from test;
-- flush from database cache
alter system flush buffer_cache;
-- reconnect so session waits are cleared
disconnect
connect &1/&2@&3
select sum(blocks) from test;
-- show average scattered read
-- should be best time you can
-- get since blocks are cached
-- in the disk subsystem
VARIABLE monitored_sid number;
begin
SELECT sid into :monitored_sid from v$session
where audsid=USERENV('SESSIONID');
end;
/
select
total_waits,
time_waited_micro/total_waits
FROM V$SESSION_EVENT a
WHERE a.SID= :monitored_sid
and event='db file scattered read';
Here is what the output looks like on a slow system with network problems (over NFS in my case):
TOTAL_WAITS TIME_WAITED_MICRO/TOTAL_WAITS ----------- ----------------------------- 4 1410.25
Here is what it looks like on a nice fiber network:
TOTAL_WAITS TIME_WAITED_MICRO/TOTAL_WAITS ----------- ----------------------------- 4 40.5
40 microseconds for a disk read is sweet compared to 1410!
I’ve uploaded the script and three runs I made with it here.
- Bobby
Subscribing to Oak Table blogs feed
I’ve seen some very good information posted in this feed which combines blog postings from many different Oracle performance experts who are part of what is called the “Oak Table”
http://www.oaktable.net/feed/blog-rss.xml
I’ve been using Internet Explorer to keep track of new posts in its “Feeds” section of the Favorites. Here is how to add the Oak Table blog feed to Internet Explorer:
Go to the URL listed above and click on “Subscribe to this feed”
Click on Subscribe button
Success! Now click on Favorites and then Feeds
For any feed in your list if you see the feed name in a darker font it means there is a new post. So, as I have time, I’ll go to my feeds and see which of the ones I’ve subscribed to have new posts. If you are looking for performance tuning information I highly recommend the Oak Table feed.
- Bobby
inode lock contention
In my forum discussion about free buffer waits I came across a term that I didn’t understand: “inode lock contention”. I’m pretty sure I had seen this same term years ago on one of Steve Adams’ pages on IO. But, I didn’t really understand what the term meant and so it was hard to understand whether this was something I was seeing on our production system that was experiencing “free buffer waits”.
First I had to figure out what an inode was. I knew that it had something to do with the way Unix filesystems work but reading this article really helped clear up what inodes are at least on HP-UX. Inodes are small chunks of bytes that are used to define a Unix filesystem. On HP-UX’s VxFS filesystems a type 1 inode can point to up to 10 extents of one or more contiguous 8K blocks on a large filesystem. The filesystem I’ve been testing on appears to have 32 meg extents if I’m reading this output from lvdisplay correctly:
LV Size (Mbytes) 1472000 Current LE 46000
Total size of 1,472,000 meg divided by 46,000 logical extents = 32 meg per extent.
Since the inode can point to 1 to 10 extents it could point to between 32 and 320 meg.
My test case had 15 tables that were more than 1 gigabytes each. It seems like each table should span multiple inodes so even if there is locking at the inode level it looks like it won’t lock the entire table at once. Still, it seems unlikely to me that every time a table is updated that reads from all the other parts of the table pointed to by the same inode are really blocked by an inode lock. Yet that is what this document suggests:
“During a read() system call, VxFS will acquire the inode lock in shared mode, allowing many processes to read a single file concurrently without lock contention. However, when a write() system call is made, VxFS will attempt to acquire the lock in exclusive mode. The exclusive lock allows only one write per file to be in progress at a time, and also blocks other processes reading the file. These locks on the VxFS inode can cause serious performance problems when there are one or more file writers and multiple file readers.”
It uses the term “file” but I assume if you have a large file that has multiple inodes it means it will lock just the pieces associated with the one inode that points to the blocks that are being written. The article goes on to explain how you can use the “cio” option to enable concurrent IO and eliminate this inode contention preventing writers from blocking readers. But, I’ve been testing with just the direct IO options and not the cio option and seeing great results. So, would I see even better improvement with concurrent io?
I didn’t want to mess with our current filesystem mount options since testing had proven them to be so effective but I found that in glance, a performance monitoring tool like top, you have an option to display inode waits. So, I took a test that was running with direct IO and had 15 merge statements loading data into the same empty table at once and ran glance to see if there were any inode waits. There were not:
So, I don’t know if I can depend on this statistic in glance or not. It appears that the direct IO mount options are all we need:
mincache=direct,convosync=direct
filesystemio_options=DIRECTIO
There may be some case within Oracle 11.2.03 on HP-UX 11.31 where you can be hampered by inode lock contention despite having direct IO enabled but my tests have not confirmed it and I’ve banged pretty hard on my test system with a couple of different types of tests.
- Bobby
Interesting post on clustering factor
I just read this post about a new patch that will allow you to affect the way the optimizer calculates clustering factor and hence how likely it is to choose a particular index for a query plan.
Pretty cool. I haven’t tried it, but it looks promising.
- Bobby
Oracle internals web site
This is a good web site with Oracle internals information:
http://www.ixora.com.au/notes/
I’ve benefited from this one for years.
- Bobby
Direct IO and db_writer_processes=36 on HP-UX reduced free buffer waits
In my previous post I explained that setting db_writer_processes=1, dbwr_io_slaves=32 made a 2-3 times reduction in run time of my test of 15 concurrent updates.
Further testing has shown that an even greater improvement – really 10 times – can be made by switching to direct IO and maxing out the db writer processes.
To switch my test database to direct IO I had to do two things:
- Ask one of our Unix administrators to remount the filesystem that contains the datafile being tested using these options: mincache=direct,convosync=direct
- Change this parameter: filesystemio_options=directIO
Then I switched to what the documentation says is the maximum number of db writers:
- db_writer_processes=36
- dbwr_io_slaves=0
I had setup a test that generated free buffer waits by changing my update statements to update more blocks than could be held in the buffer cache and I had set log_checkpoint_interval back to its default of 0 so we wouldn’t get frequent checkpoints. I also increased the redo logs to 2 gig so they wouldn’t switch and checkpoint frequently. So, my test was getting plenty of free buffer waits and it took roughly 30 minutes for my 15 concurrent update statements to update 1 million rows each. This was with my current production settings of db_writer_processes=4 and dbwr_io_slaves=0.
Here is a profile of the time spent by one update statement with no direct io and db_writer_processes=4 and dbwr_io_slaves=0:
TIMESOURCE SECONDS PERCENTAGE -------------------------- ---------- ---------- TOTAL_TIME 2503 100 free buffer waits 2097 84 db file scattered read 253 10 CPU 61 2 UNACCOUNTED_TIME 57 2 db file sequential read 26 1 latch: redo copy 5 0 events in waitclass Other 2 0 log buffer space 1 0
Here is a profile with the direct io options and db_writer_processes=36, dbwr_io_slaves=0:
TIMESOURCE SECONDS PERCENTAGE -------------------------- ---------- ---------- TOTAL_TIME 171 100 free buffer waits 51 30 UNACCOUNTED_TIME 41 24 db file scattered read 34 20 CPU 23 13 log buffer space 16 9 events in waitclass Other 3 2 latch: redo copy 2 1
Incredible. Thanks to Jonathan Lewis and Mark Powell for all of their patient discussion of this issue with me on our forum thread.
So, I guess the bottom line is that if you can’t get your filesystems mounted with direct IO options then the IO slaves may be the way to go in certain scenarios. But, with direct IO it appears that upping the number of db writers is better than using IO slaves, at least in a scenario like mine were you have many concurrent updates filling the buffer cache with updated blocks and waiting on free buffer waits.
- Bobby
db_writer_processes, dbwr_io_slaves with no asynch I/O on HP-UX
I’m working on an HP-UX system that doesn’t have asynchronous I/O configured and I’m getting a bunch of “free buffer waits” which indicates that the DBWR processes are having trouble writing updated blocks from memory to disk fast enough. Some preliminary testing I’ve done implies that I should change the following init.ora parameters: db_writer_processes, dbwr_io_slaves. Our current settings in production are:
db_writer_processes=4, dbwr_io_slaves=0
In my test database these settings were 2-3 times faster when running 15 parallel update statements (each updating 100,000 rows):
db_writer_processes=1, dbwr_io_slaves=32
I also tried bumping the writer processes way up to these settings:
db_writer_processes=36, dbwr_io_slaves=0
but this didn’t help at all.
I engaged in a forum discussion with some very helpful people here: forum thread
Prior to the forum discussion I reviewed the manuals, Oracle’s support site, some blog postings and a usergroup presentation and was left with contradictory and confusing information on what settings to try for these two parameters. I got the impression that increasing db_writer_processes would help but in my test it did not.
I can’t come to any firm conclusions except to recommend that if you don’t have asychronous I/O on HP-UX try setting db_writer_processes=1, dbwr_io_slaves=32 (some number > 0) and see if it helps. Of course always try any change in a test environment before making the change in production.
- Bobby
P.S. This query should show you if you datafiles are using asynch i/o:
select ASYNCH_IO,count(*) from v$iostat_file group by ASYNCH_IO;
Here is the output on our production server with the free buffer waits:
SQL> select ASYNCH_IO,count(*) from v$iostat_file group by ASYNCH_IO; ASYNCH_IO COUNT(*) --------- ---------- ASYNC_OFF 301
On a system with asynch I/O it looks like this:
SQL> select ASYNCH_IO,count(*) from v$iostat_file group by ASYNCH_IO; ASYNCH_IO COUNT(*) --------- ---------- ASYNC_OFF 10 ASYNC_ON 27
The 10 files with asynch off are not data files.
p.s. Interestingly an 11.2 manual says that HP-UX doesn’t support asynch IO on filesystems. I haven’t verified this with a test but it looks like your best bet on HP-UX would be to use raw devices and ASM with asynch IO.
Oracle® Database Administrator’s Reference
11g Release 2 (11.2) for Linux and UNIX-Based Operating Systems
“To use asynchronous Input-Output on HP-UX, you must use an Automatic Storage Management disk group that uses raw partitions as the storage option for database files.”
In my case I’m just looking for a quick boost to an existing system with datafiles on filesystems on HP-UX so maybe in this kind of special case setting db_writer_processes=1, dbwr_io_slaves > 0 makes sense. At least it is worth a try.
DBA_HIST_SEG_STAT example
I used this query to debug some updating issues on a table:
select ss.DATAOBJ#, sn.END_INTERVAL_TIME, ss.DB_BLOCK_CHANGES_DELTA, ss.PHYSICAL_WRITES_DELTA from DBA_HIST_SEG_STAT ss, DBA_HIST_SNAPSHOT sn, DBA_HIST_SEG_STAT_OBJ so where so.OWNER='MYUSER' and so.OBJECT_NAME='PLAN_TABLE' and so.OBJECT_TYPE='TABLE' and ss.OBJ#=so.OBJ# and ss.DATAOBJ#=so.DATAOBJ# and ss.snap_id=sn.snap_id order by ss.snap_id,ss.DATAOBJ#;
This is just an example of a PLAN_TABLE in my schema. I changed the schema name for security. Here is edited output:
DATAOBJ# END_INTERVAL_TIME DB_BLOCK_CHANGES_DELTA PHYSICAL_WRITES_DELTA ---------- ------------------------- ---------------------- --------------------- 2956377 17-APR-13 11.00.37.126 AM 32 11 2956387 17-APR-13 11.00.37.126 AM 16 0 2956389 17-APR-13 12.00.55.941 PM 16 8 2956397 17-APR-13 12.00.55.941 PM 3296 35 2956395 17-APR-13 02.00.31.239 PM 0 0
This could be useful to see when a given table was modified.
- Bobby
P.S. I think the DATAOBJ# changes when I truncate the table. I usually truncate my PLAN_TABLE before doing a plan to make sure I don’t get an old plan by accident.
DBA_HIST_ACTIVE_SESS_HISTORY shows PeopleSoft OPRIDs
You can use DBA_HIST_ACTIVE_SESS_HISTORY to show details about the PeopleSoft operator id (OPRID) and which part of the online application they were in. I was just looking at a Query Manager issue and found that DBA_HIST_ACTIVE_SESS_HISTORY populated these three columns with PeopleSoft specific information like this:
MODULE=QUERY_MANAGER
ACTION=QRY_SELECT
CLIENT_ID=OPRID of user running query manager
Pretty cool. This is on HRMS 9.1 PeopleTools 8.50 Oracle database 11.2.0.3.
- Bobby
Don’t use TIME_WAITED in ASH views
I attended John Beresniewicz’s Active Session History (ASH) talk at Collaborate 13 on Monday. One simple point from it was that he commonly sees queries of the ASH views that use the TIME_WAITED column incorrectly and result in incorrect results. The ASH views are V$ACTIVE_SESSION_HISTORY and DBA_HIST_ACTIVE_SESS_HISTORY.
I’ve used DBA_HIST_ACTIVE_SESS_HISTORY frequently to diagnose Oracle performance problems and I don’t believe that I use TIME_WAITED. All I do is count up the number of rows in DBA_HIST_ACTIVE_SESS_HISTORY and count each row as 10 seconds of time. This would be either time on the CPU or wait time. Here is a query I recently used on a real performance problem:
select
case SESSION_STATE
when 'WAITING' then event
else SESSION_STATE
end TIME_CATEGORY,
(count(*)*10) seconds
from DBA_HIST_ACTIVE_SESS_HISTORY a,
V$INSTANCE i,
dba_users u
where
a.user_id = u.user_id and
a.instance_number = i.instance_number and
a.user_id = u.user_id and
sample_time
between
to_date('2013-04-02 00:00','YYYY-MM-DD HH24:MI')
and
to_date('2013-04-02 22:00','YYYY-MM-DD HH24:MI')
and
a.sql_id = 'c12m4zxj3abm6'
group by SESSION_STATE,EVENT
order by seconds desc;
Disregard the joins to dba_users and v$instance. These are left over from previous uses of this query. Here is the output:
TIME_CATEGORY SECONDS ------------------------------ ---------- free buffer waits 49110 db file parallel read 11310 write complete waits 1810 db file sequential read 1600 ON CPU 720 read by other session 220 PX qref latch 50 db file scattered read 20 direct path read temp 20 latch: redo allocation 10 latch: cache buffers lru chain 10
An AWR report from 00:00 to 21:00 the same day showed this sql_id with one execution of length 50,540.80 seconds that didn’t finish. So, this roughly corresponds to the numbers in the ASH profile query above. About 49,000 of the 60,000 seconds of run time are accounted for by free buffer waits. We believe this is caused by too frequent checkpointing but haven’t verified it yet with a fix in production.
Anyway, this type of query was useful to me and it didn’t use the TIME_WAITED column so it validates to some extent the notion that you don’t need to use TIME_WAITED on the ASH views to have a useful query.
- Bobby
Lessons from preparing my Exadata talk
Well, I’m giving this talk related to Exadata at the Collaborate 13 usergroup conference on Monday. I’ve spent a lot of time – probably too much time – preparing the slides and practicing the talk. I first gave this talk a year ago at our office and then again last October at a conference in North Carolina. But, I’ve never been happy with it. I feel a lot better about it now because the slides and the associated notes have a lot of information on them. I got a lot of great feedback from several people and I’ve made changes accordingly.
But having spent too much time reviewing this I think there are a small number of important concepts that I’m really trying to get across:
- Exadata Smart Scans bypass the block buffer cache
- Exadata Smart Scans happen instead of FULL scans
- Make indexes invisible or increase optimizer_index_cost_adj to encourage FULL scans
- Exadata Smart Scans tend to be part of a HASH JOIN
- HASH JOINs can be sped up by adding PGA memory – pga_aggregate_target
- You can free memory by reducing the size of the block buffer cache – sga_max_size
So, this is the short version of my 45 minute talk More PGA, less SGA – my talk in four words!
- Bobby
Yet another Exadata slides update
Link to the latest version of my Exadata talk slides: zip
Here are the details of my presentation time and place if you are going to Collaborate 13 in Denver:
Session title: Exadata Distinctives
Room: Mile High Ballroom 1C
Date and Time: 1:15PM-2:15PM
I look forward to seeing you there!
- Bobby
PS. Minor revision today 04/04/2013. Got to practice with a few coworkers listening. Feel like this is it for the slides.
Exadata flash cache latency .4 milliseconds
The central point of my Exadata presentation is that data flows from the disks and into and out of memory caches in a different way on Exadata when using a Smart Scan than in non-Exadata Oracle databases. An interesting consequence of the way data is cached in Smart Scans is that it is cached in flash memory which is slower than regular RAM. Non-Smart Scan table accesses may use the faster block buffer cache in the database server’s RAM. I have on my presentation that it takes about 1 millisecond to read from flash memory versus 10 nanoseconds on regular RAM. I got those numbers from an Exadata class put on by Oracle and from memory statistics published on the internet. But, I couldn’t remember if I had verified the 1 millisecond number for flash cache access experimentally so I did a simple test that came back with about .4 milliseconds to read 32K from the flash cache. This is still much slower than RAM but faster than I thought.
Here are my test scripts, logs, and a spreadsheet calculating the result: zip
I slightly modified the script from my previous post to set autotrace on and timing on.
SQL> select sum(BLOCKS) from test; SUM(BLOCKS) ----------- 1983309312 Elapsed: 00:00:00.73
The query ran in .73 seconds.
NAME VALUE ------------------------------- ---------- physical read total IO requests 1841 cell flash cache read hits 1837
Almost every IO was cached in flash memory so I’ll do the calculation as if all the execution time for the query is accounted for by the flash cache reads.
.73 seconds X 1000 milliseconds per second = 730 ms
730 ms/1837 flash cache reads = .4 milliseconds/flash cache read (rounded up from .397)
So, this isn’t perfect but it is one piece of evidence that the flash cache reads are about .4 milliseconds on the Exadata V2 system this was tested on.
- Bobby
PS. I determined that the flash cache reads were 32 K using the following information:
Statistics ---------------------------------------------------------- 7239 physical reads NAME MB --------------------------- ---------- physical read total bytes 56.5546875 NAME VALUE ------------------------------- ---------- physical read total IO requests 1841
56 megabytes read/1841 physical IOs is about 32K. Also, 7239 physical blocks read/1841 physical IOs is about 4 8K blocks per read = 32 K. So, these tests appear to show that 32K flash cache reads take about .4 milliseconds.
Flash cache used when smart scans are disabled
I just wanted to double check something on my Exadata slides. I think I had checked this before but became unsure if something I was saying was true, namely that when you run a normal non-smart scan query on Exadata the cell storage servers still query the flash cache memory for cached disk blocks. So,I tried a full scan with smart scans disabled and it appears, based on statistics kept by the database, that the flash cache was used. Here is the (edited for clarity) output of my test script:
SQL> alter session set cell_offload_processing = FALSE;
SQL> select sum(BLOCKS) from test;
SUM(BLOCKS)
-----------
2213273152
SQL> select a.name,b.value/1024/1024 MB from v$sysstat a,
2 v$mystat b
3 where a.statistic# = b.statistic# and
4 (a.name in ('physical read total bytes',
5 'physical write total bytes',
6 'cell IO uncompressed bytes') or a.name like 'cell phy%');
NAME MB
--------------------------------------------------------------- ----
physical read total bytes 56.5
physical write total bytes 0
cell physical IO interconnect bytes 56.5
cell physical IO bytes sent directly to DB node to balanceCPU u 0
cell physical IO bytes saved during optimized file creation 0
cell physical IO bytes saved during optimized RMAN file restore 0
cell physical IO bytes eligible for predicate offload 0
cell physical IO bytes saved by storage index 0
cell physical IO interconnect bytes returned by smart scan 0
cell IO uncompressed bytes 0
SQL> select a.name,b.value from v$sysstat a,
2 v$mystat b
3 where a.statistic# = b.statistic# and
4 (a.name like '%flash cache read hits' or
5 a.name ='physical read total IO requests');
NAME VALUE
-------------------------------------------------------------- -----
physical read total IO requests 1839
cell flash cache read hits 1805
The alter session turned off the smart scans.
The zero in the “cell physical IO interconnect bytes returned by smart scan” statistic indicates that the smart scans were not used.
The “cell flash cache read hits” statistic > zero indicates that the flash cache is used.
So, assuming we can trust these statistics this test shows that non-smart scan reads of data blocks from cell storage servers can access blocks from flash cache.
- Bobby
Disabling cardinality feedback
I ran into a case today where I had tuned a query by putting a cardinality hint into a view, but the optimizer changed the plan anyway by overriding the cardinality hint with cardinality feedback. So, I found out how to turn cardinality feedback off in case you want the cardinality hint to stick. I built a simple testcase for this post.
Here is the test query:
select /*+ cardinality(test,1) */ count(*) from test;
The first time it runs the plan shows that the optimizer thinks there is one row in the test table:
------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| Time | ------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | 292 (100)| | | 1 | SORT AGGREGATE | | 1 | | | | 2 | TABLE ACCESS FULL| TEST | 1 | 292 (1)| 00:00:04 | -------------------------------------------------------------------
But the second time cardinality feedback tells the optimizer the truth:
------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| Time | ------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | 292 (100)| | | 1 | SORT AGGREGATE | | 1 | | | | 2 | TABLE ACCESS FULL| TEST | 31467 | 292 (1)| 00:00:04 | ------------------------------------------------------------------- Note ----- - cardinality feedback used for this statement
How annoying! It ignored my cardinality hint. But you can add this hint to turn off cardinality feedback:
opt_param('_optimizer_use_feedback' 'false')
and then you are back to the original plan with rows = 1. This doesn’t prove that this will help improve performance just that the plan will show the cardinality I’m trying to make it use.
- Bobby
Finding bind variable values using DBA_HIST_SQLBIND
Whenever I need to test a query that has bind variables I usually go to the development team to ask them what typical values are or what the values were the last time the query ran. I’m pretty sure that in the past when I went looking for a DBA_HIST view that held bind variables that I came up empty. Today I’m working on tuning a query with a bind variable and I tried to find the value of the bind variable last Sunday using the DBA_HIST_SQLBIND view. Strangely enough it had exactly what I wanted so I’m not sure if there are cases where this doesn’t capture the variables and cases where it does, but it worked for me today so it may be useful to others in certain cases.
I had the sql_id of the query from an AWR report spanning the period of high CPU usage on Sunday: 40wpuup08vws6. I ran this query to get the bind variable for all executions of this sql_id.
select sn.END_INTERVAL_TIME, sb.NAME, sb.VALUE_STRING from DBA_HIST_SQLBIND sb, DBA_HIST_SNAPSHOT sn where sb.sql_id='40wpuup08vws6' and sb.WAS_CAPTURED='YES' and sn.snap_id=sb.snap_id order by sb.snap_id, sb.NAME;
It produced this output for the bind variable B1.
END_INTERVAL_TIME NAM VALUE_STRING -------------------------- --- ----------------- 03-FEB-13 02.00.32.733 AM :B1 02/02/13 00:00:00 03-FEB-13 03.00.36.316 AM :B1 02/02/13 00:00:00 10-FEB-13 02.00.29.975 AM :B1 02/09/13 00:00:00 10-FEB-13 03.00.23.292 AM :B1 02/09/13 00:00:00 17-FEB-13 02.00.36.688 AM :B1 02/16/13 00:00:00 17-FEB-13 03.00.06.374 AM :B1 02/16/13 00:00:00 24-FEB-13 01.00.33.691 AM :B1 02/23/13 00:00:00 24-FEB-13 02.00.20.269 AM :B1 02/23/13 00:00:00 24-FEB-13 03.00.16.811 AM :B1 02/23/13 00:00:00 03-MAR-13 02.00.17.974 AM :B1 03/02/13 00:00:00 03-MAR-13 03.00.33.340 AM :B1 03/02/13 00:00:00 10-MAR-13 10.00.10.356 PM :B1 03/09/13 00:00:00 10-MAR-13 11.00.43.467 PM :B1 03/09/13 00:00:00 11-MAR-13 12.00.12.898 AM :B1 03/09/13 00:00:00
So, you can see what value this date type bind variable B1 contained each weekend that the query ran. The query runs for multiple hours so that is why it spanned multiple AWR snapshot intervals.
Here is a zip of my test script and its log: zip.
- Bobby
New version of my Exadata presentation
Here is a link to my updated Exadata PowerPoint presentation.
I’ve been trying to improve my Exadata talk for the Collaborate 13 conference. I’ve done this talk at work and at the ECO conference in October but I’m not completely happy with it. The criticism I’ve received boils down to these things:
- The slides don’t stand on their own
- I jump into the middle of the detail without enough introduction
So, I’ve updated the slides to have comments on many of the slides. This will be good as a reminder of what I want to say and to make the slides more meaningful to someone who just has the PowerPoint.
I’ve also added several slides to show the execution plan of a sample query to do a better job of setting up the slides I already have that discuss how data flows through an Exadata server as compared with a normal server. The point is that when the table is accessed blocks are read and certain rows are selected and certain columns are projected.
Lastly I added a slide on direct path read which shows how the buffer cache can be bypassed even on a normal Oracle database server in some cases. This is also part of the introduction in that it provides background needed to understand how the Exadata Smart Scan bypasses the buffer cache.
If anyone has time to read through the slides and give me their feedback I’d be happy to hear it. Hopefully the net result will be a presentation that is both useful to me and my company and to those attending the conference.
- Bobby
P.S. Edited again 3/22/2013
Implicit type conversion in where clause
I spent a lot of time yesterday digging into performance issues on a new database only to find a simple issue where two tables were joined on a column that was a number on one table and a varchar2 on the other. That column was a varchar2(4) on six or eight tables but one table – and it had to be the one with the most rows – had the same column defined as number(4) and as a result the queries were running with inefficient plans. All I had to do to find this was get a plan of the sample query I was working on and look for TO_NUMBER in the predicates section but of course I spent hours looking at other things first. So, I thought I would document how to make a quick check for this kind of type conversion. It is “implicit” type conversion because there is no TO_NUMBER in the sql itself. The optimizer has to add the TO_NUMBER so it can compare the character column to the number column.
Here is how I setup the example tables to mimic the situation I saw yesterday:
-- table with number column create table ntest (div_nbr number(4)); -- table with character column create table ctest (div_nbr varchar2(4));
Here is a sample query that joins these two tables on the columns of the same name but different types:
-- join on the column select count(*) from ntest,ctest where ntest.div_nbr=ctest.div_nbr;
Here is how I get the plan which will includes the predicates section with the TO_NUMBER conversion function:
-- get plan select * from table(dbms_xplan.display_cursor(null,null,'ALL'));
Finally, here is the plan with the predicates section with the TO_NUMBER function:
------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 5 (100)|
| 1 | SORT AGGREGATE | | 1 | 17 | |
|* 2 | HASH JOIN | | 1 | 17 | 5 (20)|
| 3 | TABLE ACCESS FULL| NTEST | 1 | 13 | 2 (0)|
| 4 | TABLE ACCESS FULL| CTEST | 1 | 4 | 2 (0)|
------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("NTEST"."DIV_NBR"=TO_NUMBER("CTEST"."DIV_NBR"))
Step 2 of the plan is the hash join and the predicate information section shows how the two columns are joined for this step. CTEST.DIV_NBR has to be converted to a number before it can be compared to NTEST.DIV_NBR.
So, I recommend putting a check for TO_NUMBER in the predicate information section of the plan into your toolkit of things to check when tuning a query. This kind of thing shows up more often than you would think and it is hard to find because if you look at the query itself you just see a normal join on the DIV_NBR column.
Our resolution of this issue was to change the type of the one table to varchar2(4). This combined with tuning optimizer_index_cost_adj resulted in dramatic improvements in performance on our test system.
Here is a zip of the script I used to show how to find TO_NUMBER in the predicate section.
- Bobby
Found an archive of my geocities blog
OK. This is really cool. I found an archive of my old geocities blog: archive
I thought geocities was gone forever but it isn’t. Quite cool.
- Bobby
Don’t focus on cost of execution plan
I don’t focus on the Oracle optimizer’s cost of a particular execution plan when I’m tuning a query and I’m worried that many of the developers and DBAs that I’ve talked with about query tuning are too focused on lowering the cost of a plan when attempting to tune a query.
I hear comments like this all the time when talking about making a change X to improve the performance of a query. “I changed X and the cost in the explain plan was so much lower.” X could be adding an index or hint or making a parameter change, etc.
I just cringe inside when I hear this and I hope I am gracious but all the time I’m filled with fear that the person I’m talking to is missing a key concept when it comes to Oracle query tuning. The concept is just that in many cases the optimizer’s estimated cost is far off from reality. So, making a change and seeing the cost of the plan go down really doesn’t mean much. It could directly correlate to corresponding improvement in the query run time or it could be just the opposite. It is kind of like the buffer cache hit ratio. Sometimes this ratio really means something and sometimes it doesn’t.
Instead of focusing on the cost I focus on the plan itself. Based on my investigation of the tables in the query and how many rows will be accessed from each I’ve come up with an idea of a plan that should be better than the one I’m improving. So, my question about a proposed change X becomes “Does change X cause the plan to change to the one I determined to be better?”
I attempted to lay out this approach in my Intro to SQL Tuning presentation. For me query tuning is kind of like programming. I’m figuring out the best way to really do the steps of the plan based on my own study of the existing tables. Then I just have to figure out what change to make to get the optimizer to run the query my way. This is a time consuming approach but I would only spend the time on queries that really need it. Who has time to tune every query?
So, my recommended approach to query tuning is to figure out a good plan on your own and then to figure out how to get the database to run it your way. I don’t recommend focusing on what the cost of the new plan is compared with the original plan.
- Bobby







