
Bobby Durrett's DBA Blog

Oracle database performance

Check out 6.00.1x computer science class on edX!

Sat, 2015-07-25 11:15

I just finished the last programming assignment for a computer science class on edX, and I urge you to try the class.

I took this class:

MITx: 6.00.1x Introduction to Computer Science and Programming Using Python

I was more interested in how MIT taught the class than in the material itself because I already know the subjects covered.

The class taught the basics of programming – expressions, variables, loops, if statements, and functions.

It also had a large focus on bisection or binary search and the performance benefits of this type of search over sequentially reading through a list.

It also covered lists, hash tables, trees, stacks, and queues.

It discussed object-oriented programming.

The class concluded with the professor stating that the programming and computer science skills taught in this class are key to advancing your career, even if you do not work in a computer related job.

I interacted with a number of students in the class and found some who were in other fields and were having success with it.  Others were in business computing or IT but did not have a computer science background, so they were good programmers learning new concepts.  Many struggled with the class, but it is free and is offered often. The class starts up again August 26th.  Nothing stops you from taking it multiple times.

I tried to think about whether I should recommend this class to the people I work with as a way of developing coworkers who do not have experience in these areas.  At first I thought that the subject was too academic and had no connection to their jobs. But, after thinking about it for a while, I now believe that just the opposite is true.

Searching for practical applications of the class, I first remembered the programs that we wrote that compared searching sequentially through a list to using binary search.  In one test case the sequential method took 15 seconds but the binary search took less than one second.  This reminded me so much of tuning Oracle SQL queries.  The sequential scan of the list was like a full table scan in Oracle.  The binary search was like looking up a single row using an index scan.  As I tune Oracle queries my computer science knowledge of binary search and binary trees makes it easy to understand index and full table scans.
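
The gap between the two search strategies is easy to reproduce outside the class. Here is a quick Python sketch of my own (not the class's code; the list size and timings are just illustrative):

```python
import time

def sequential_search(lst, target):
    # Scan every element until we hit the target - like a full table scan.
    for i, v in enumerate(lst):
        if v == target:
            return i
    return -1

def binary_search(lst, target):
    # Repeatedly halve a sorted search range - like an index lookup.
    lo, hi = 0, len(lst) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if lst[mid] == target:
            return mid
        if lst[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

data = list(range(2_000_000))   # sorted, like an index
target = len(data) - 1          # worst case for the sequential scan

start = time.perf_counter()
sequential_search(data, target)
seq_seconds = time.perf_counter() - start

start = time.perf_counter()
binary_search(data, target)
bin_seconds = time.perf_counter() - start

print(f"sequential: {seq_seconds:.4f}s, binary: {bin_seconds:.6f}s")
```

The sequential scan touches all two million entries while the binary search needs about 21 comparisons, which is the same asymmetry you see between a full table scan and an index lookup.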

In another example, we recently had slowness on a Weblogic portal server.  CPU was getting maxed out, and the CPU time was mostly spent in a Java ConcurrentHashMap object.  I don’t know the internals of Weblogic and I have never used a ConcurrentHashMap, but I know how hashing works.  I know that hashing is very fast until your hash table fills up or the hash function distributes the items unequally. My knowledge of hashing helped me grasp why our portal server was using a lot of CPU despite my lack of access to its internals.
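
The effect of a poorly distributed hash function is also easy to simulate. This Python sketch is my own illustration (it has nothing to do with Weblogic's ConcurrentHashMap internals): it compares lookups in a table whose hash spreads keys across buckets against one where every key lands in a single bucket.

```python
import time

class BucketHashTable:
    """Toy hash table. With bad_hash=True every key hashes to bucket 0,
    so each lookup degrades into a linear scan of all entries."""

    def __init__(self, nbuckets=1024, bad_hash=False):
        self.buckets = [[] for _ in range(nbuckets)]
        self.bad_hash = bad_hash

    def _bucket(self, key):
        h = 0 if self.bad_hash else hash(key)
        return self.buckets[h % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value
                return
        bucket.append([key, value])

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

good = BucketHashTable()
bad = BucketHashTable(bad_hash=True)
for i in range(2000):
    good.put(i, i)
    bad.put(i, i)

for table, label in ((good, "well distributed"), (bad, "one bucket")):
    start = time.perf_counter()
    for i in range(2000):
        table.get(i)
    print(f"{label}: {time.perf_counter() - start:.4f} seconds")
```

With the keys spread out, each lookup inspects only a couple of entries; with everything in one bucket, each lookup scans about a thousand on average, and the CPU time goes mostly to those comparisons.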

So, contrary to my original fear that the edX class was too academic and not practical, I believe that the concepts covered are very practical.  If you do not know how binary search works or what a binary tree is, you will benefit from 6.00.1x on edX.  If you cannot explain how a hash table works and what causes hashing to slow down, you can learn from 6.00.1x. And, if you have never written a computer program, although you may find the class difficult and have to take it more than once, you will benefit from 6.00.1x on edX.

– Bobby

Categories: DBA Blogs

Flushing Shared Pool Does Not Slow Its Growth

Thu, 2015-06-18 17:14

I’m still working on resolving the issues caused by bug 13914613.

Oracle support recommended a parameter change to resolve the issue, but that change requires us to bounce the database, and I was looking for a resolution that does not need a bounce.  The bug caused very bad shared pool latch waits when the automatic memory management feature of our 11.2.0.3 database expanded the shared pool.  Oracle support recommended setting _enable_shared_pool_durations=false, and I verified that changing this parameter requires a bounce.  Bouncing this database is a big hassle because of the application, so I thought I might try flushing the shared pool on a regular basis so that automatic memory management would not need to keep increasing the size of the shared pool.  The shared pool was growing because we have a lot of SQL statements without bind variables.  So, I ran a test, and flushing the shared pool did not slow the growth of the shared pool.

Here is a zip of the scripts I used for this test and their outputs: zip

I set the shared pool to a small value so it was more likely to grow, and I created a script to run many different SQL statements that don’t use bind variables (the set commands keep SQL*Plus headings and feedback out of the spooled script):

set pagesize 0 heading off feedback off echo off verify off

spool runselects.sql

select 'select * from dual where dummy=''s'
||to_char(sysdate,'HHMISS')||rownum||''';'
from dba_objects;

spool off

@runselects

So, the queries looked like this:

select * from dual where dummy='s0818111';
select * from dual where dummy='s0818112';
select * from dual where dummy='s0818113';
select * from dual where dummy='s0818114';
select * from dual where dummy='s0818115';
select * from dual where dummy='s0818116';
select * from dual where dummy='s0818117';

I ran these for an hour and tested three different configurations.  The first two did not use the _enable_shared_pool_durations=false setting and the last did.  The first test was a baseline that showed the growth of the shared pool without flushing it.  The second test included a flush of the shared pool every minute.  The last run included the parameter change and no flush.  I queried V$SGA_RESIZE_OPS after each test to see how many times the shared pool grew.  Here is the query:

SELECT OPER_TYPE,FINAL_SIZE Final,
to_char(start_time,'dd-mon hh24:mi:ss') Started, 
to_char(end_time,'dd-mon hh24:mi:ss') Ended 
FROM V$SGA_RESIZE_OPS
where component='shared pool'
order by start_time,end_time;

Here are the results.

Baseline – no flush, no parameter change:

OPER_TYPE       FINAL STARTED         ENDED
--------- ----------- --------------- ---------------
GROW      150,994,944 18-jun 05:03:54 18-jun 05:03:54
GROW      134,217,728 18-jun 05:03:54 18-jun 05:03:54
STATIC    117,440,512 18-jun 05:03:54 18-jun 05:03:54
GROW      167,772,160 18-jun 05:04:36 18-jun 05:04:36
GROW      184,549,376 18-jun 05:47:38 18-jun 05:47:38

Flush every minute, no parameter change:

OPER_TYPE       FINAL STARTED         ENDED
--------- ----------- --------------- ---------------
GROW      134,217,728 18-jun 06:09:15 18-jun 06:09:15
GROW      150,994,944 18-jun 06:09:15 18-jun 06:09:15
STATIC    117,440,512 18-jun 06:09:15 18-jun 06:09:15
GROW      167,772,160 18-jun 06:09:59 18-jun 06:09:59
GROW      184,549,376 18-jun 06:22:26 18-jun 06:22:26
GROW      201,326,592 18-jun 06:42:29 18-jun 06:42:29
GROW      218,103,808 18-jun 06:47:29 18-jun 06:47:29

Parameter change, no flush:

OPER_TYPE        FINAL STARTED         ENDED
--------- ------------ --------------- ---------------
STATIC     117,440,512 18-jun 07:16:09 18-jun 07:16:09
GROW       134,217,728 18-jun 07:16:18 18-jun 07:16:18

So, at least in this test, which I have run only twice, flushing the shared pool if anything made the growth of the shared pool worse.  Changing the parameter, on the other hand, seems to lock the shared pool size in.

– Bobby


Overall I/O Query

Tue, 2015-06-16 14:57

I hacked together a query today that shows the overall I/O performance that a database is experiencing.

The output looks like this:

End snapshot time   number of IOs ave IO time (ms) ave IO size (bytes)
------------------- ------------- ---------------- -------------------
2015-06-15 15:00:59        359254               20              711636
2015-06-15 16:00:59        805884               16              793033
2015-06-15 17:00:13        516576               13              472478
2015-06-15 18:00:27        471098                6              123565
2015-06-15 19:00:41        201820                9              294858
2015-06-15 20:00:55        117887                5              158778
2015-06-15 21:00:09         85629                1               79129
2015-06-15 22:00:23        226617                2               10744
2015-06-15 23:00:40        399745               10              185236
2015-06-16 00:00:54       1522650                0               43099
2015-06-16 01:00:08       2142484                0               19729
2015-06-16 02:00:21        931349                0                9270

I’ve combined reads and writes and focused on three metrics: number of IOs, average IO time in milliseconds, and average IO size in bytes.  I think it is a helpful way to compare how two systems perform.  Here is the output from another, better-performing system:

End snapshot time   number of IOs ave IO time (ms) ave IO size (bytes)
------------------- ------------- ---------------- -------------------
2015-06-15 15:00:25        331931                1              223025
2015-06-15 16:00:40        657571                2               36152
2015-06-15 17:00:56       1066818                1               24599
2015-06-15 18:00:11        107364                1              125390
2015-06-15 19:00:26         38565                1               11023
2015-06-15 20:00:41         42204                2              100026
2015-06-15 21:00:56         42084                1               64439
2015-06-15 22:00:15       3247633                3              334956
2015-06-15 23:00:32       3267219                0               49896
2015-06-16 00:00:50       4723396                0               32004
2015-06-16 01:00:06       2367526                1               18472
2015-06-16 02:00:21       1988211                0                8818

Here is the query (the 1+ in each denominator guards against division by zero in a snapshot interval with no I/O):

select 
to_char(sn.END_INTERVAL_TIME,'YYYY-MM-DD HH24:MI:SS') "End snapshot time",
sum(after.PHYRDS+after.PHYWRTS-before.PHYWRTS-before.PHYRDS) "number of IOs",
trunc(10*sum(after.READTIM+after.WRITETIM-before.WRITETIM-before.READTIM)/
sum(1+after.PHYRDS+after.PHYWRTS-before.PHYWRTS-before.PHYRDS)) "ave IO time (ms)",
trunc((select value from v$parameter where name='db_block_size')*
sum(after.PHYBLKRD+after.PHYBLKWRT-before.PHYBLKRD-before.PHYBLKWRT)/
sum(1+after.PHYRDS+after.PHYWRTS-before.PHYWRTS-before.PHYRDS)) "ave IO size (bytes)"
from DBA_HIST_FILESTATXS before, DBA_HIST_FILESTATXS after,DBA_HIST_SNAPSHOT sn
where 
after.file#=before.file# and
after.snap_id=before.snap_id+1 and
before.instance_number=after.instance_number and
after.snap_id=sn.snap_id and
after.instance_number=sn.instance_number
group by to_char(sn.END_INTERVAL_TIME,'YYYY-MM-DD HH24:MI:SS')
order by to_char(sn.END_INTERVAL_TIME,'YYYY-MM-DD HH24:MI:SS');

I hope this is helpful.

– Bobby


Bug 13914613 Example Shared Pool Latch Waits

Tue, 2015-06-09 17:18

Oracle support says we have hit bug 13914613.  Here is what our wait events looked like in an AWR report:

Top 5 Timed Foreground Events

Event                     Waits      Time(s)  Avg wait (ms)  % DB time  Wait Class
latch: shared pool            3,497   17,482           4999      38.83  Concurrency
latch: row cache objects        885   12,834          14502      28.51  Concurrency
db file sequential read   1,517,968    8,206              5      18.23  User I/O
DB CPU                                 4,443                      9.87
library cache: mutex X        7,124    2,639            370       5.86  Concurrency

What really struck me about these latch waits was that the average wait time was several thousand milliseconds, which means several seconds.  That’s a long time to wait for a latch.

Oracle pointed to the Latch Miss Sources section of the AWR report.  This section is mostly gibberish to me; I assume the entries are internal kernel function names.

Latch Miss Sources

Latch Name    Where                          NoWait Misses  Sleeps  Waiter Sleeps
shared pool   kghfrunp: clatch: wait                     0   1,987          1,956
shared pool   kghfrunp: alloc: session dur               0   1,704          1,364

The bug description says “Excessive time holding shared pool latch in kghfrunp with auto memory management”, so I guess the kghfrunp entries in the Latch Miss Sources section told Oracle support that this was our issue.

I did this query to look for resize operations:

SELECT COMPONENT ,OPER_TYPE,FINAL_SIZE Final,
  2  to_char(start_time,'dd-mon hh24:mi:ss') Started,
  3  to_char(end_time,'dd-mon hh24:mi:ss') Ended
  4  FROM V$SGA_RESIZE_OPS;

COMPONENT                 OPER_TYPE               FINAL STARTED                   ENDED
------------------------- ------------- --------------- ------------------------- -------------------------
DEFAULT 2K buffer cache   STATIC                      0 12-may 04:33:01           12-may 04:33:01
streams pool              STATIC            134,217,728 12-may 04:33:01           12-may 04:33:01
ASM Buffer Cache          STATIC                      0 12-may 04:33:01           12-may 04:33:01
DEFAULT buffer cache      INITIALIZING   10,401,873,920 12-may 04:33:01           12-may 04:33:08
DEFAULT 32K buffer cache  STATIC                      0 12-may 04:33:01           12-may 04:33:01
KEEP buffer cache         STATIC          2,147,483,648 12-may 04:33:01           12-may 04:33:01
shared pool               STATIC         13,958,643,712 12-may 04:33:01           12-may 04:33:01
large pool                STATIC          2,147,483,648 12-may 04:33:01           12-may 04:33:01
java pool                 STATIC          1,073,741,824 12-may 04:33:01           12-may 04:33:01
DEFAULT buffer cache      STATIC         10,401,873,920 12-may 04:33:01           12-may 04:33:01
DEFAULT 16K buffer cache  STATIC                      0 12-may 04:33:01           12-may 04:33:01
DEFAULT 8K buffer cache   STATIC                      0 12-may 04:33:01           12-may 04:33:01
DEFAULT 4K buffer cache   STATIC                      0 12-may 04:33:01           12-may 04:33:01
RECYCLE buffer cache      STATIC                      0 12-may 04:33:01           12-may 04:33:01
KEEP buffer cache         INITIALIZING    2,147,483,648 12-may 04:33:02           12-may 04:33:04
DEFAULT buffer cache      SHRINK         10,334,765,056 20-may 21:00:12           20-may 21:00:12
shared pool               GROW           14,025,752,576 20-may 21:00:12           20-may 21:00:12
shared pool               GROW           14,092,861,440 27-may 18:06:12           27-may 18:06:12
DEFAULT buffer cache      SHRINK         10,267,656,192 27-may 18:06:12           27-may 18:06:12
shared pool               GROW           14,159,970,304 01-jun 09:07:35           01-jun 09:07:36
DEFAULT buffer cache      SHRINK         10,200,547,328 01-jun 09:07:35           01-jun 09:07:36
DEFAULT buffer cache      SHRINK         10,133,438,464 05-jun 03:00:33           05-jun 03:00:33
shared pool               GROW           14,227,079,168 05-jun 03:00:33           05-jun 03:00:33
DEFAULT buffer cache      SHRINK         10,066,329,600 08-jun 11:06:06           08-jun 11:06:07
shared pool               GROW           14,294,188,032 08-jun 11:06:06           08-jun 11:06:07

The interesting thing is that our problem ended right about the time the last shared pool expansion supposedly started.  The latch waits hosed up our database for several minutes, and the problem ended right about 11:06.  I suspect that the system was hung up with the bug and that the normal expansion work only started once the bug finished.  Or, at least, the time did not get recorded until after the bug finished slowing us down.

So, I guess it’s just a bug.  This is on 11.2.0.3 on HP-UX Itanium.  I believe there is a patch set with the fix for this bug.

Maybe it will be helpful for someone to see an example.

– Bobby


Surprising calibrate_io result

Fri, 2015-05-29 17:26

Recently I used DBMS_RESOURCE_MANAGER.calibrate_io to measure disk I/O performance using a call like this:

DECLARE
 l_latency PLS_INTEGER;
 l_iops PLS_INTEGER;
 l_mbps PLS_INTEGER;
BEGIN

 DBMS_RESOURCE_MANAGER.calibrate_io (num_physical_disks => 10,
 max_latency => 20,
 max_iops => l_iops,
 max_mbps => l_mbps,
 actual_latency => l_latency);

 DBMS_OUTPUT.put_line('Max IOPS = ' || l_iops);
 DBMS_OUTPUT.put_line('Max MBPS = ' || l_mbps);
 DBMS_OUTPUT.put_line('Latency = ' || l_latency);

END;
/

Using this call I got a surprising result: the test returns very different numbers when run against two different databases on the same Linux virtual machine and the same filesystem.

database 1:

Max IOPS = 7459
Max MBPS = 863
Latency = 18

database 2:

Max IOPS = 39921
Max MBPS = 1105
Latency = 0

Both databases use direct and asynchronous I/O.  The databases differ in size and configuration.  It seems that something about the databases themselves affects the results since they share the same filesystem on the same machine.

I did not get useful numbers from calibrate_io and have wasted a lot of time trying to interpret its results.  You may want to focus on other tools for measuring disk I/O performance.

– Bobby


Free MIT Computer Science Classes Online

Wed, 2015-05-20 14:30

This is very cool:

There are a bunch of nice computer science classes online from MIT: Free online MIT computer science classes

Here is an introductory computer science class: Intro to computer science

Here is a graded version of the same class on edX: Graded version of MIT intro CS class starting June 10th.

edX does not have as many computer science classes, but it may motivate students more because its classes include grades and certificates for those who pass.

I use computer science every day in my database work but I have not taken a formal class since 1989.

I have been on a computer science kick ever since watching The Imitation Game.  I downloaded Turing’s 1936 paper after watching the movie.  I got about halfway through it before giving up.  It was dense! Maybe I will take another stab at it some day. But the MIT classes are neat because they show the way computer science is now taught, and hopefully they are easier to understand than Turing’s paper.

– Bobby


Simple C program for testing disk performance

Wed, 2015-05-13 13:48

I dug up a simple C program that I wrote years ago to test disk performance.  I hesitated to publish it because it is rough and limited in scope, and more capable tools exist. But I have made good use of it, so why not share it with others?  It takes a file name and the size of the file in megabytes.  It sequentially writes the file in 64 kilobyte chunks.  It opens the file in synchronous mode, so each write must reach disk before returning to the program. It outputs the rate in bytes per second at which the program wrote to disk.

Here is a zip of the code: zip

There is no error checking so if you put in an invalid file name you get no message.
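
If you don't want to compile C, the same measurement can be sketched in a few lines of Python. This is my own rough analog of the program, not the code in the zip; the path and size below are just examples:

```python
import os
import time

def timed_sync_write(path, size_mb, chunk_kb=64):
    """Sequentially write size_mb megabytes in chunk_kb-kilobyte chunks.
    O_SYNC forces each write to reach disk before the call returns.
    Returns the write rate in bytes per second."""
    chunk = b"\0" * (chunk_kb * 1024)
    nchunks = (size_mb * 1024 * 1024) // len(chunk)
    # getattr guards against platforms that do not define O_SYNC
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o644)
    start = time.perf_counter()
    try:
        for _ in range(nchunks):
            os.write(fd, chunk)
    finally:
        os.close(fd)
    return (nchunks * len(chunk)) / (time.perf_counter() - start)

# Example run: write an 8 MB file and report the rate
rate = timed_sync_write("/tmp/createfile_test", 8)
print(f"Bytes per second written = {rate:.0f}")
os.remove("/tmp/createfile_test")
```

As with the C version, the interesting knob is the set of open() flags; swapping O_SYNC for O_DSYNC (where available) lets you compare the same options the database uses.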

Here is how I ran it in my HP-UX and Linux performance comparison tests:

HP-UX:

$ time ./createfile /var/opt/oracle/db01/bobby/test 1024
Bytes per second written = 107374182

real 0m10.36s
user 0m0.01s
sys 0m1.79s

Linux:

$ time ./createfile /oracle/db01/bobby/test 1024
Bytes per second written = 23860929

real 0m45.166s
user 0m0.011s
sys 0m2.472s

It makes me think that my Linux system’s write I/O is slower.  I found a set of arguments to the dd utility that seems to do the same thing on Linux:

$ dd if=/dev/zero bs=65536 count=16384 of=test oflag=dsync
16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 38.423 s, 27.9 MB/s

But I couldn’t find an option like dsync on the HP-UX version of dd.  In any case, it was nice to have the C code so I could experiment with various options to open().  I used tusc on HP-UX and strace on Linux to trace some activity against the system tablespace.  By grepping the traces for open I found the options Oracle uses:

hp trace

open("/var/opt/oracle/db01/HPDB/dbf/system01.dbf", O_RDWR|0x800|O_DSYNC, 030) = 8

linux trace

open("/oracle/db01/LINUXDB/dbf/system01.dbf", O_RDWR|O_DSYNC) = 8

So, I modified my program to use the O_DSYNC flag, and the performance was the same as with O_SYNC.  But the point is that having a simple C program lets you change these options to open() directly.

I hope this program will be as useful to others as it has been to me.

– Bobby

p.s. Here is a similar program that sequentially reads through a file, but with 256 KB buffers: zip


db file parallel read on Linux and HP-UX

Tue, 2015-05-05 13:39

In my previous post I described how I could not explain why I got better db file parallel read wait times in a test on Linux than I got running the same test on HP-UX.  I have discovered that the Linux wait times were better because Linux cached the data in the filesystem cache and HP-UX did not.

Neither system used direct I/O for the tests, so both could cache data in the filesystem cache.  Evidently Linux does this sooner than HP-UX.  I figured this out by repeatedly running the query, flushing the buffer cache before each run.  Flushing the buffer cache prevented the table and index from being cached within the database.  On Linux the query ran for the same amount of time for all five executions.  On HP-UX it ran much faster after the first run.  Apparently Linux cached the table and index before the first run, and HP-UX cached them after the first run.

Here is how I ran the query:

alter system flush buffer_cache;

select /*+ index(test testi) */ sum(blocks) from test;

alter system flush buffer_cache;

select /*+ index(test testi) */ sum(blocks) from test;

alter system flush buffer_cache;

select /*+ index(test testi) */ sum(blocks) from test;

alter system flush buffer_cache;

select /*+ index(test testi) */ sum(blocks) from test;

alter system flush buffer_cache;

select /*+ index(test testi) */ sum(blocks) from test;

Here are the elapsed times for the query on Linux:

Elapsed: 00:00:09.16
Elapsed: 00:00:09.17
Elapsed: 00:00:09.28
Elapsed: 00:00:09.18
Elapsed: 00:00:09.20

Here is the same thing on HP-UX:

Elapsed: 00:01:03.27
Elapsed: 00:00:19.23
Elapsed: 00:00:19.28
Elapsed: 00:00:19.35
Elapsed: 00:00:19.43

It’s not surprising that the HP-UX times with the data cached are twice those of Linux.  An earlier post found that the processor I am evaluating on Linux is about twice as fast as the one I’m using on HP-UX.

Just to double-check that the caching was really at the filesystem level I turned direct I/O on for the Linux system using this parameter:

alter system set filesystemio_options=DIRECTIO scope=spfile;

I ran the test again after bouncing the database to make the parameter take effect and the run times were comparable to the slow first run on HP-UX:

Elapsed: 00:01:12.03
Elapsed: 00:01:06.69
Elapsed: 00:01:12.98
Elapsed: 00:01:10.14
Elapsed: 00:01:07.21

So, it seems that without the filesystem cache this query takes about 1 minute to run on either system.  With caching the query runs under 20 seconds on both systems.

In some ways I think that these results are not important.  Who cares if Linux caches things on the first attempt and HP-UX on the second?

The lesson I get from this test is that HP-UX and Linux are different in subtle ways and that when we migrate a database from HP-UX to Linux we may see performance differences that we do not expect.

Here is a zip of my script and its logs: zip

– Bobby


db file parallel read faster on Linux than HP-UX?

Fri, 2015-05-01 16:39

I am still working on comparing performance between an HP-UX blade and a Linux virtual machine, and I have a strange result.  I tried to come up with a simple example that would do a lot of single-block I/O.  The test runs faster on my Linux system than on my HP-UX system, and I’m not sure why.  All of the parameters are the same, except the ones that contain the system name and filesystem names.  Both systems are on 11.2.0.3.  The dramatic difference in run times corresponds to an equally dramatic difference in db file parallel read wait times.

I created a table called TEST and populated it with data and added an index called TESTI. I ran this query to generate a lot of single block I/O:

select /*+ index(test testi) */ sum(blocks) from test;

Here is the result on HP:

SUM(BLOCKS)
-----------
 1485406208

Elapsed: 00:01:28.38

Statistics
----------------------------------------------------------
          9  recursive calls
          0  db block gets
    3289143  consistent gets
     125896  physical reads
      86864  redo size
        216  bytes sent via SQL*Net to client
        248  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

select EVENT,TOTAL_WAITS,TIME_WAITED,AVERAGE_WAIT
  2  FROM V$SESSION_EVENT a
  3  WHERE a.SID= :monitored_sid
  4  order by time_waited desc;

EVENT                          TOTAL_WAITS TIME_WAITED AVERAGE_WAIT
------------------------------ ----------- ----------- ------------
db file parallel read                 4096        6760         1.65
db file sequential read              14526         236          .02
events in waitclass Other                1          28        28.49
SQL*Net message from client             19           5          .28
db file scattered read                   5           3          .65
SQL*Net message to client               20           0            0
Disk file operations I/O                 1           0          .01

Here is the same thing on Linux:

SUM(BLOCKS)
-----------
  958103552

Elapsed: 00:00:09.01

Statistics
----------------------------------------------------------
          9  recursive calls
          0  db block gets
    3289130  consistent gets
     125872  physical reads
      77244  redo size
        353  bytes sent via SQL*Net to client
        360  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

select EVENT,TOTAL_WAITS,TIME_WAITED,AVERAGE_WAIT
  2  FROM V$SESSION_EVENT a
  3  WHERE a.SID= :monitored_sid
  4  order by time_waited desc;

EVENT                          TOTAL_WAITS TIME_WAITED AVERAGE_WAIT
------------------------------ ----------- ----------- ------------
db file parallel read                 4096          55          .01
events in waitclass Other                1          17        16.72
db file sequential read              14498          11            0
SQL*Net message from client             19           6          .31
db file scattered read                  15           0            0
SQL*Net message to client               20           0            0
Disk file operations I/O                 1           0            0

Something doesn’t seem right.  Surely there is some caching somewhere.  Is it really possible that the Linux version runs in 9 seconds while the HP-UX one runs in a minute and a half?  Is it really true that the average db file parallel read wait is about 1.65 hundredths of a second on HP-UX but only .01 hundredths of a second on Linux?

I’m still working on this but thought I would share the result since it is so strange.

Here is a zip of my scripts and their logs if you want to check them out: zip

– Bobby

p.s. Here are some possibly relevant parameters, the same on both systems:

compatible                   11.2.0.0.0
cpu_count                    4
db_block_size                8192
db_cache_size                512M
db_writer_processes          2
disk_asynch_io               FALSE
dispatchers                  (PROTOCOL=TCP)(DISPATCHERS=32)
filesystemio_options         ASYNCH
large_pool_size              32M
log_buffer                   2097152
max_shared_servers           12
pga_aggregate_target         5871947670
sga_max_size                 3G
sga_target                   3G
shared_pool_size             1G
shared_servers               12
star_transformation_enabled  FALSE

Simple test of DB server CPU speed

Thu, 2015-04-30 16:55

I’m trying to compare two types of database servers and it looks like one has a faster CPU than the other.  But, the benchmark I have used runs a complicated variety of SQL so it is hard to really pin down the CPU performance.  So, I made up a simple query that eats up a lot of CPU and does not need to read from disk.

First I created a small table with five rows:

create table test (a number);

insert into test values (1);
insert into test values (1);
insert into test values (1);
insert into test values (1);
insert into test values (1);

commit;

Then I ran a query Cartesian joining that table to itself ten times (with five rows in the table, that produces 5^10 = 9,765,625 joined rows):

select
sum(t1.a)+
sum(t2.a)+
sum(t3.a)+
sum(t4.a)+
sum(t5.a)+
sum(t6.a)+
sum(t7.a)+
sum(t8.a)+
sum(t9.a)+
sum(t10.a)
from 
test t1,
test t2,
test t3,
test t4,
test t5,
test t6,
test t7,
test t8,
test t9,
test t10;

Then I used one of my profile scripts to extract the CPU time.  Here is a typical output:

SUBSTR(TIMESOURCE,1,30)           SECONDS PERCENTAGE
------------------------------ ---------- ----------
TOTAL_TIME                             32        100
CPU                                    32        100

I edited the output to make it fit.  The profile shows the time that the query spent on the CPU in seconds.

I tried multiple runs of the same query and kept adding tables to the join to make the query longer.

This zip includes the sql scripts that I ran and my spreadsheet with the results: zip

I was comparing an Itanium and a Xeon processor and the test query ran in about half the time on the Xeon.  I realize that this is not a complete benchmark, but it is some information.  My other testing is not targeted specifically to CPU but I also saw a significant CPU speed-up there as well.  So, this simple query adds to the evidence that the Xeon processor that I am evaluating is faster than the Itanium one.

– Bobby


DDL_LOCK_TIMEOUT to sneak in change on active system

Wed, 2015-04-29 14:54

I need to change a view and an index on an active production system.  I’m concerned that the change will fail with an “ORA-00054: resource busy” error because I’m changing objects that are in use.  I engaged in a Twitter conversation with @FranckPachot and @DBoriented, and they gave me the idea of using DDL_LOCK_TIMEOUT with a short timeout to sneak my changes into our production system.  Really, I’m more worried about backing out the changes, since I plan to make the change at night when things are quiet.  If the changes cause a problem, it will be during the middle of the next day.  Then I’ll need to sneak in and make the index invisible, or drop it and put the original view text back.

I tested setting DDL_LOCK_TIMEOUT to one second at the session level.  This is the most conservative setting:

alter session set DDL_LOCK_TIMEOUT=1;

I created a test table with a bunch of rows in it and ran a long updating transaction against it like this:

update /*+ index(test testi) */ test set blocks=blocks+1;

Then I tried to alter the index invisible with the lock timeout:

alter index testi invisible
 *
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

Same error as without the timeout.  The update of the entire table took a lot longer than 1 second, so the one-second timeout expired before the lock was free.

Next I tried the same thing with a shorter running update:

update /*+ index(test testi) */ test set blocks=blocks+1 where owner='SYS' and table_name='DUAL';
commit;
update /*+ index(test testi) */ test set blocks=blocks+1 where owner='SYS' and table_name='DUAL';
commit;
... lots more of these so script will run for a while...

With the default setting of DDL_LOCK_TIMEOUT=0, my alter index invisible statement usually exited with an ORA-00054 error, though eventually I could get it to work.  With DDL_LOCK_TIMEOUT=1, in my testing, the alter almost always worked.  I guess in some cases my transaction exceeded the 1 second, but usually it did not.

Here is the alter with the timeout:

alter session set DDL_LOCK_TIMEOUT=1;

alter index testi invisible;

Once I made the index invisible the update started taking 4 seconds to run.  So, to make the index visible again I had to bump the timeout up to 5 seconds:

alter session set DDL_LOCK_TIMEOUT=5;

alter index testi visible;

So, if I have to back out these changes at a peak time setting DDL_LOCK_TIMEOUT to a small value should enable me to make the needed changes.

Here is a zip of my scripts if you want to recreate these tests: zip

You need Oracle 11g or later to use DDL_LOCK_TIMEOUT.

These tests were all run on Oracle 11.2.0.3.

Also, I verified that I studied DDL_LOCK_TIMEOUT for my 11g OCP test.  I knew it sounded familiar, but I have not been using this feature.  Either I just forgot it, or I did not realize how helpful it could be for production changes.

– Bobby


PLAN_HASH_VALUE calculation different HP-UX and Linux?

Wed, 2015-04-22 17:01

I’m trying to compare how a query runs on two different 11.2.0.3 systems.  One runs on HP-UX Itanium and one runs on 64-bit x86 Linux.  Same query, same plan, different plan hash value.

HP-UX:

SQL_ID 0kkhhb2w93cx0
--------------------
update seg$ set type#=:4,blocks=:5,extents=:6,minexts=:7,maxexts=:8,exts
ize=:9,extpct=:10,user#=:11,iniexts=:12,lists=decode(:13, 65535, NULL,
:13),groups=decode(:14, 65535, NULL, :14), cachehint=:15, hwmincr=:16,
spare1=DECODE(:17,0,NULL,:17),scanhint=:18, bitmapranges=:19 where
ts#=:1 and file#=:2 and block#=:3

Plan hash value: 1283625304

----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT      |                |       |       |     2 (100)|          |
|   1 |  UPDATE               | SEG$           |       |       |            |          |
|   2 |   TABLE ACCESS CLUSTER| SEG$           |     1 |    65 |     2   (0)| 00:00:01 |
|   3 |    INDEX UNIQUE SCAN  | I_FILE#_BLOCK# |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Linux:

SQL_ID 0kkhhb2w93cx0
--------------------
update seg$ set type#=:4,blocks=:5,extents=:6,minexts=:7,maxexts=:8,exts
ize=:9,extpct=:10,user#=:11,iniexts=:12,lists=decode(:13, 65535, NULL,
:13),groups=decode(:14, 65535, NULL, :14), cachehint=:15, hwmincr=:16,
spare1=DECODE(:17,0,NULL,:17),scanhint=:18, bitmapranges=:19 where
ts#=:1 and file#=:2 and block#=:3

Plan hash value: 2170058777

----------------------------------------------------------------------------------------
| Id  | Operation             | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT      |                |       |       |     2 (100)|          |
|   1 |  UPDATE               | SEG$           |       |       |            |          |
|   2 |   TABLE ACCESS CLUSTER| SEG$           |     1 |    64 |     2   (0)| 00:00:01 |
|   3 |    INDEX UNIQUE SCAN  | I_FILE#_BLOCK# |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

I wonder if endianness plays into the plan hash value calculation, or if it is just a port-specific calculation.

Odd.

– Bobby
