New dirty tricks in Red Hat 5.x

From: Mladen Gogala <>
Date: Wed, 20 May 2009 06:28:58 +0000 (UTC)
Message-ID: <gv07va$k10$>

  1. CFQ and ionice
     Kernels newer than 2.6.13 come with the "CFQ" (Completely Fair Queuing) I/O scheduler, which can assign I/O priorities to processes. The command

         ionice -c 1 -n 0 -p `pgrep -f lgwr`

     will assign the highest real-time I/O priority to the Oracle log writer. If two or more processes have something to write to the same disk, the log writer will always get there first. The same thing can be done with DBWR as well. There isn't much sense in doing that when ASM is involved, as ASM usually has its own piece of disk for exclusive use. This usually helps with single-instance databases on busy file systems that are used by other applications as well. The classic example is having both Apache and Oracle on the same system. Those two pieces of software have completely different I/O usage patterns and you may decide to give one of them higher priority than the other. It doesn't necessarily have to be Oracle.
  2. VDSO (Virtual Dynamic Shared Object)
     Thanks to Intel Corp., the newer kernels have a new system call entry/exit mechanism called "VDSO". This mechanism is about 40 CPU cycles cheaper than the old, interrupt-based mechanism. That is 40 CPU cycles per system call invocation, which can add up to quite a lot. Admittedly, the effects are not spectacular, but they help squeeze out every last drop of the CPU power you have. To enable it, one can either use sysctl or do something like

         echo 1 > /proc/sys/vm/vdso_enabled

     (note the space between "1" and ">"; without it the shell treats "1>" as a redirection of stdout and writes an empty line). Newer kernels come with that enabled, so there isn't much to do.
  3. Blockdev
     The "blockdev" command is used to get/set disk characteristics at the driver level. One particularly useful option is to set readahead to, say, 16M. I've never done this with a production instance, but the effects were good on an Oracle instance running on top of F8. Direct I/O was not turned on; testing is in progress.
  4. Readahead
     RH EL 5.x has two services called "readahead_early" and "readahead_later". Both services will do an fstat of the files listed in the configuration directory (/etc/readahead.d/default.*) and get the corresponding inodes into the inode cache, thus speeding up the next open. It's quite a nice thing to do with Oracle: all files in $ORACLE_HOME/bin and $ORACLE_HOME/lib, and various "*.jar" files, can be cached to speed up later reads and activation.
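
For the ionice trick, it may be worth checking which I/O scheduler a disk is actually using, since I/O priorities set with ionice only take effect when the device is on the CFQ (Completely Fair Queuing) scheduler. What follows is a minimal sketch, not a tested production script. The function names active_scheduler and ionice_cmds are made up for illustration, and the device name sda in the usage comment is an assumption. The second function only prints the ionice commands, one per PID, so they can be reviewed before being run as root.

```shell
# Print the active I/O scheduler from a line in the sysfs format,
# where the active one is shown in square brackets, e.g.
#   "noop anticipatory deadline [cfq]"  ->  "cfq"
# (hypothetical helper name, for illustration only)
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Dry run: print, but do not execute, one ionice command per PID.
# Class 1 is real-time; level 0 is the highest priority in that class.
# (hypothetical helper name, for illustration only)
ionice_cmds() {
    for pid in "$@"; do
        echo "ionice -c 1 -n 0 -p $pid"
    done
}

# Usage (assumes the disk is sda; the real-time class needs root):
#   active_scheduler < /sys/block/sda/queue/scheduler
#   ionice_cmds `pgrep -f lgwr`          # review the output first
#   ionice_cmds `pgrep -f lgwr` | sh     # then actually apply it
```

Printing one command per PID avoids depending on whether the installed ionice accepts several PIDs after -p, in case pgrep matches more than one process.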

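For the blockdev trick, note that "blockdev --setra" takes its argument in 512-byte sectors, not in bytes or megabytes, so a 16M readahead corresponds to 32768 sectors. The sketch below does that arithmetic and prints the resulting command instead of running it. The function names and the device /dev/sdb are assumptions made for illustration; substitute the device that holds the datafiles.

```shell
# Convert a size in MiB to 512-byte sectors, the unit that
# "blockdev --setra" expects. (hypothetical helper name)
mib_to_sectors() {
    echo $(( $1 * 1024 * 1024 / 512 ))
}

# Dry run: print the blockdev command for a readahead of $1 MiB on device $2.
# (hypothetical helper name)
setra_cmd() {
    echo "blockdev --setra $(mib_to_sectors "$1") $2"
}

# Usage (device name is an assumption; the real command needs root):
#   blockdev --getra /dev/sdb     # show the current readahead, in sectors
#   setra_cmd 16 /dev/sdb         # print the command for a 16M readahead
```

The setting does not survive a reboot, so anyone who keeps it would need to reapply it from a startup script.
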
These are a few dirty tricks possible mainly with RH EL 5.x and its clones (CentOS, OEL). Experiment, but be aware that this is new and largely untested stuff, unsupported by Oracle Corp. and probably not fit for production. There will be more good stuff to come with kernels 2.6.20 and newer. The main thing with 2.6.20 is that it does I/O accounting, so that the question "who does the most disk I/O" can be answered. There is a simple tool called "iotop" which works on kernels 2.6.20+ and does just that. Happy hacking!
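
On kernels built with per-task I/O accounting, the counters are also exposed per process in /proc/<pid>/io, where the read_bytes and write_bytes fields count bytes actually sent to or fetched from the storage layer. As a rough sketch of how those counters can be read without iotop, the helper below (io_bytes is a made-up name) pulls the two fields out of a file in that format. Reading the file of a process owned by another user generally requires root.

```shell
# Print "read_bytes write_bytes" from input in the /proc/<pid>/io format.
# The ^ anchors keep "cancelled_write_bytes:" from matching.
# (hypothetical helper name, for illustration only)
io_bytes() {
    awk 'BEGIN { r = 0; w = 0 }
         /^read_bytes:/  { r = $2 }
         /^write_bytes:/ { w = $2 }
         END { print r, w }'
}

# Usage:
#   io_bytes < /proc/self/io
#   for p in `pgrep -f lgwr`; do echo "$p `io_bytes < /proc/$p/io`"; done
```

The counters are cumulative since process start, so taking two samples a few seconds apart and subtracting gives a rate.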

Received on Wed May 20 2009 - 01:28:58 CDT
