Re: The old raw devices chestnut.

From: Jonathan Leffler <jleffler_at_earthlink.net>
Date: Tue, 13 Apr 2004 04:54:05 GMT
Message-ID: <NnKec.7658$A_4.2510@newsread1.news.pas.earthlink.net>

Jim Smith wrote:

> Note the cross-posting - but no flame wars please.
>
> This question was prompted by a thread on a postgres mailing list
> during which someone (Gregory Williamson) claimed
>
> <quote>
> raw devices, at least on Solaris, are about 10 times as fast as cooked
> file systems for Informix.
> </quote>

That's a pretty steep claim -- I would say it was an exaggeration. Raw is usually faster, all other things being equal, but I doubt if 10 times could be justified.

> This made me think about the old arguments, and I wondered about the
> current state of thinking. Some of my knowledge will be a bit out of date.
>
> Oracle: (my main experience)
> At various times Oracle have claimed (talking to consultants, not
> marketers) that raw devices are 5-20% faster than filesystems. This may
> vary on the current state of the oracle code and/or the filesystem being
> compared against. Veritas seem to agree by producing QuickIO for Oracle,
> claiming "performance of raw with the management of filesystem".
>
> I have never been sufficiently convinced to implement a major system
> with raw.
>
> Sybase: (some experience)
> Sybase claim filesystems are faster, because of OS buffering, but unsafe
> for the same reason. They only ever suggest filesystem for tempdb. They
> don't seem to have heard of fsync()[1]
>
> DB2:
> No idea
>
> Informix:
> No idea beyond the claim which started this off.

The advantage, and possibly the disadvantage, of raw i/o over cooked i/o is that the data is copied less. With cooked i/o, the data is copied first from the user process (e.g. the DBMS) to the kernel buffer pool, and then from the kernel buffer pool to the disk, or vice versa. With raw i/o, the transfer can occur directly between the disk and the user process without travelling through the kernel buffer pool. That saves one copy operation, plus the overhead of coordinating access to the kernel buffers. Against that, the kernel buffer pool can sometimes provide the same disk block to multiple processes without rereading the disk.

However, since Informix's process structure is set up so that all the DBMS data pages are kept in a shared memory segment, all the DBMS processes (as distinct from the user processes which ask the DBMS to do things) can share the page without troubling the Unix kernel. So, if your system is busy working on database stuff, the best use of memory is gained by having the disk drivers copy data directly between the shared memory buffer pool and the disk, making the data available to any of the processes comprising the DBMS without further copying.
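To make the "one less copy" point concrete, here is a minimal sketch in C. It is not how Informix does it, and it does not use a raw device at all: it uses O_DIRECT, a Linux-specific open() flag that asks the kernel to bypass its buffer cache for an ordinary file, as a stand-in. The function name read_first_block() and the 4096-byte BLOCK_SIZE are my own assumptions for illustration; the real alignment requirement depends on the device and file system.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK_SIZE 4096         /* assumed alignment; device dependent */

/*
 * Hypothetical helper: read up to one block from the start of `path`.
 * It first tries to bypass the kernel buffer cache (O_DIRECT, where the
 * platform and file system support it) and falls back to ordinary
 * buffered ("cooked") i/o otherwise.
 * Returns the number of bytes read, or -1 on failure. On success *out
 * points at a BLOCK_SIZE-aligned buffer that the caller must free(),
 * and *used_direct is 1 if the unbuffered read succeeded, else 0.
 */
ssize_t read_first_block(const char *path, void **out, int *used_direct)
{
    assert(path != NULL && out != NULL && used_direct != NULL);

    /* Unbuffered i/o needs an aligned buffer; harmless for buffered i/o. */
    void *buf = NULL;
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0)
        return -1;

    int fd = -1;
    ssize_t n = -1;
    *used_direct = 0;

#ifdef O_DIRECT
    /* Device -> user buffer, no intermediate copy in the buffer cache. */
    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd >= 0) {
        n = read(fd, buf, BLOCK_SIZE);
        close(fd);
        if (n >= 0)
            *used_direct = 1;
    }
#endif

    if (n < 0) {
        /* Cooked path: device -> kernel buffer cache -> user buffer. */
        fd = open(path, O_RDONLY);
        if (fd >= 0) {
            n = read(fd, buf, BLOCK_SIZE);
            close(fd);
        }
    }

    if (n < 0) {
        free(buf);
        return -1;
    }
    *out = buf;
    return n;
}
```

In a DBMS, the buffer would live in the shared memory segment rather than in memory private to one process, which is what lets every server process see the page after a single transfer.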

YMMV, as they say. It depends on many factors. Generally, I'd quote a 10-20% performance benefit from raw disk over cooked (not times, just percent). That's not something I've measured recently, but it is in the right ballpark for historical systems.

Things like humongous main memories (TB of main memory) combined with monstrous disks (TB of them, too) and logical volume managers, SAN/NAS, RAID and the like all make the analysis more complex.

> What is the latest thinking, both in terms of vendor claims and
> practical experience?
>
> [1] or whatever system call forces write-through caching

The O_SYNC (spelled O_FSYNC on some BSD-derived systems) or O_DSYNC flag on the open() system call? POSIX 2003 defines three synchronization flags, IIRC: O_SYNC, O_DSYNC and O_RSYNC.
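For what it's worth, a minimal sketch of the O_DSYNC approach looks something like the following. The function name write_durably() is made up for the example, and this says nothing about what any particular DBMS actually does. With O_DSYNC set, each write() returns only once the data has been handed to stable storage, so no separate fsync() call is needed afterwards.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* request POSIX declarations (O_DSYNC etc.) */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes from `buf` to `path`, creating
 * or truncating the file. The file is opened with O_DSYNC, so every
 * write() blocks until the data (though not necessarily all metadata,
 * such as timestamps) has reached stable storage.
 * Returns 0 on success, -1 on failure.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        p += n;                 /* handle short writes */
        len -= (size_t)n;
    }
    return close(fd);
}
```

Swapping O_DSYNC for O_SYNC would also force the file's metadata out on every write; opening the file without either flag and calling fsync(fd) before close() is the other common way of getting the same guarantee.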

-- 
Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler_at_earthlink.net, jleffler_at_us.ibm.com
Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/

Received on Mon Apr 12 2004 - 23:54:05 CDT