Re: Raw Devices: Increased Performance?

From: Paul Zola <pzola_at_us.oracle.com>
Date: 1996/07/15
Message-ID: <4scgh7$jdk_at_inet-nntp-gw-1.us.oracle.com>#1/1


} Jo Manna wrote:

} > Thanks for the reply. This is quite interesting. I have previously
} > used Sybase and I remember that using Unix File Systems instead of
} > Raw Devices was not recommended. Just off the top of me head the
} > reasons for this was something like...
} >
} > ... the Sybase Server being in 'charge' of the actual I/O and if using a
} > Unix File System could not guarantee that a 'commit was a commit',
} > due to the OS buffering..... and so on.
} >
} > Obviously we are talking about Oracle here and not Sybase, but I am
} > just wondering how Oracle gets around this if at all, or have I missed
} > something?

You know, this is (at least) the second time that I've heard someone say that Sybase told them that UNIX did not allow for transactional behavior unless they used raw devices. I'm beginning to think that someone at Sybase is actually saying this.

This is, of course, not true. While it is true that the default behavior of the UNIX filesystem (write-back buffered cache) does not allow for transactional consistency, all modern versions of UNIX provide a way to modify the behavior of the buffer cache.

BSD-derived systems provide the fsync() system call, which flushes all the dirty buffers associated with a file descriptor. After the fsync() call completes, the operating system guarantees that all the buffered data associated with the file descriptor has been written to the disk.
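
As a rough illustration of the fsync() approach, here is a minimal sketch in C. The helper name durable_append() and the file mode are made up for the example, and a production version would also retry on EINTR; it only shows the order of operations (write, then fsync, then report success).

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append len bytes to path, then force them to disk.
 * Returns 0 on success, -1 on any failure. */
int durable_append(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        /* write() may return once the data is only in the buffer cache */
        ssize_t n = write(fd, p, len);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* fsync() returns only after the dirty buffers for fd are flushed */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A caller would treat the data as committed only once durable_append() has returned 0.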

SystemV-derived systems provide the O_SYNC flag. This can be used in two ways: it can be used as part of the second (flags) argument to open(), or it can be used as part of the third argument of fcntl(), when used with the F_SETFL command. When the O_SYNC flag is set on a file descriptor, the operating system guarantees that when a write() system call returns, the data from the write has been written to the disk.
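
Both ways of setting O_SYNC can be sketched in C as follows. The function names open_sync() and set_sync() are invented for the example. One caveat worth stating: some systems (Linux among them) accept but ignore O_SYNC in an F_SETFL call, so the open() form is the safer one to rely on.

```c
#include <fcntl.h>
#include <unistd.h>

/* Way 1: request synchronous writes when the file is opened.
 * O_SYNC goes in the flags (second) argument of open(). */
int open_sync(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
}

/* Way 2: add O_SYNC to an already-open descriptor with fcntl(F_SETFL).
 * Returns 0 on success, -1 on failure. Whether the flag actually takes
 * effect this way is system-dependent. */
int set_sync(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_SYNC);
}
```

With a descriptor from open_sync(), a successful write() return is itself the signal that the data has reached the disk; no separate flush call is needed.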

ORACLE uses the fsync() or O_SYNC capabilities of UNIX to guarantee that the redo log files are up-to-date. If the OS crashes, ORACLE will use the (accurate) data in the redo logs to roll forward the (possibly inaccurate) data in the data files.
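
The roll-forward idea can be shown with a toy sketch in C. This is not ORACLE's actual code or on-disk format: the record layout and the names toy_commit() and toy_roll_forward() are assumptions made purely for illustration. The point is only that the log is forced to disk at commit time, while the data file can be rebuilt from the log afterwards.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Toy redo record: "store value in slot". Purely illustrative. */
struct redo_rec {
    off_t slot;
    long  value;
};

/* Commit: append the record to the redo log and force it to disk.
 * The data file is deliberately not touched or synced here.
 * log_fd is assumed to be opened with O_RDWR | O_APPEND. */
int toy_commit(int log_fd, off_t slot, long value)
{
    struct redo_rec r = { slot, value };
    if (write(log_fd, &r, sizeof r) != (ssize_t)sizeof r)
        return -1;
    return fsync(log_fd);
}

/* Recovery: replay every complete record from the log into the data
 * file, overwriting whatever (possibly stale) contents survived. */
int toy_roll_forward(int log_fd, int data_fd)
{
    struct redo_rec r;
    ssize_t n;

    if (lseek(log_fd, 0, SEEK_SET) < 0)
        return -1;
    while ((n = read(log_fd, &r, sizeof r)) == (ssize_t)sizeof r) {
        off_t pos = r.slot * (off_t)sizeof r.value;
        if (lseek(data_fd, pos, SEEK_SET) < 0)
            return -1;
        if (write(data_fd, &r.value, sizeof r.value)
                != (ssize_t)sizeof r.value)
            return -1;
    }
    if (n < 0)
        return -1;
    /* A short trailing record belongs to a commit that never
     * completed, so it is ignored. */
    return fsync(data_fd);
}
```

Because later records overwrite earlier ones for the same slot, replaying the log from the start leaves the data file holding the last committed value for each slot.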

Provided that the OS correctly implements the fsync() or O_SYNC capabilities, there is no chance of losing committed data when using ORACLE with filesystem files.

I have no direct experience with Sybase, so I can't say for sure whether or not Sybase runs the risk of database corruption when using filesystem files. Even if it does, there's no inherent limitation in UNIX that makes this so.

        -paul



Paul Zola Technical Specialist World-Wide Technical Support

GCS H--- s:++ g++ au+ !a w+ v++ C+++ UAV++$ UUOC+++$ UHS++++$ P+>++ E-- N++ n+

    W+(++)$ M+ V- po- Y+ !5 !j R- G? !tv b++(+++) !D B-- e++ u** h f-->+ r*


Disclaimer: 	Opinions and statements are mine, and do not necessarily
		reflect the opinions of Oracle Corporation.
Received on Mon Jul 15 1996 - 00:00:00 CEST
