Re: Database writing architecture

From: Paul Zola <pzola_at_us.oracle.com>
Date: 1996/04/14
Message-ID: <4kpjqg$td_at_inet-nntp-gw-1.us.oracle.com>#1/1


In <829398929snz_at_copse.demon.co.uk> Eoin Woods <Eoin_at_copse.demon.co.uk> writes:
} In article <316D6BD4.41C67EA6_at_exu.ericsson.se>
} exuward_at_exu.ericsson.se "Gerald Ward" writes:
 [snip]
} > Sybase may have problems with database consistency
} > if filesystem databases are used because the OS buffers
} > the writes. Apparently, Oracle doesn't have this problem
} > or it isn't as big of an issue. Is this indeed true?
} I don't really see how this could be true. The Unix operating system
} buffers writes to its file systems. Hence, when the DBMS writes a log
} entry (record/page/...) if the log is held on a filesystem, it cannot
} be sure that the entry has made it to disk. Should there then be a
} system failure, the log on disk may not be intact. Unix only writes its
} file system buffers to disk when the "sync" processing occurs
[snip]

This is not strictly true. All modern versions of UNIX provide an ability to modify the behavior of the buffer cache.

BSD-derived systems provide the fsync() system call, which flushes all the dirty buffers associated with a file descriptor. After the fsync() call completes, the operating system guarantees that all the buffered data associated with the file descriptor has been written to the disk.
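As a rough illustration of the fsync() approach, here is a minimal sketch in C. The function name append_durably and the 0644 file mode are made up for this example; they are not taken from any DBMS. The point is only the ordering: write() first, then fsync(), and report success only after fsync() has returned.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Append len bytes to the file at path and force them to disk with fsync().
 * Returns 0 on success, -1 on any failure.
 * Hypothetical helper: the name and the 0644 mode are illustrative only. */
int append_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        /* write() may return once the data is only in the buffer cache */
        ssize_t n = write(fd, p, left);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* fsync() returns only after the dirty buffers for fd are on disk */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A caller would treat a log entry as safe only once append_durably() has returned 0.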

SystemV-derived systems provide the O_SYNC flag. This can be used in two ways: it can be OR-ed into the second argument (the flags) of open(), or it can be OR-ed into the third argument of fcntl() when fcntl() is called with the F_SETFL command. When the O_SYNC flag is set on a file descriptor, the operating system guarantees that when a write() system call returns, the data from the write has been written to the disk.
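Both ways of setting O_SYNC can be sketched as follows. The helper names open_sync and set_sync are hypothetical, as is the 0644 mode. One caveat I am adding here: some systems (Linux, for one) silently ignore O_SYNC when it is passed to fcntl() with F_SETFL, so setting the flag at open() time is the safer of the two.

```c
#include <fcntl.h>
#include <unistd.h>

/* Way 1: request synchronous writes when the file is opened.
 * O_SYNC is OR-ed into the flags (second) argument of open().
 * Returns a file descriptor, or -1 on failure. */
int open_sync(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
}

/* Way 2: turn on O_SYNC for a descriptor that is already open.
 * Returns 0 if fcntl() accepted the request, -1 on failure.
 * Note: some systems accept the call but ignore the O_SYNC bit. */
int set_sync(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_SYNC) < 0 ? -1 : 0;
}
```

With a descriptor from open_sync(), each write() that returns successfully has already reached the disk, so no separate fsync() call is needed.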

ORACLE uses the fsync() or O_SYNC capabilities of UNIX to guarantee that the redo log files are up-to-date. If the OS crashes, ORACLE will use the (accurate) data in the redo logs to roll forward the (possibly inaccurate) data in the data files.
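To make the roll-forward idea concrete, here is a toy sketch. It is emphatically not ORACLE's implementation: the record layout (struct redo_rec), the function names, and the in-memory data array are all invented for illustration. It shows only the principle that a change is forced into the log before it is considered done, and that recovery replays the log over data that may be stale.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* One redo record: "set slot to value". Hypothetical layout. */
struct redo_rec {
    uint32_t slot;
    int32_t  value;
};

/* Append one record to the redo log and fsync() before returning, so the
 * caller may treat the change as durable once this returns 0. */
int redo_append(const char *log_path, uint32_t slot, int32_t value)
{
    struct redo_rec rec = { slot, value };
    int fd = open(log_path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, &rec, sizeof rec) != (ssize_t)sizeof rec || fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Recovery: replay every complete record in the log, in order, over the
 * possibly stale data[] array. A truncated trailing record is skipped.
 * Returns the number of records applied, or -1 if the log cannot be opened. */
int roll_forward(const char *log_path, int32_t *data, uint32_t nslots)
{
    struct redo_rec rec;
    int applied = 0;
    int fd = open(log_path, O_RDONLY);
    if (fd < 0)
        return -1;
    while (read(fd, &rec, sizeof rec) == (ssize_t)sizeof rec) {
        if (rec.slot < nslots) {      /* ignore records for unknown slots */
            data[rec.slot] = rec.value;
            applied++;
        }
    }
    close(fd);
    return applied;
}
```

Because later records overwrite earlier ones for the same slot, replaying the whole log in order leaves data[] reflecting the most recent logged change to each slot.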

Provided that the OS correctly implements the fsync() or O_SYNC capabilities, there is no chance of data loss when using ORACLE with filesystem files.

I have no direct experience with Sybase, so I can't say for sure whether or not Sybase runs the risk of database corruption when using filesystem files. If it does, the cause is not an inherent limitation of UNIX.

        -p


Paul Zola                                                 Technical Specialist
World-Wide Technical Support                                 Development Tools
==============================================================================
Computers possess the truly profound stupidity of the inanimate. - B. Sterling
Disclaimer: 	Opinions and statements are mine, and do not necessarily
		reflect the opinions of Oracle Corporation.
Received on Sun Apr 14 1996 - 00:00:00 CEST
