
Re: Moving db to linux

From: Nuno Souto <dbvision_at_optusnet.com.au>
Date: Sun, 29 Feb 2004 02:13:12 +1100
Message-ID: <01b901c3fe0d$6da73350$9b00a8c0@dcs001>

> Journalling for files is a concept similar to redo in the world
> of oracle.

No, it MOST DEFINITELY is not. Journalled file systems are similar to redo ONLY for file system metadata. NOT for the data itself!

> With JFS, you get the process called jfsCommit running,
> which "commits" buffer operations. Each filehandle operation like
> "flush" or "close" is a "commit".

So it is in a non-journalled file system. "flush" has existed in normal file systems since the year dot and does exactly and precisely that. There is also a background process in non-JFS file systems that flushes every 30 seconds or so: it's called "sync".

> Basically, journalled FS guarantees
> that the data written down synchronously will really be written down
> to the disk device(s).

ANY file system guarantees that data written synchronously is really written to the disk device.
Synchronous access is NOT a synonym for journalling.

> If you can do DIO, your data is a little bit
> safer.

Most file systems can do DIO. It's got nothing to do with journalling itself.

> What a journalling FS protects you against is a huge data loss
> of blocks that were in the buffer cache.

NO WAY! If you do NOT write synchronously in a JFS, you WILL lose ANY data blocks in the cache!

And to write synchronously you have to use synchronous I/O, DIO or frequent "flushes". Which you can equally do in ANY file system, be it journalled or not.
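To make that concrete, here is a minimal Python sketch of two ways to force data out to the device. Neither depends on the file system being journalled. The helper names are made up for illustration, and os.O_SYNC is assumed to exist only on POSIX platforms, so the sketch falls back to a plain open where the flag is missing.

```python
import os

def write_then_fsync(path, data):
    """Buffered write, then an explicit flush all the way to the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's own buffer down to the kernel
        os.fsync(f.fileno())   # ask the kernel to push its cache to the device

def write_with_o_sync(path, data):
    """Open with O_SYNC so each write() returns only once the data is written."""
    sync_flag = getattr(os, "O_SYNC", 0)  # assumption: POSIX; 0 where unavailable
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | sync_flag, 0o644)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
```

Either call behaves the same way on a journalled or a non-journalled file system: the journal plays no part in whether the data blocks reach the disk.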

I repeat: Synchronous writing has NOTHING to do with journalling.

What a JFS really does is to automatically (like it or not) write - synchronously - to a journal file, ANY changes to file system METADATA. IOW, any changes that involve creating or deleting files, allocating disk space or freeing disk space.

Those and ONLY those are recovered after a system crash, by simply reading from the journal file. Instead of inspecting the ENTIRE file system looking for broken metadata. Which is what fsck does in a non-journalled file system.
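A toy model of that behaviour, as a hedged Python sketch. It is in no way how a real journalled file system is implemented, and the class and method names are invented for illustration. It shows only the point being made: the journal holds metadata changes, so a replay brings back the allocation map but not data blocks that were still sitting in the cache.

```python
class ToyJournalledFS:
    """Toy model only: the journal records metadata changes, never file data."""

    def __init__(self):
        self.journal = []    # "on disk": metadata changes, written synchronously
        self.metadata = {}   # "on disk": file name -> allocated block numbers
        self.disk = {}       # "on disk": block number -> data
        self.cache = {}      # in memory: dirty data blocks not yet flushed

    def allocate(self, name, blocks):
        # The metadata change goes to the journal first; the in-place
        # update of self.metadata is deferred until recovery in this toy.
        self.journal.append((name, tuple(blocks)))

    def write(self, block, data):
        self.cache[block] = data      # buffered, NOT journalled

    def flush(self):
        self.disk.update(self.cache)  # the only way data reaches the disk
        self.cache.clear()

    def crash(self):
        self.cache.clear()            # everything in memory is gone

    def recover(self):
        # Replay the journal instead of scanning the whole file system.
        for name, blocks in self.journal:
            self.metadata[name] = blocks
        return len(self.journal)


fs = ToyJournalledFS()
fs.allocate("datafile", [1, 2])
fs.write(1, b"flushed")
fs.flush()
fs.write(2, b"unflushed")
fs.crash()
replayed = fs.recover()
```

After recovery the allocation for "datafile" is intact, block 1 is on disk, and block 2 is simply missing, because nothing ever wrote it synchronously.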

With the result (in a JFS) that you do not lose large chunks of a file. This is the problem that fsck has with non-journalled file systems: sometimes it cannot recover the metadata and it loses track of an entire space allocation for a file. Which can be a substantial part of the file. This happens mostly when files are very volatile or constantly changing in allocation.

Which is NOT the case for Oracle datafiles. They are pre-allocated and do not often change in size.
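A small Python sketch of why pre-allocation matters here. The file is sized once up front, and every later block update overwrites in place, so the space allocation never changes. The block size and helper names are assumptions for illustration only; this is not Oracle's actual datafile handling.

```python
import os

BLOCK_SIZE = 8192  # assumed block size, for illustration only

def preallocate(path, n_blocks):
    """Create the file at its full size once, up front."""
    with open(path, "wb") as f:
        f.write(b"\0" * (BLOCK_SIZE * n_blocks))
        f.flush()
        os.fsync(f.fileno())

def update_block(path, block_no, payload):
    """Overwrite one block in place; the file length does not change."""
    if len(payload) > BLOCK_SIZE:
        raise ValueError("payload larger than one block")
    with open(path, "r+b") as f:
        f.seek(block_no * BLOCK_SIZE)
        f.write(payload.ljust(BLOCK_SIZE, b"\0"))  # pad to a full block
        f.flush()
        os.fsync(f.fileno())
```

As long as block_no stays within the pre-allocated range, os.path.getsize(path) returns the same value before and after any number of update_block calls. The file system has no new space to allocate or free, which is the kind of metadata change the journal exists to protect.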

It's high time this myth of journalled file systems "protecting" data was exposed. A run-of-the-mill JFS does NOT protect data blocks inside files, it protects ONLY the file system's own metadata! That is certainly the case for ext3, JFS, NTFS and many other journalled f/s. Veritas is the only JFS I know of that can ALSO protect the data but that is an add-on, not a characteristic of JFS.

Historical note:
This f/s metadata thing is the major reason why I never lost a benchmark against Ingres: journalled file systems were unknown back then and Ingres did not use the concept of pre-allocated datafiles like Oracle. Their tables were stored one table per file, with dynamic space management done by the file system itself. With the result that if you specified a benchmark where tables were dropped/re-created and inserted into/deleted from and you pulled the plug halfway through, you'd have a very high probability fsck would NOT recover the file system where the Ingres database was.

While Oracle would quietly just roll back the last transaction and keep going. After the fsck was finished, of course. Remember: no JFS back then! Not once did I have to use the redo log. Datafiles were pre-allocated and the f/s metadata never changed, no matter how busy the system was.

As well, not ONCE did Ingres survive this little "technique"!

Cheers
Nuno Souto
in sunny Sydney, Australia
dbvision_at_optusnet.com.au



Received on Sat Feb 28 2004 - 09:11:07 CST
