Re: Unix Filesystem vs Raw Partition

From: Lee E Parsons <lparsons_at_world.std.com>
Date: Mon, 17 Oct 1994 19:58:06 GMT
Message-ID: <Cxu24v.J6F_at_world.std.com>


In article <1994Oct14.113251.12844_at_mojo.europe.dg.com>,  <jpope_at_mojo.europe.dg.com> wrote:
>Lee E Parsons (lparsons_at_world.std.com) wrote:
>: Paul Zola <pzola_at_us.oracle.com> wrote:
>: >
>: >The Oracle RDBMS uses one of these two system calls to guarantee that
>: >the data associated with a transaction has "hit the disk" by the time
>: >that the COMMIT returns.
>:
>: Will the entire system be synced on each commit? I would have thought
>: that this would have had performance implications. I had always figured
>: that all writes to the redo log would be backed by a O_SYNC and the
>: entire DB would be synced at a checkpoint. This is the least amount of
>: syncing I can see that still ensures integrity. A sync only at commit
>: would still leave you at risk if uncommitted data had been flushed out
>: of the SGA and writes to the redo logs were still pending.
>:
>: Any white papers on how this works? BTW, You should have answered
>: this on Monday. I would have been too busy working then to bother
>: you. :-}
>: --
>: Regards,
>:
>: Lee E. Parsons
>: Systems Oracle DBA lparsons_at_world.std.com
>
>
>This is incorrect. Only the redo log writer issues sync'ed writes at commit
>time, which is what guarantees your transaction can be re-applied from the
>redo's in the event of a system crash. The actual block changes, held in the
>block buffers, get written out later by the dbwr at timeout intervals or
>at a checkpoint, implicit or otherwise. Whether you have raw i/o or not is
>irrelevant to this process - all you achieve is avoiding many software layers
>in your UNIX kernel which map logical block addresses to physical disk
>positions, since Oracle is doing all that for itself. Note that in UNIX
>systems that implement logical disk concepts, such as striping from the
>O/S as opposed to on a hardware product (like Clariion), then some mapping
>still has to occur, so there may be less benefit.
>
>

<jpope_at_mojo.europe.dg.com> wrote:
>Lee E Parsons (lparsons_at_world.std.com) wrote:
>: A sync only at commit
>: would still leave you at risk if uncommitted data had been flushed out
>: of the SGA and writes to the redo logs were still pending.
>
>This is incorrect. Only the redo log writer issues sync'ed writes at commit
>time, which is what guarantees your transaction can be re-applied from the
>redo's in the event of a system crash. The actual block changes, held in the
>block buffers, get written out later by the dbwr at timeout intervals or
>at a checkpoint, implicit or otherwise.

Except that, given activity in the SGA and the amount of time it takes for the user to hit commit, the block changes could be written out before the user committed. If that were the case, you would have to make sure the data in the redo log was actually written to disk, or instance recovery wouldn't work. For this to be a problem, I think you would need a situation where an uncommitted dirty block got written to a datafile but the buffer containing the changed data in the rollback didn't.
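
To make the ordering concern concrete, here is a toy sketch of the general write-ahead idea: before a dirty block (committed or not) is allowed into the data file, the log records describing it are forced to disk first. This is only an illustration in Python, not a description of how Oracle actually implements it; the class name, the file names, and the sequence-number scheme are all made up for the example.

```python
import os
import tempfile


class ToyWriteAheadStore:
    """Toy model: log records must be on disk before the block they describe."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "redo.log")   # hypothetical name
        self.data_path = os.path.join(directory, "data.dbf")  # hypothetical name
        self.pending = []      # log records that exist only in memory so far
        self.next_seq = 1      # sequence number handed to the next change
        self.flushed_seq = 0   # highest sequence number known to be on disk

    def record_change(self, description):
        """Buffer a log record in memory and return its sequence number."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending.append((seq, description))
        return seq

    def flush_log(self):
        """Append all pending records to the log and force them to disk."""
        if not self.pending:
            return
        fd = os.open(self.log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        try:
            for seq, description in self.pending:
                os.write(fd, f"{seq} {description}\n".encode())
            os.fsync(fd)  # do not return until the OS reports the data as written
        finally:
            os.close(fd)
        self.flushed_seq = self.pending[-1][0]
        self.pending.clear()

    def write_block(self, seq, block):
        """Write a dirty block, flushing the log first if it lags behind."""
        if seq > self.flushed_seq:
            self.flush_log()
        fd = os.open(self.data_path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            os.write(fd, block)
            os.fsync(fd)
        finally:
            os.close(fd)


store = ToyWriteAheadStore(tempfile.mkdtemp())
seq = store.record_change("update row 7")  # change made, user has not committed
store.write_block(seq, b"dirty block")     # forces the log record out first
```

With that rule in place, an uncommitted block reaching the data file is harmless as long as the matching log record got there first, which is exactly the condition the paragraph above is worried about.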

Regardless of the real answer, I think we are 1) getting pretty technical about a question that has already been answered and 2) both wrong.

I got a mail message from a guy at Oracle that states:

"All Oracle files are open()ed with the O_SYNC flag set on most Unix boxes"

and

"Oracle chooses to be pessimistic and sync anything that DBWR or LGWR writes"

Certainly this response could be incorrect, but I think we have answered the original question. Writes to the FS under Oracle are not a "serious data integrity problem".
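
For anyone who hasn't used the flag, this is roughly what an O_SYNC write looks like at the system-call level. It is a minimal sketch in Python (whose os module wraps the Unix open() and write() calls), not Oracle's code; the helper name and the file name are invented for the example, and the fsync() fallback is my own assumption for platforms that don't define O_SYNC.

```python
import os
import tempfile


def sync_write(path, data):
    """Write data to path so that the call returns only after the OS
    reports the data as written to stable storage."""
    has_o_sync = hasattr(os, "O_SYNC")  # defined on Unix, not everywhere
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if has_o_sync:
        flags |= os.O_SYNC  # every write() on this descriptor is synchronous
    fd = os.open(path, flags, 0o600)
    try:
        written = os.write(fd, data)
        if not has_o_sync:
            os.fsync(fd)  # assumption: explicit flush where O_SYNC is missing
        return written
    finally:
        os.close(fd)


demo_path = os.path.join(tempfile.mkdtemp(), "datafile.dbf")  # hypothetical name
n = sync_write(demo_path, b"block contents")
```

The cost is that each write() blocks until the device acknowledges it, which is where the performance worry earlier in the thread comes from.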

> Whether you have raw i/o or not is
> irrelevant to this process - all you achieve is avoiding many software layers
> in your UNIX kernel which map logical block addresses to physical disk

But under raw I/O you don't have to worry about anybody taking your write and telling you it's finished when it really isn't. Certainly async I/O packages are becoming more common (e.g. Sun's DB accelerator), but there are still plenty of systems out there where raw means fast synchronous I/O and UFS means slow asynchronous I/O.

Having said that, can anyone confirm this? This has been my understanding for a long time. Is it correct?

-- 
Regards, 

Lee E. Parsons                  		
Systems Oracle DBA	 			lparsons_at_world.std.com
Received on Mon Oct 17 1994 - 20:58:06 CET
