RE: are redo records always flushed in order?

From: Adrian <>
Date: Fri, 27 Apr 2007 20:18:18 +0100
Message-ID: <>

Under 10g, the rules of thumb that LGWR flushes when the log buffer is 1/3 full or holds 1MB of redo are likely out of date in detail, if not in spirit. Log buffers of 14MB and more are now commonplace, and the size is managed automatically if you leave log_buffer unset.

On AIX with CIO, the docs state that you should set the JFS2 filesystem block size (agblksize) to 512 bytes. If you truss the 10gR2 lgwr, you see LISTIO syscalls writing a set of what I suspect are 512-byte chunks to disk in parallel; the results are then checked with (sometimes multiple) AIO_NOWAIT_TIMEOUT syscalls to ensure the writes have completed successfully.

My guess would be that the circular log buffer is flushed in the order in which it was written, up to the event/commit point. Uncommitted transaction data that sits before that point would be written out as well.

If TX1 commits before TX2, TX2 could never be written to disk before TX1, but they could possibly be written at the same time. </EDUCATED GUESS>
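
The guess above can be pictured with a toy model. This is only a sketch of the behaviour being guessed at, not Oracle's implementation: it ignores wrap-around of the circular buffer, and the names `LogBuffer`, `append` and `flush` are made up for illustration.

```python
class LogBuffer:
    """Toy model: redo records are appended in order and flushed in order."""

    def __init__(self):
        self.records = []      # (tx, kind) tuples in generation order
        self.flushed_upto = 0  # index of the first record not yet on "disk"
        self.disk = []         # stands in for the online redo log

    def append(self, tx, kind):
        self.records.append((tx, kind))
        return len(self.records)  # position just past this record

    def flush(self, upto):
        # Write everything from the last flush point up to `upto`,
        # including records of other, still uncommitted transactions.
        if upto > self.flushed_upto:
            self.disk.extend(self.records[self.flushed_upto:upto])
            self.flushed_upto = upto


buf = LogBuffer()
buf.append("TX1", "change")
buf.append("TX2", "change")               # TX2 is still uncommitted here
tx1_commit = buf.append("TX1", "commit")
buf.append("TX2", "change")
tx2_commit = buf.append("TX2", "commit")

buf.flush(tx1_commit)  # TX1's commit carries TX2's earlier change with it
buf.flush(tx2_commit)  # the rest follows, still in generation order
```

In this model TX2's commit record can reach disk together with TX1's (one flush covering both) but never ahead of it, because a flush only ever writes a contiguous run starting at the previous flush point.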

Corrections welcome.


-----Original Message-----
From: [] On Behalf Of jametong
Sent: 26 April 2007 12:01
To:; 'Jeremy Paul Schneider'
Cc:
Subject: RE: are redo records always flushed in order?

-- but if size to flush > os max single write size: LGWR will flush under the following conditions.

1) the log buffer is 1/3 full
2) the log buffer holds 1MB of redo
3) a commit is issued

As far as I know, the maximum single write size on both AIX and Linux is 1MB of data.
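
The three conditions listed above can be written as one predicate. A minimal sketch, taking the thresholds exactly as quoted in this thread (they are rules of thumb, not verified values); the function and parameter names are hypothetical.

```python
ONE_MB = 1024 * 1024


def should_flush(unflushed_bytes, log_buffer_bytes, commit_pending):
    """Return True if any of the three listed conditions holds."""
    one_third_full = unflushed_bytes * 3 >= log_buffer_bytes  # condition 1
    has_1mb_redo = unflushed_bytes >= ONE_MB                  # condition 2
    return one_third_full or has_1mb_redo or commit_pending   # condition 3
```

With a 14MB log buffer the 1MB condition is reached long before the 1/3 condition, so in this model the 1/3 rule only matters for buffers smaller than 3MB.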


DBA Team, B2B, Alibaba China

Phone   : 0571-85022088-12372
Fax     : 0571-85022958
Mobile  : 135-8803-9828
Yahoo   : jametong

Alitalk : jametong
Skype : jametong

From: [] On Behalf Of Jessica Mao
Sent: 26 April 2007 15:45
To: Jeremy Paul Schneider
Subject: Re: are redo records always flushed in order?


really appreciate your reply.

agree to 1.), thought the same way. but since this was a really important Q from bosses, i was asked to make sure. ;o(

2.) no longer true in 10g r2. thus i specifically mentioned commit methods (immediate, wait, batch, nowait). if it's batch nowait, commit returns w/o even posting LGWR. we do have some apps that can tolerate potential data loss from batch nowait: just re-run the apps. but we would not want any logical data corruption (thus i'm investigating here).

e.g. tx1 commits batch nowait; tx2 starts immediately after, reads data modified by tx1, _based on which_ it makes its own changes, then commits immediate wait, which could cause tx1 and tx2 to be flushed together. if for any reason tx2's records get into the redo log before tx1's, or even worse, if the instance crashes in between, there would be logical data corruption. shouldn't happen, but again i wanted to make sure.
(and would the zero-copy redo and private redo strands cause any issue when enabled?)

a flush (redo write) _is_ one operation to LGWR. but if size to flush > os max single write size, it'll have to be handled by multiple physical writes. my digging there was also trying to find any possibilities for data corruption. (yes, over worrying)
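
The point about one redo write becoming several physical writes can be pictured as follows. This is a sketch only: `max_io` stands for the "os max single write size" discussed here, and `split_redo_write` is a hypothetical helper, not something LGWR exposes.

```python
def split_redo_write(start, length, max_io):
    """Split one logical redo write into (offset, size) physical writes."""
    if max_io <= 0:
        raise ValueError("max_io must be positive")
    writes = []
    offset = start
    end = start + length
    while offset < end:
        size = min(max_io, end - offset)  # last piece may be shorter
        writes.append((offset, size))
        offset += size
    return writes


# A 2,500,000-byte flush with a 1MB limit needs three physical writes.
pieces = split_redo_write(0, 2_500_000, 1024 * 1024)
```

The pieces come out in ascending offset order, which is the property the worry about partial writes depends on: if they are also issued and completed in that order, a crash part-way through leaves a prefix of the flush on disk.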

yes, all will be replayed during recovery. but when a flush contained 2 (or more) commit records and 1 made it into the redo log just before the crash while 1 didn't, then after everything has been replayed, does oracle roll back just the 2nd tx or both? i guess both but _not_ sure.

thanks a lot! -jessica

Jeremy Paul Schneider wrote, On 4/25/2007 8:00 PM:
> Not sure I can completely answer your question but here's a start.
> 1) Are records written sequentially? Although I have no concrete
> proof at the moment I'm pretty sure the answer is yes. I don't think
> that Oracle will ever do a "seek" backwards in a redo log; that seems
> pretty problematic for many reasons.
> 2) The most important thing to remember is that when a transaction
> issues a COMMIT statement, the COMMIT does not return to the client
> (and is not successful) until the redo data is written to disk.
> First, if all eight I/O operations were being written together then
> TX1 does not successfully COMMIT until the 8th I/O is complete.
> However I don't think that it works this way; a COMMIT will always
> cause an immediate log buffer flush so when TX1 COMMITs it will
> probably cause an I/O for the first 5 blocks and when TX2 COMMITs it
> will cause an I/O for the last 3.
> Second, all of the transactions will ALWAYS be replayed from the
> redolog regardless of whether the transaction committed. After
> instance startup and after redo replay, uncommitted transactions will
> be rolled back in the background by SMON (or sooner if a session tries
> to access a block before SMON gets to it). So replaying the redo log
> is not conditional on whether or not a transaction was "COMMITTED".
> For more detailed info check out Julian Dyke's excellent presentation
> Transaction Internals here:
> ns.html#TransactionInternals
> -Jeremy
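
Jeremy's second point (replay everything that reached the redo log, then roll back whatever has no commit record) can be sketched as a toy model. It illustrates the description above only and is not Oracle's recovery code; the record layout and the `recover` function are invented for the example.

```python
def recover(redo_on_disk):
    """redo_on_disk: list of (tx, kind) records that reached the redo log."""
    replayed = []
    committed = set()
    for tx, kind in redo_on_disk:   # replay is unconditional
        replayed.append((tx, kind))
        if kind == "commit":
            committed.add(tx)
    seen = {tx for tx, _ in redo_on_disk}
    rolled_back = seen - committed  # no commit record made it to disk
    return replayed, committed, rolled_back


# Crash case from the thread: TX1's records and commit reached the log,
# TX2's change did, but TX2's commit record did not.
on_disk = [("TX1", "change"), ("TX1", "commit"), ("TX2", "change")]
replayed, committed, rolled_back = recover(on_disk)
```

Under this model, whether a transaction survives depends only on whether its own commit record is in the recovered redo, not on which redo write or physical I/O carried it.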


> On 4/25/07, Jessica Mao wrote:

> i'm afraid my questions were not well presented. more descriptions:
> quite often single redo write needs to flush multiple redo records
> and/or commit records to redo log. let's say this time LGWR finds in
> log buffer 10 redo records and 1 commit record for TX1, followed by 3
> redo records and 1 commit record for TX2, and is going to flush them
> all in single redo write.
> 1.) are the records always flushed in the same order as they were
> generated in log buffer? in this case, are TX1's records always
> flushed no later than TX2's? doesn't matter how TX1 / TX2 committed
> (immediate, wait, batch, nowait)
> 2.) assume answer to 1.) is yes. say it takes 8 physical I/O at OS
> level to serve this redo write, first 5 for TX1's records, then 3 for
> TX2's. what happens if instance crashes at I/O #7? all TX1's records
> are already written to the redo log. TX2's are not. during recovery,
> would db discard TX1's records in redo log and roll back TX1 instead
> of replay? if rollback, how does db know that TX1 belonged to a failed
> redo write? through on-disk RBA? if replay, then redo write is not
> atomic.
> thanks! -Jessica
> Jessica Mao wrote, On 4/25/2007 12:37 PM:
> > Dear Gurus,
> >
> > Could you please help me with my questions below?
> >
> > if TX1 starts and commits (no matter what mode: immediate, wait,
> > batch, nowait). after that TX2 starts and commits (whatever mode).
> > are TX1's redo records (including the commit record) always flushed
> > no later than TX2's? is there ANY chance that TX2's records
> > (including the commit record) would be flushed first? i can't think
> > of any but i could be missing something.
> >
> > a related question: (maybe i'm worrying too much) is redo write
> > always atomic? when single redo write size is bigger than max OS I/O
> > write size, 1 redo write may take several physical writes to finish.
> > if there's instance crash in the middle of the several physical
> > writes, is the db able to discard the already written to disk commit
> > records and roll back ALL transactions associated with this redo
> > write? if yes, how does the db achieve that? using on-disk RBA?
> >
> > would the zero-copy redo and private redo strands features introduce
> > new issue on this matter?
> >
> > the DB is 10gR2 (since i mentioned the different commit methods), OS
> > are mainly hp-ux 11, solaris 10, windows xp.
> >
> > thanks a lot!
> >
> > Jessica Mao
> >

> --
> Jeremy Schneider
> Chicago, IL


Received on Fri Apr 27 2007 - 14:18:18 CDT
