Re: are redo records always flushed in order?

From: Jessica Mao <jessica.mao_at_oracle.com>
Date: Fri, 27 Apr 2007 02:34:45 -0700
Message-ID: <4631C3B5.7020802@oracle.com>

by introducing batch nowait, oracle has left tx durability in the users' hands. i believe the db will still be physically consistent (engineers at oracle ain't that bad ;o) ), but logically the data could be corrupted.

tx1 and tx2 below belong to different sessions. that's the point: if some sessions are running with batch nowait and others with immediate wait, and their data overlaps or depends on each other, is there any chance of data corruption?
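to make the question concrete, here is a toy sketch in python. it is not oracle's implementation: it assumes a single redo buffer that is always flushed strictly in order (exactly the property this thread is asking about), ignores private strands and log parallelism, and the class and function names are made up for illustration.

```python
# Toy model only: one strictly ordered redo buffer. This ordering is an
# ASSUMPTION, not a statement about how Oracle's LGWR actually behaves.
class ToyRedoLog:
    def __init__(self):
        self.buffer = []  # commit records still in memory, in generation order
        self.disk = []    # commit records that reached the log file

    def commit(self, tx, wait):
        """Record a commit; a 'wait' commit forces a flush before returning."""
        self.buffer.append(tx)
        if wait:
            self.flush()

    def flush(self):
        """Move everything buffered so far to disk, preserving order."""
        self.disk.extend(self.buffer)
        self.buffer.clear()

    def crash(self):
        """Lose the in-memory buffer; return the commits that survive."""
        self.buffer.clear()
        return list(self.disk)


def run(order):
    """Apply (tx, wait) commits in order, then crash and report survivors."""
    log = ToyRedoLog()
    for tx, wait in order:
        log.commit(tx, wait)
    return log.crash()


# tx1 (nowait) commits before tx2 (wait): tx2's flush carries tx1 along.
survivors_a = run([("tx1", False), ("tx2", True)])
# tx2 (wait) commits first, then tx1 (nowait): tx1 was acknowledged but is lost.
survivors_b = run([("tx2", True), ("tx1", False)])
print(survivors_a, survivors_b)
```

under that ordering assumption, a nowait commit can be lost on its own, but a later commit never survives without the earlier ones it might depend on. if redo could reach disk out of order, that guarantee would not hold in this model, which is the case i'm worried about.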

coming to redo writes vs. physical writes: many platforms can handle 1MB per physical write, which should be enough if the db is on raw devices. but if the db is on a file system, a physical write could be as small as 8K, and that is subject to tuning. (i didn't test with lgwr itself, but did test i/o sizes using simple os cmds.) but today our storage specialist pointed out that the os/storage layers have rollback too! they should be able to make sure that 1 big write request from lgwr either completes or fails and is then cleaned up. so i can probably put my assumption / worry about corruption from a partially flushed redo write to rest.
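a rough way to repeat that kind of i/o size check from user space, sketched in python. the helper names are made up, and it only shows how many bytes each write() call accepts for a 1MB request on a temp file. it says nothing about how the file system or device splits the i/o underneath, or about atomicity across a crash; for that you would still need to trace the real lgwr process.

```python
import os
import tempfile


def write_fully(fd, data):
    """Write all of data to fd; return the byte count of each write() call."""
    sizes = []
    view = memoryview(data)
    while len(view) > 0:
        n = os.write(fd, view)  # the OS may accept fewer bytes than requested
        sizes.append(n)
        view = view[n:]
    os.fsync(fd)  # ask the OS to push the data to stable storage
    return sizes


def probe(request_size=1024 * 1024):
    """Issue one request_size write to a temp file and report what happened."""
    fd, path = tempfile.mkstemp()
    try:
        sizes = write_fully(fd, b"\0" * request_size)
        on_disk = os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.remove(path)
    return sizes, on_disk


sizes, on_disk = probe()
print(f"{len(sizes)} write call(s), sizes={sizes}, file size={on_disk}")
```

a single entry in sizes means the os accepted the whole request in one call; several entries mean short writes, which the caller has to loop over.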

p.s. it wasn't our choice to join, and we're still getting lost on the campus. i'll come back once i have something (that's not confidential) ;o)
p.p.s. thanks for pointing to the very interesting thread. that's exactly why i chose to post my Qs here.

-jessica

Jeremy Paul Schneider wrote, On 4/26/2007 6:22 AM:
> FWIW here's a good discussion of private redo strands (aka zero-copy
> redo):
>
> http://www.freelists.org/archives/oracle-l/02-2005/threads.html#00630
> - The thread is called "latch-free SCN scheme (10.1.0.3)"
>
>
> On 4/26/07, Jeremy Paul Schneider <jeremy.schneider_at_ardentperf.com> wrote:
>
> Yeah... I wasn't thinking about nowait or private strands... and
> I don't (yet) know a lot about the specifics of how these work
> internally. Also, in addition to private strands which were
> apparently introduced in 10g there's also log parallelism which
> was introduced in 9i allowing multiple processes to write to
> different areas of the main redo buffer simultaneously.
>
> I don't know what the implications are of this; but as I said
> before I have a hunch that this has already been carefully worked
> through by the engineers at Oracle - considering the fanfare with
> the release of COMMIT NOWAIT and considering the importance of
> crash recoverability in Oracle.
>
> A few other thoughts - based on my understanding of redo and crash
> recovery my guess is the opposite of yours - that in your example
> using COMMIT NOWAIT *any* records whose COMMIT made it into the
> redo log will not be rolled back. But another thought - from what
> I can gather (based on reading a few old oracle-l emails,
> presentations, and my own guesses) - private redo strands and
> individual buffer latches (when using parallelism) are allocated
> per-process; so assuming that TX1 and TX2 are happening in the
> same session, I think that their log entries would probably be
> written out in order to the logfiles even if private strands or
> parallelism were enabled. But that's just conjecture on my part.
>
> Hmmm... maybe you could make a test tablespace and a test table
> with a few rows and one row per block, then put the tablespace in
> backup mode and spawn a few processes that update the table. Then
> strace (or truss on sun) the LGWR process and see if the writes
> are sequential and how big the writes are... also it's worth
> pointing out that even if we're issuing 1MB writes to the OS we'd
> still want to ensure that the OS is writing that data in order
> (if the device itself doesn't support 1MB writes). I think it
> does but I can't prove this either at the moment.
>
> -Jeremy
>
>
> PS - considering the domain name of your email address, if this is
> such a critical question for your "bosses" then is there any way
> they can make an inquiry to some of the engineers who actually
> work on this stuff?
>
> PPS - maybe someone who's got a lot more experience than I will
> add their thoughts... then I could learn a bit more about this
> too. :)
>

--
http://www.freelists.org/webpage/oracle-l

Received on Fri Apr 27 2007 - 04:34:45 CDT