Re: Getting a consistent copy

From: John Hurley <johnbhurley_at_sbcglobal.net>
Date: Thu, 30 Jul 2009 16:15:51 -0700 (PDT)
Message-ID: <03129de9-871b-4241-873e-4c591c7c4435_at_h31g2000yqd.googlegroups.com>



On Jul 30, 2:20 pm, joel garry <joel-ga..._at_home.com> wrote:

snip

Joel if you want to email and then chat on the phone that's fine.

> > It all deals with preserving write order dependencies ... EMC
> > certainly can handle these requirements.  I cannot imagine that any of
> > the major storage vendors have problems here.  Then it comes down to
> > design requirements and performance requirements.
>
> This is where I'm failing to understand - it's not just the order, but
> the set of writes to avoid block fracture.  Or is this just a semantic
> thing, and that's what it means?

Well, the write order of blocks going to the online redo logs needs to be maintained. From a database recovery point of view ... that's all you need, afaik.

If a changed data block is written completely, written partially, or still sitting in memory ... instance recovery and/or transaction rollback will address that when and if it needs to.

> > > What are the network requirements for synchronous replication.
>
> > That's going to vary based on how much changes are occurring in the
> > database that is getting replicated.
>
> > Synchronous replication can have a severe impact on commit intensive
> > systems.  Each write goes to cache in the local storage subsystem ...
> > then gets transmitted to the remote storage system ... then an
> > acknowledgement that it was received goes back to local storage
> > subsystem ...
>
> I've been watching this specifically on my systems, because generally
> it hums along, but some particular things just go commit crazy, and I
> haven't figured out exactly why yet.  The code is oci from a
> generator, which has an option to control commits, but those programs
> aren't using it as far as I can tell.  But of course this isn't
> noticeable in a way users would complain, so it doesn't have high
> business priority.
>
> I'm a little uncertain if this synchronous acknowledgement is getting
> back to Oracle (I mean, Oracle would be waiting on it)?  Sounds like
> that is what you are saying and that is what I'd expect, but I need to
> be clear on the point.

Oracle definitely knows whether a write to the online log completes successfully or not. That's critical. So when a commit occurs ... Oracle writes the online log info ... waits to see that the write succeeds ... before control is returned to the app that issued the commit.

In a synchronous storage-based replication environment ... the "wait to see that it succeeded" part involves the local storage array receiving the data ( held in cache memory is fine ... the array will sort out actually writing it to disk ) ... the local array sending the data to the remote array ( again it just needs to get into write cache ) ... the remote array telling the local array it got it ( the acknowledgement ) ... and a return code going back to the operating system that issued the write. That's at least 2 network trips between the local and remote arrays ( if no packets are lost ).
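The steps above can be sketched as back-of-envelope arithmetic. All the numbers below are illustrative assumptions (not vendor specs): cache-write times on each array plus two one-way network trips between them.

```python
# Back-of-envelope model of the synchronous-replication commit path
# described above. Every number here is an illustrative assumption.

def sync_commit_latency_ms(local_cache_write_ms=0.3,
                           remote_cache_write_ms=0.3,
                           one_way_network_ms=2.0):
    """Local array caches the write, ships it to the remote array,
    then waits for the acknowledgement: two one-way network trips."""
    return (local_cache_write_ms
            + one_way_network_ms       # data out to the remote array
            + remote_cache_write_ms
            + one_way_network_ms)      # acknowledgement back

print(round(sync_commit_latency_ms(), 1))  # 4.6 with the defaults above
```

Even with generous cache-write assumptions, the two network trips dominate once the arrays are any real distance apart.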

> > The wait event that often gets impacted the most is log file sync.
>
> > Can your applications survive the performance impact of a potentially
> > huge hike in log file sync?
>
> No.  Of course, this is a whole subject in itself, but that is a
> useful nugget.

Of course other kinds of writes from the Oracle database can also be impacted in a synchronous environment, but their impact is often ( relatively ) minor compared to the consequences of a bunch of extra milliseconds on log file sync.

Changing from synchronous to async replication can often take a 15/20/25+ millisecond log file sync wait time down to whatever your local array can give you ( I'm usually way below 1 millisecond ).
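To see why those milliseconds matter: a session that commits serially waits out log file sync on every commit, so the wait time caps its commit rate. The wait figures below are just the illustrative numbers from above.

```python
# Rough upper bound on commits/sec for one session committing serially,
# waiting out log file sync each time. Wait times are illustrative:
# ~20 ms with distant synchronous replication vs. ~0.5 ms local-only.

def max_commits_per_sec(log_file_sync_ms):
    return 1000.0 / log_file_sync_ms

print(max_commits_per_sec(20.0))   # 50.0 commits/sec, synchronous
print(max_commits_per_sec(0.5))    # 2000.0 commits/sec, local array only
```

A commit-crazy application like the one Joel describes feels that 40x difference directly.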

If you are talking synchronous replication, you had better not be transmitting the data "too far" between storage arrays unless you have some extra special deluxe network gear in place.

> > > How this relates to a Nimbus RH100, which claims "Snapshots, cloning,
> > > volume migration, synchronous mirroring, and asynchronous replication"
> > > among other things.

> > You lost me completely here.  You want to really understand the
> > specific capabilities of the actual storage hardware and software that
> > will be used.
>
> Well yeah.  The hardware is already justified and bought by non-db
> requirements, so if the db can benefit as gravy and replace the
> standby, all the better.  I'm sure I'm not the only one in this sort
> of situation.  At some unpredictable point in the not too distant
> future, someone will likely throw me at the demo.

Biggest decision point here is synchronous versus asynchronous.

Anytime you choose async, data and transactions can and probably will get lost in a real DR.

What can your business afford to lose in a DR? If they say "nothing", then you ask how much they can spend on a synchronous environment and how much performance is allowed to suffer. Can a compromise be made to put the remote array "somewhere pretty close" ... etc.

If it is a real DR then many businesses can actually lose some data and be ok. Better than not surviving a DR at all right?

I will skip getting on the rant about DR testing. All this stuff is useless if it is not tested, re-tested, and exercised periodically. Changes in the environment often make everything done previously mean jack if someone isn't watching this area constantly.

Does your organization have a DR manager or director? How about a change control manager/director? If either of those is not in place, it is a steep uphill battle no matter what technical solutions are chosen.
