Re: Getting a consistent copy

From: joel garry <joel-garry_at_home.com>
Date: Thu, 30 Jul 2009 11:20:52 -0700 (PDT)
Message-ID: <80c0d2b7-a96e-448b-a7c1-a09cf46db7bd_at_u16g2000pru.googlegroups.com>



On Jul 30, 6:41 am, John Hurley <johnbhur..._at_sbcglobal.net> wrote:
> On Jul 28, 5:37 pm, joel garry <joel-ga..._at_home.com> wrote:
>
> snip
>
> > The thing that my users have in mind is replicating an OLTP database
> > remotely, just the storage with the disaster site node shut down, then
> > in a disaster, fire it up and everything works.  Instead of just using
> > a standby db.
>
> > As I understand things:
>
> > In EMC-speak, this requires either synchronous mode or doing a suspend
> > and then snapping.
>
> No.  Even with async you can design things so that suspends are not
> necessary.  We don't have any suspends in place.
>
> It all deals with preserving write order dependencies ... EMC
> certainly can handle these requirements.  I cannot imagine that any of
> the major storage vendors have problems here.  Then it comes down to
> design requirements and performance requirements.

This is where I'm failing to understand - as I see it, it's not just the order of writes, but grouping the right set of writes together so you don't end up with a fractured block on the remote side. Or is that just a semantic thing, and "preserving write order dependencies" already covers it?

>
> > What I don't know, and would like to find out:
>
> > What are the network requirements for synchronous replication.
>
> That's going to vary based on how much change is occurring in the
> database that is getting replicated.
>
> Synchronous replication can have a severe impact on commit intensive
> systems.  Each write goes to cache in the local storage subsystem ...
> then gets transmitted to the remote storage system ... then an
> acknowledgement that it was received goes back to local storage
> subsystem ...

I've been watching this specifically on my systems, because generally it hums along, but some particular things just go commit-crazy, and I haven't figured out exactly why yet. The code is OCI from a generator, which has an option to control commits, but those programs aren't using it as far as I can tell. Of course this isn't noticeable in a way users would complain about, so it doesn't have high business priority.
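
For what it's worth, the sort of thing I've been poking at to spot the commit-happy sessions is just v$sesstat joined to v$statname and v$session - the columns I pull and the cutoff below are my own arbitrary choices, adjust to taste:

  -- per-session 'user commits' accumulated since logon;
  -- the 1000 cutoff is arbitrary
  select s.sid, s.username, s.program, s.module, t.value user_commits
    from v$session s, v$sesstat t, v$statname n
   where t.sid = s.sid
     and t.statistic# = n.statistic#
     and n.name = 'user commits'
     and t.value > 1000
   order by t.value desc;

That at least narrows it down to which of the generated programs are doing the committing.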

I'm a little uncertain whether this synchronous acknowledgement gets surfaced back to Oracle - that is, does Oracle sit waiting on it before the commit completes? Sounds like that is what you are saying and it is what I'd expect, but I need to be clear on the point.

>
> The wait event that often gets impacted the most is log file sync.
>
> Can your applications survive the performance impact of a potentially
> huge hike in log file sync?

No. Of course, this is a whole subject in itself, but that is a useful nugget.
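
Before anyone flips on synchronous mode I'll at least grab a baseline, something along these lines (instance-wide since startup, so a statspack/AWR interval would be more honest, and it assumes the time_waited_micro column is there on your version):

  -- instance-wide 'log file sync' figures since instance startup
  select event, total_waits,
         round(time_waited_micro/1000000, 1) seconds_waited,
         round(time_waited_micro/1000/nullif(total_waits, 0), 2) avg_ms
    from v$system_event
   where event = 'log file sync';

If the average jumps from a millisecond or two to whatever a round trip to the DR site costs, that's the hike you're talking about.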

>
> > Is it really a good idea to be suspending the OLTP db every 20
> > minutes.  Client/server and n-tier order entry people can get a ways
> > ahead of the app already.
>
> Unclear to me why you believe this is a requirement.  Just because
> some people have stuck it in does not mean it is a requirement in all
> designs.

Comes from my above-stated ignorance about write-order dependencies.

>
> > How this relates to a Nimbus RH100, which claims "Snapshots, cloning,
> > volume migration, synchronous mirroring, and asynchronous replication"
> > among other things.
>
> You lost me completely here.  You want to really understand the
> specific capabilities of the actual storage hardware and software that
> will be used.

Well yeah. The hardware is already justified and bought for non-db requirements, so if the db can benefit as gravy and replace the standby, all the better. I'm sure I'm not the only one in this sort of situation. At some unpredictable point in the not-too-distant future, someone will likely throw me at the demo.

>
> These things are not generic designs that always work the same and you
> can swap out pieces parts and change vendors etc.

Why not? DBA's are :-O

>
> > How would one make a fair test of recovery.  I'm not convinced a
> > normal load is a fair test.
>
> Well you have a DR test plan, a simulated DR test plan, a planned
> moved to the DR site, and a planned move back from the DR site.
>
> You document it all and test it repeatedly and regularly and make
> adjustments.

I've had bad experiences with this in the past, so I'm paranoid. One stupid piece of hardware or software in the chain can, um, not fulfill management's preconceived notions. Also, I don't have symmetry between the sites; degraded functioning at the DR site is thrown in (same as with the standby). I would like to see such a thing work, but I feel compelled to expose any fundamental weakness if there is one. I need to understand exactly what is going on to get there.

>
> There's a pretty steep learning curve in this area and the help that
> you get from the storage vendors ( beyond the basic yeah yeah we can
> do it ) is often dicey.  Hiring someone or bringing in a consultant
> who has been there and done it with the storage hardware and software
> that you plan on putting in place is often the best bet.

Thanks John, that gives me a lot to chew on. I'm spending my week fixing another fine mess an app vendor consultant made, btw.

jg

--
_at_home.com is bogus.
"Turn on the bubble machine...  God, is that a cheap bubble machine."
- Frank Zappa
