Tina,
If you find any, I'd love to see them too!
Here are some issues, off the top of my head and in no
particular order, that need to be considered for
recovery in an n-way replication environment (you have
more flexibility since you're essentially doing
one-way replication for failover):
- SET UP CONFLICT RESOLUTION!!!!!!
In the case of point-in-time recovery (PITR) this
is absolutely critical, unless you want to add the
stress and hassle of having to manually resolve
conflicts onto what is quite likely already a
stressful situation!
Even in the case of 1-way replication, you'd be
surprised how easy it is for someone to let users into
the wrong site or for a batch job or script to be run
on the wrong site.
- How will you handle PITR? If you perform it at only
one site, your replicated objects will be out of sync.
Do you plan on resynching the tables after the
recovery? If so, what is the interaction between the
non-replicated tables (which have been recovered to an
older point-in-time) with the replicated tables (which
have been resynched with the other site(s))?
- Did I mention conflict resolution?! Many people
avoid conflict resolution because it forces you to
really analyze the business flow and needs of the
application. IMHO it's a case of "pay me now or go
bankrupt later!"
- How long can you afford an outage? One thing a lot
of people never consider is whether their replication
environment can ever "catch up" from an extended
outage at one site. The tx's will continue to pile up
at the other site(s), but once the down site comes
back up, how long will it take to push that backlog?
This is a very critical issue on Oracle7 since the
only option is a serial push (O8 allows parallel
pushing along with additional architectural changes
which dramatically increase performance for
replication).
- If complete recovery and allowing the backed up
tx's to be pushed is not an option, how do you plan to
get your sites in sync again? Can you afford the
downtime needed for offline instantiation? Remember
that almost all administrative tasks require
suspending replication for the group being administered
(all groups on O7) and that while a group is
quiescing/quiesced no DML operations are allowed on
the replicated objects.
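
To put rough numbers on the "catch up" question above, here is a minimal Python sketch. It assumes a steady transaction rate at the surviving site and a steady push rate once the down site is back; the function name and all figures are made up for illustration and are not measurements from any Oracle release.

```python
from typing import Optional


def catch_up_hours(outage_hours: float,
                   tx_per_hour: float,
                   push_per_hour: float) -> Optional[float]:
    """Estimate the hours needed to drain the deferred tx backlog.

    Assumes new transactions keep arriving at tx_per_hour while the
    backlog is being pushed at push_per_hour (both hypothetical,
    constant rates). Returns None if the backlog can never drain.
    """
    if outage_hours < 0 or tx_per_hour < 0 or push_per_hour < 0:
        raise ValueError("inputs must be non-negative")

    # Transactions queued up while the site was down.
    backlog = outage_hours * tx_per_hour
    if backlog == 0:
        return 0.0

    # The queue only shrinks by the margin of push rate over new work.
    net_drain = push_per_hour - tx_per_hour
    if net_drain <= 0:
        return None

    return backlog / net_drain


# 8-hour outage, 10,000 tx/hour arriving, 12,000 tx/hour push rate:
# backlog = 80,000 tx, net drain = 2,000 tx/hour.
print(catch_up_hours(8, 10_000, 12_000))  # 40.0
```

Under these assumed rates an 8-hour outage takes 40 hours to work off, and if the push rate does not exceed the incoming rate the site never catches up. Plug in your own measured rates before relying on any such estimate.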
Again, I cannot emphasize enough the importance of a
robust conflict resolution scheme. In many cases it
can mean the difference between keeping and losing
your job!
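
For anyone who has not worked with conflict resolution before, here is a small sketch of the idea behind one common rule, "latest timestamp wins". This is plain Python for illustration only, not Oracle's API; the RowVersion class, its fields, and the site-name tie-break are all assumptions of mine. The point is that every site must reach the same answer for the same pair of conflicting updates.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RowVersion:
    """One site's version of a row (hypothetical structure)."""
    site: str             # site where the update originated
    updated_at: datetime  # timestamp column maintained by the application
    values: dict          # column name -> value


def resolve_latest_timestamp(local: RowVersion,
                             incoming: RowVersion) -> RowVersion:
    """Pick the surviving version when two sites updated the same row."""
    if incoming.updated_at != local.updated_at:
        # The more recent update wins.
        return incoming if incoming.updated_at > local.updated_at else local
    # Equal timestamps: break the tie on site name so that every
    # site picks the same winner regardless of which copy is local.
    return incoming if incoming.site < local.site else local


a = RowVersion("SITE_A", datetime(2000, 6, 28, 9, 0), {"status": "OPEN"})
b = RowVersion("SITE_B", datetime(2000, 6, 28, 9, 5), {"status": "CLOSED"})

# Both sites converge on SITE_B's later update.
print(resolve_latest_timestamp(a, b).site)
print(resolve_latest_timestamp(b, a).site)
```

Whether "latest wins" is actually the right rule for a given table is exactly the business analysis mentioned above; for some columns a site-priority or additive rule is the only one that makes sense.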
Additions, corrections, and comments are, of course,
welcome.
HTH,
- Anita
- Tina Ridgley <tlridgley_at_yahoo.com> wrote:
> Hi all.
>
> We are using multi-master replication using Oracle
> 8.1.6 running on Solaris 2.6. Although we are set up
> for multi-master replication, there will only be
> transaction activity at one site unless the primary
> site is lost, in which case we'll be switching to the
> secondary site.
>
> Does anyone have any experience performing recovery in
> this type of environment or know of any papers which
> outline what steps should be taken for performing
> recovery in a replicated environment using different
> recovery scenarios.
>
> I would appreciate any help I can get in this area as
> I have been having trouble finding any sources which
> address this with any level of detail.
>
> TIA,
>
> Tina
>
> --
> Author: Tina Ridgley
> INET: tlridgley_at_yahoo.com
>
Received on Thu Jun 29 2000 - 00:30:55 CDT