Replication can be used to perform failover -- but it wasn't really designed for
failover. It was designed to make data available in more than one location. The way
replication works is that a transaction takes place at 'db1'. That transaction is stored
in a local queue at 'db1'. When the user commits, the database will (in the background,
not as part of the user transaction) forward the transaction stored in the queue to other
databases interested in that data. This transaction is then performed at these other
sites. Replication is asynchronous in nature -- you have to assume that at any point in
time, the various databases can and will contain different data, as they are all at
different points in time. This fact can make application development somewhat harder.
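As a minimal sketch of this commit-then-forward pattern (Python, with entirely hypothetical names -- this is an illustration of the idea, not Oracle's implementation):

```python
import queue

# Sketch of asynchronous replication: a commit at 'db1' enqueues the
# transaction locally; background work forwards it to the other
# databases later, outside the user's transaction.

class Site:
    def __init__(self, name):
        self.name = name
        self.data = {}                  # local copy of the replicated data
        self.out_queue = queue.Queue()  # transactions awaiting propagation
        self.peers = []

    def commit(self, key, value):
        # The user's transaction completes locally...
        self.data[key] = value
        # ...and is queued for later propagation (not part of the commit).
        self.out_queue.put((key, value))

    def propagate(self):
        # Background work: drain the queue and apply at each peer.
        while not self.out_queue.empty():
            key, value = self.out_queue.get()
            for peer in self.peers:
                peer.data[key] = value

db1, db2 = Site("db1"), Site("db2")
db1.peers = [db2]

db1.commit("widget_count", 100)
# Immediately after the commit, db2 has NOT yet seen the change:
assert "widget_count" not in db2.data

db1.propagate()   # the background forwarding catches db2 up
assert db2.data["widget_count"] == 100
```

The window between `commit` and `propagate` is exactly where the two databases "contain different data".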
Let's say you are building a site to sell stuff. You have an inventory. You
have X widgets in inventory. At 2 different sites, people simultaneously order X
widgets. As far as your software is concerned, both orders can be filled. It is only
AFTER the transactions at both sites have been performed that you discover in the
background that you now have -X widgets in inventory (you had an order for 2X widgets).
This is known as an update conflict and it is up to you to 'fix it'. Oracle discovers it
for you but will then invoke user-defined code to fix the problem. This process must be
part of your application design.
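To make the inventory example concrete, here is a small sketch (Python, hypothetical names and logic -- the conflict-resolution policy here is just one of many you could design):

```python
# Two sites each see X widgets in their local copy, both fill an order
# for X, and only when the queued transactions meet does the conflict
# surface as -X widgets in inventory.

X = 10

def take_order(local_inventory, qty):
    # As far as each site's software is concerned, the order is fine.
    if qty <= local_inventory:
        return local_inventory - qty, True
    return local_inventory, False

# Both sites start from the same replicated value and accept the order.
west_inventory, west_ok = take_order(X, X)
east_inventory, east_ok = take_order(X, X)
assert west_ok and east_ok

# Later, replication merges the two changes: total sold = 2X.
merged_inventory = X - (2 * X)
assert merged_inventory == -X        # an update conflict

def resolve_conflict(inventory):
    # User-defined resolution code (your application design must supply
    # this); here we just clamp to zero and flag the shortfall.
    return max(inventory, 0), -min(inventory, 0)

inventory, backordered = resolve_conflict(merged_inventory)
assert inventory == 0 and backordered == X
```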
You can decide to avoid conflicts by partitioning data (eg: west coast data and east
coast data; you need to go to the west coast server to affect west coast data and vice
versa). Many people do this; however, if you fail over from the west coast to the east
coast -- you must remember that there probably are transactions queued in the failed west
coast machine that will be propagated to the east coast when you fix the problem. You
still have the potential for update conflicts when using replication for failover -- you
still need to design for how you will handle these issues.
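The partitioning idea, and why failover reopens the conflict window, can be sketched like this (Python, hypothetical server names):

```python
# Conflict avoidance by partitioning: each row is owned by one region,
# and writes must be routed to the owning server, so no two sites ever
# modify the same row concurrently.

SERVERS = {"west": "west-db.example.com", "east": "east-db.example.com"}

def route_write(region):
    # West coast data changes go to the west server, and vice versa.
    return SERVERS[region]

assert route_write("west") == "west-db.example.com"

# On failover, the surviving server takes over the failed region's
# writes -- but the failed server may still hold queued, unpropagated
# transactions that will replay when it is repaired, so update
# conflicts are once again possible.
def route_write_after_failover(region, failed):
    target = "east" if region == failed else region
    return SERVERS[target]

assert route_write_after_failover("west", failed="west") == "east-db.example.com"
```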
If failover and availability are the key issues -- OPS wins out in my opinion. If
redundancy is the main issue -- replication is the answer (as OPS is not a redundant
solution -- there is only one database). They are equal in the backup area (same
issues, same considerations).
Both replication and OPS have certain performance implications. Both will have to be
designed for in many respects. With OPS and a 2 node configuration -- we can eliminate
the performance considerations. With OPS and a single write node and many (more than 2)
read nodes, we can eliminate most of the performance considerations. With replication, a
client transaction will generate about 2x the IO for any given WRITE transaction and will
not affect read transactions at all.
In short -- both will have issues.
Some of the upsides of replication are:
- replication works over long distances and WANS.
- replication is tolerant of short network outages between database servers (short is
relative here and is a function of the amount of changes performed in a given period of
time. You do not want the servers to get too far out of sync; it could affect the
application as well if the data diverges too much).
- replication imposes a 'fixed' overhead (2X the data modified for inserts/deletes, 3X
for updates, since before and after images are both stored in the queue).
- replication provides you with a copy of the data elsewhere. with OPS -- there is one
database. An earthquake can take out the only copy.
Some of the downsides are:
- conflict resolution and its impact must be carefully studied and designed for.
- you need a DBA familiar with this technology.
- your developers need to be familiar with the impact of this technology.
(NOTE: you can use OPS and replication together -- OPS for rapid failover, replication
for a hot spare. Something else to consider.)
o Standby Database
The standby database is a nice solution for failover. In this mode, the logs generated
by Oracle are forwarded to another computer in the network and applied to a database
instance that is in constant recovery mode. This standby database can be opened for
reporting and such (read only) but generally is unavailable while in recovery mode. This
standby database lags behind the real database by some small amount (governed by the size
of your redo files) but can be brought on line rapidly after a failure and can be totally
caught up to the master database if you can get to the file system on the down machine
(eg: plug the disks that contain the redo into another machine). You should be aware that
replication has the same sort of lag time as a standby database would.
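A minimal sketch of this log-shipping pattern (Python, hypothetical names -- an illustration of the mechanism, not Oracle's recovery code):

```python
# The primary ships each filled redo log to the standby, which replays
# it while in constant recovery. The standby lags by at most the
# changes sitting in the current, not-yet-shipped redo file.

class Primary:
    def __init__(self):
        self.state = {}
        self.current_redo = []   # changes not yet archived/shipped
        self.shipped_logs = []   # archived logs forwarded to the standby

    def write(self, key, value):
        self.state[key] = value
        self.current_redo.append((key, value))

    def switch_logfile(self):
        # Filling/archiving a redo log makes it available to the standby.
        self.shipped_logs.append(list(self.current_redo))
        self.current_redo = []

class Standby:
    def __init__(self):
        self.state = {}

    def recover(self, primary):
        # Apply every shipped log; anything still in current_redo lags.
        for log in primary.shipped_logs:
            for key, value in log:
                self.state[key] = value

p, s = Primary(), Standby()
p.write("a", 1)
p.switch_logfile()
p.write("b", 2)       # still in the current redo file, not shipped

s.recover(p)
assert s.state == {"a": 1}   # standby lags by the unshipped redo
```

This is why the lag is "governed by the size of your redo files", and why recovering the redo from the failed machine's disks lets the standby catch up completely.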
Some of the upsides of standby databases are:
- they are extremely easy to set up.
- they have no performance impact on the master machine.
- they impose no design considerations on the application.
Some of the downsides are:
- the DBA must be aware of the standby database. If you make structural changes to the
master, they must be coordinated with the standby. This is minimal.
- once you failover to the standby -- there is *no* failback. The standby will become
the master. You will build a new standby to cover this new master.