RE: DataGuard

From: Allen, Brandon <Brandon.Allen_at_OneNeck.com>
Date: Tue, 16 Jan 2007 10:33:17 -0700
Message-ID: <04DDF147ED3A0D42B48A48A18D574C45059E2822@NT15.oneneck.corp>

Yes, we are most likely going to increase the bandwidth, but regardless of that, we want to ensure that the standby will *never* halt the production database. We are okay with some transaction loss; that's why we are running in maximum performance mode and using ARCH to transfer the logs. The business folks have been well informed and are okay with the potential transaction loss, but they are not okay with slowing down their production system. No matter how much bandwidth we buy, there could always be problems with the WAN service provider, or someone accidentally FTPing some huge files over the same pipe and slowing things down, so we also need to make sure that this will not cause problems for our primary database.

Are you aware of any configuration that would meet these requirements other than the proposed cascading primary>localSB>remoteSB?

We are running a COTS app (BaanIV ERP) so we can't do much about the redo.

Thanks!
Brandon

From: Carel-Jan Engel [mailto:cjpengel.dbalert_at_xs4all.nl]
Sent: Monday, January 15, 2007 5:23 PM
To: Allen, Brandon
Cc: rgoulet_at_kanbay.com; oracle-l_at_freelists.org
Subject: RE: DataGuard

Brandon,

If the system is important for the business, they (the business) should provide enough (budget for enough) bandwidth. If they can afford the money for an extra server, with extra Oracle EE licenses, then a proper line with enough bandwidth shouldn't be a very difficult business case.

If you follow the suggestion to use the local standby as a buffer for redo forwarding, be aware that an unknown amount of redo is not sent to the DR site at any given point in time. If at that point the disaster strikes, you will lose transactions. If the business is aware of that risk, and made the trade-off, fine! Let them confirm that to you in writing. Too often they are unaware, and the technicians get blamed in the event of a failover to the DR site, losing important transactions. Just because the techies were responsible for the HA solution. Wrong. Management doesn't make housekeeping responsible for insuring the building either.
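To put a rough number on that exposure, here is a minimal sketch in Python. It assumes constant redo generation and shipping rates, which real systems will not have, and the function names and figures are hypothetical illustrations rather than anything measured in this thread.

```python
def unshipped_redo_mb(gen_mb_per_s, ship_mb_per_s, seconds, start_backlog_mb=0.0):
    """Redo (MB) generated but not yet received at the DR site after `seconds`.

    Assumes constant rates; the backlog is clamped at zero because the
    standby cannot be ahead of the primary.
    """
    backlog = start_backlog_mb + (gen_mb_per_s - ship_mb_per_s) * seconds
    return max(backlog, 0.0)


def exposure_seconds(backlog_mb, gen_mb_per_s):
    """Roughly how many seconds of recent work the backlog represents."""
    if gen_mb_per_s <= 0:
        return 0.0
    return backlog_mb / gen_mb_per_s


if __name__ == "__main__":
    # Hypothetical figures: 2.0 MB/s of redo generated, 1.5 MB/s shipped,
    # sustained over a one-hour batch window.
    backlog = unshipped_redo_mb(2.0, 1.5, 3600)
    print(backlog)                         # 1800.0 MB not yet at the DR site
    print(exposure_seconds(backlog, 2.0))  # 900.0 seconds of work at risk
```

A figure like this, even if crude, is the kind of thing the business can be asked to sign off on in writing.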

Consider using hardware line cards to compress the redo traffic. Use QoS on the routers to prioritize the data sent on the port number you choose for redo transport. People might come up with the suggestion of using ssh tunneling with compression. IMHO: too cumbersome for HA. You need an extra process, it will consume CPU, it can fail, it needs monitoring, etc. James Morle used Cisco line cards at a DG site we set up and they typically got 4:1 compression. They were approximately $1000 each. No setup, no worries, no monitoring.
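The effect of compression and QoS on whether the link can keep up can be sketched with simple arithmetic. This is only a back-of-the-envelope model in Python: the link speed, the redo rate, and the `share` parameter (the fraction of the link reserved for redo by QoS) are hypothetical assumptions, and only the 4:1 ratio comes from the experience mentioned above.

```python
def effective_capacity_mb_per_s(link_mbit_per_s, compression_ratio=1.0, share=1.0):
    """Uncompressed redo (MB/s) the link can carry.

    link_mbit_per_s   -- raw link speed in megabits per second
    compression_ratio -- e.g. 4.0 for 4:1 compression
    share             -- fraction of the link available to redo (0 < share <= 1)
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return link_mbit_per_s / 8.0 * share * compression_ratio


def keeps_up(redo_mb_per_s, link_mbit_per_s, compression_ratio=1.0, share=1.0):
    """True if the link can ship redo at least as fast as it is generated."""
    capacity = effective_capacity_mb_per_s(link_mbit_per_s, compression_ratio, share)
    return capacity >= redo_mb_per_s


if __name__ == "__main__":
    # Hypothetical: 10 Mbit/s WAN link, 2 MB/s peak redo generation.
    print(effective_capacity_mb_per_s(10))       # 1.25
    print(keeps_up(2.0, 10))                     # False
    print(effective_capacity_mb_per_s(10, 4.0))  # 5.0
    print(keeps_up(2.0, 10, 4.0))                # True
```

The model ignores protocol overhead and latency, so treat the result as an upper bound on what the line can carry.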

Finally: is the application optimized for minimized redo generation?

Best regards,

Carel-Jan Engel

===
If you think education is expensive, try ignorance. (Derek Bok)
===
On Mon, 2007-01-15 at 16:48 -0700, Allen, Brandon wrote:

I'm not questioning Carel-Jan's recommendation at all (I think he has more DG knowledge in his pinky than I'll ever have), just passing on a case where a cascading setup might be appropriate/necessary:

We have been struggling to get a standard (single standby) DG setup working for the last few months because our network connection isn't sufficient to keep up with the rate of our redo generation. When the transfer of archived logs falls far enough behind, it eventually freezes the production database. We're using ARCH to transfer the logs and have already tried upgrading to 9.2.0.8 and setting the hidden parameter _log_archive_callout='LOCAL_FIRST=true', but we still see this behavior. Oracle Support's recommendation is to implement a cascading standby, where we ship the logs to a local standby first and then from the local to the remote, so that the local standby operates as a buffer to keep the slow network from halting our primary. We are considering their recommendation, but we are going to try everything else we can think of to avoid it first. That will probably include upgrading to 10.2, because supposedly this problem no longer occurs in 10g. But that's the same thing we were told about 9.2.0.7 with the local_first=true setting (Metalink 260040.1), and we're not very confident based on our experience with that config.

Of course, another option that we're considering is increasing the network bandwidth to the remote destination, but we would really like to have Data Guard configured such that it will absolutely never impair production performance. Even with the increased bandwidth, there is always the possibility of WAN problems, someone accidentally clogging the pipe with other large files, etc.

Carel-Jan, if you have any recommendations, we'd love to hear them!

Thanks,

Brandon

From: oracle-l-bounce_at_freelists.org
[mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Carel-Jan Engel

Sent: Monday, January 15, 2007 4:12 PM

Stay away from cascaded redo transport as far as you can. You simply don't want that in HA environments. As Miracleas.dk states: '..Complexity is the enemy of availability...' (and I'd like to add: '.. but the friend of consultancy...' :-)). Imagine a switchover with cascaded transport. The whole redo transport stack has to be reinvented. A star configuration (primary points to both standbys) is much easier to set up.


--
http://www.freelists.org/webpage/oracle-l

Received on Tue Jan 16 2007 - 11:33:17 CST