Re: data guard fast start failover

From: fairlie rego <fairlie_r_at_yahoo.com>
Date: Thu, 22 Jan 2009 03:50:44 -0800 (PST)
Message-ID: <423426.20025.qm_at_web45009.mail.sp1.yahoo.com>



 
That is correct, Alex.
We partially work around these issues by setting outbound_connect_timeout (OCT) in the sqlnet.ora on the mid tiers (not sure what your client version is). We use a value of 3 seconds for OCT.
Take, for example, the following connect string:

xxxx =

  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = OFF)
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby1-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby2-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby3-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby4-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby5-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby6-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby7-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = stdby8-vip.sys.au.eds.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = prim1-vip.sys.au.eds.com)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = xxxx.commsec.com.au)
    )
  )
 

I have set LOAD_BALANCE = OFF so that the client traverses the standby addresses in order before reaching the primary. With those standby nodes down (in this case they are fictitious nodes) and OCT set to 3 seconds, it takes around 12 seconds to establish a connection from a Solaris 10.2.0.3 client.
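The ordered, timeout-bounded traversal described above can be modelled roughly in Python. This is a minimal sketch, not Oracle Net's implementation: `connect_in_order` and `worst_case_delay` are hypothetical names, a plain TCP connect via `socket.create_connection` stands in for the Oracle Net handshake, and the per-attempt `timeout` is only a rough analogue of OUTBOUND_CONNECT_TIMEOUT.

```python
import socket
from typing import Callable, List, Sequence, Tuple

# (host, port) pairs, tried strictly in list order, as with LOAD_BALANCE = OFF.
Address = Tuple[str, int]


def connect_in_order(
    addresses: Sequence[Address],
    timeout: float = 3.0,
    connect: Callable[..., object] = socket.create_connection,
) -> Tuple[Address, object]:
    """Try each address in turn, spending roughly `timeout` seconds at most on each.

    Returns the first address that accepts a TCP connection together with the
    connected socket. The `timeout` here is only a rough stand-in for OCT.
    """
    failures: List[Tuple[Address, OSError]] = []
    for address in addresses:
        try:
            return address, connect(address, timeout=timeout)
        except OSError as exc:  # refused, unreachable, or timed out
            failures.append((address, exc))
    raise ConnectionError(f"no address accepted a connection: {failures}")


def worst_case_delay(dead_before_live: int, timeout: float = 3.0) -> float:
    """Upper bound on the extra delay if every dead address uses its full timeout."""
    return dead_before_live * timeout
```

Under this model, with the eight standby addresses listed ahead of prim1 and a 3-second timeout, `worst_case_delay(8, 3.0)` gives a 24-second upper bound; the roughly 12 seconds observed above sits within it. Passing `connect` as a parameter keeps the ordering logic testable without a network.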
The other benefit of listing all nodes in the mid-tier tnsnames.ora is that we did not have to change it each time we did a switchover. We have done 8 switchovers over the past 3 months.
Hope that makes sense. The rest over some beer.

Fairlie Rego
Senior Oracle Consultant
http://el-caro.blogspot.com/
M: +61 402 792 405
 

On Tue, 20/1/09, Alex Gorbachev <ag_at_oracloid.com> wrote:

From: Alex Gorbachev <ag_at_oracloid.com>
Subject: Re: data guard fast start failover
To: fairlie_r_at_yahoo.com
Cc: "ORACLE-L Freelists" <oracle-l_at_freelists.org>
Received: Tuesday, 20 January, 2009, 11:26 AM

There are two issues. One is WebLogic-specific, as WebLogic has its own connection management with multi-pools for Data Guard (don't ask: they are working on integration with FAN and RLB, but that's not available yet).

The second issue is generic, and with the introduction of Oracle Clusterware, Oracle solved it with VIPs. The problem is that when an IP address is not available, the connection attempt only times out after a while. This is why VIPs are taken over by the surviving nodes in RAC, but I don't need to explain that to you. However, a Data Guard standby does not take over the VIPs when it is promoted to primary. This means that application connections to the VIPs of the old primary (now unavailable if the site or hosts are down) will take a while to time out.

If client-side load balancing is ON between the standby and primary ADDRESS_LISTs (in rare cases when there is no real DR and people switch between sites regularly), then about 50% of connection requests will time out after a minute or two, depending on your tcp_timeout setting in the apps tier. If you configure your descriptor without the load-balance option between the primary and standby address lists, but only with failover, then 100% of reconnects will be delayed, because the old primary's list is tried first every time.
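The 50% versus 100% figures follow from simple counting. The sketch below is a hypothetical model, assuming that the old primary's ADDRESS_LIST is completely unreachable, that LOAD_BALANCE = ON makes every ordering of the lists equally likely, and that with failover only the lists are tried in the order written; `fraction_delayed` is an illustrative name. It counts how often the first list tried is the dead one.

```python
from fractions import Fraction
from itertools import permutations
from typing import Sequence


def fraction_delayed(address_lists: Sequence[str], dead: str, load_balance: bool) -> Fraction:
    """Share of connection attempts whose first ADDRESS_LIST is the dead one.

    load_balance=True: every ordering of the lists is treated as equally likely
    (a simplified model of client-side load balancing).
    load_balance=False: the lists are always tried in the order written.
    """
    if load_balance:
        orders = list(permutations(address_lists))
    else:
        orders = [tuple(address_lists)]
    delayed = sum(1 for order in orders if order[0] == dead)
    return Fraction(delayed, len(orders))


lists = ["old_primary", "new_primary"]
print(fraction_delayed(lists, "old_primary", load_balance=True))   # 1/2
print(fraction_delayed(lists, "old_primary", load_balance=False))  # 1
```

With the old primary written first, failover-only ordering delays every reconnect, while load balancing halves that at the cost of also sending half the requests to the dead site whenever it is down.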

Fairlie, please correct what I've got wrong here.

Cheers,
Alex

On 20/01/2009, at 9:43 AM, fairlie rego wrote:

You have a connection to each node in RAC, but how do you handle connections to the standby?
Alex,
 
In the environment I am currently working on (two 8-node clusters in a Data Guard configuration), we have the node virtual IPs of both the primary and the standby clusters in the tnsnames.ora (16 nodes).
The application connects to RAC services that run only on the primary cluster. Upon switchover or failure, the db_role_change trigger fires and starts the services on the standby nodes. Of course it is a pain that dbms_service does not update the OCR, but let me not digress...
I am just curious why this may not work for you.
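The role-based service placement described above can be summarised as a small decision function. This is a hypothetical Python model only: in the setup described, the logic lives in a PL/SQL trigger on the DB_ROLE_CHANGE event that starts the services through DBMS_SERVICE. The names `services_to_start`, `on_role_change`, `app_svc` and the site labels are illustrative assumptions; the role strings follow the values reported in V$DATABASE.DATABASE_ROLE.

```python
from typing import Dict, List, Sequence


def services_to_start(database_role: str, app_services: Sequence[str]) -> List[str]:
    """Return the services this database should start after a role change.

    Application services run only on the primary, so a database that has just
    become PRIMARY starts them and a standby starts nothing.
    """
    if database_role.strip().upper() == "PRIMARY":
        return list(app_services)
    return []


def on_role_change(site_roles: Dict[str, str], app_services: Sequence[str]) -> Dict[str, List[str]]:
    """Apply the decision to every site, e.g. after a switchover swaps roles."""
    return {site: services_to_start(role, app_services) for site, role in site_roles.items()}


# After a switchover, the former standby cluster (site_b) is now the primary.
after_switchover = {"site_a": "PHYSICAL STANDBY", "site_b": "PRIMARY"}
print(on_role_change(after_switchover, ["app_svc"]))  # {'site_a': [], 'site_b': ['app_svc']}
```

Because the clients already list every node of both clusters, only the service placement has to follow the role; the tnsnames.ora entry stays the same across switchovers.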

Thanks
 

 

On Mon, 19/1/09, Alex Gorbachev <ag_at_oracloid.com> wrote:

From: Alex Gorbachev <ag_at_oracloid.com>
Subject: Re: data guard fast start failover
To: "Mark Strickland" <strickland.mark_at_gmail.com>
Cc: Laimutis.Nedzinskas_at_seb.lt, oracle-l_at_freelists.org
Received: Monday, 19 January, 2009, 9:58 AM

Thanks Mark,

What about Data Guard now? You have a connection to each node in RAC, but how do you handle connections to the standby? On one project I'm working on now, with RAC on the primary and RAC on the standby, we plan to set up a multi-pool controlling underlying pools for each instance on the primary *AND* the standby. Theoretically, a WebLogic multi-pool with load balancing will not send transactions to the "broken" pools, but in the past we didn't have a good experience with that. Another issue is the failover time: VIPs are not taken over by the standby on a role switch and, of course, the connection timeout takes a long time. So if it's 60 seconds for you, is your OS setting for tcp_timeout 60 seconds?
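Skipping broken member pools, as a load-balancing multi-pool is expected to do, can be sketched as a round-robin that passes over unhealthy members. This is a hypothetical Python model, not WebLogic's algorithm: the class name `MultiPool`, its `mark`/`pick` methods and the member names are assumptions, and how a pool actually gets marked broken (test connections, timeouts) is left out.

```python
from typing import Dict, List


class MultiPool:
    """Round-robin over member pools, skipping members marked broken."""

    def __init__(self, members: List[str]) -> None:
        if not members:
            raise ValueError("need at least one member pool")
        self.members = list(members)
        self.healthy: Dict[str, bool] = {m: True for m in self.members}
        self._next = 0  # index of the member to try first on the next pick

    def mark(self, member: str, healthy: bool) -> None:
        self.healthy[member] = healthy

    def pick(self) -> str:
        # Visit each member at most once, starting where the last pick left off.
        for _ in range(len(self.members)):
            member = self.members[self._next]
            self._next = (self._next + 1) % len(self.members)
            if self.healthy[member]:
                return member
        raise RuntimeError("no healthy member pool")


# Pools for each instance on the primary AND the standby; the standby ones
# start out marked broken because that database is not the primary yet.
pool = MultiPool(["prim1", "prim2", "stdby1", "stdby2"])
pool.mark("stdby1", False)
pool.mark("stdby2", False)
print([pool.pick() for _ in range(3)])  # ['prim1', 'prim2', 'prim1']
```

After a role switch the marks would simply be flipped; the slow part discussed above is what happens before that, while connections to the old primary's VIPs wait to time out.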

Has anybody attempted to automate VIP management by integrating it with the Observer and FSFO?

Cheers,
Alex

On 19/01/2009, at 9:17 AM, Mark Strickland wrote:

I'll find out more from our WebLogic SME, but we're using WebLogic multi-pools (multi-datasources?), i.e., each server running WebLogic has three connection pools, one for each of the RAC instances. The connections do reconnect automatically after failover. We're finding that failover plus reconnect takes 60-90 seconds. I believe that we are using WebLogic XA transactions, but I'll verify.

-Mark

On Sun, Jan 18, 2009 at 1:49 PM, Alex Gorbachev <ag_at_oracloid.com> wrote:

Hi Mark,

Could you elaborate on the WebLogic config you are using for RAC?
- Is it configured using WebLogic multi-datasources?
- Do you use WebLogic XA transactions? Does the WebLogic datasource retry the transaction on reconnect?
- What are the patches you mentioned (perhaps you have a reference to the WebLogic support docs)?

Cheers,
Alex

On 17/01/2009, at 8:52 AM, Mark Strickland wrote:

We've been testing FSF with 10.2.0.2, and my co-DBA discovered a bug that can cause split-brain. I don't remember the exact circumstances, but the fix is in 10.2.0.4, which is driving us to apply that patchset. Our FSF testing with 10.2.0.4 has been going very well. If you use WebLogic, it will handle a failover, but it requires a patch depending on which version you use. I've been doing new 10.2.0.4 builds with RAC and Data Guard with FSF for a new customer. No issues so far.

Mark
Seattle, WA

On Thu, Jan 15, 2009 at 11:27 PM, <Laimutis.Nedzinskas_at_seb.lt> wrote:

Hi all

Is anyone using Data Guard fast-start failover? What are your experiences?
What about split brain?
Does it interfere heavily with normal database activities? Any other comments?

Thank you in advance,

Laimis N

--

http://www.freelists.org/webpage/oracle-l


Received on Thu Jan 22 2009 - 05:50:44 CST
